How to calculate 2^n-1 efficiently without overflow?_问答_开发者

I want to calculate 2ⁿ-1 for a 64bit integer value. What I currently do is this

for(i=0; i<n; i++) r|=1<<i;

and I wonder if there is more elegant way to do it. The line is in an inner loop, so I need it to be fast.

I thought of

  r=(1ULL<<n)-1;

but it doesn't work for n=64, because << is only defined for values of n up to 63.

EDIT: Thanks for all your answers and comments. Here is a little table with the solutions that I tried and liked best. Second column is time in seconds of my (completely unscientific) benchmark.

    
r=N2MINUSONE_LUT[n];            3.9 lookup table = fastest, answer by aviraldg
r =n?~0ull>>(64 - n):0ull;      5.9 fastest without LUT, comment by Christoph
r=(1ULL<<n)-1;                  5.9 Obvious but WRONG!   
r =(n==64)?-1:(1ULL<<n)-1;      7.0 Short, clear and quite fast, answer by Gabe
r=((1ULL<<(n/2))<<((n+1)/2))-1; 8.2 Nice, w/o spec. case, answer by drawnonward
r=(1ULL<<n-1)+((1ULL<<n-1)-1);  9.2 Nice, w/o spec. case, answer by David Lively
r=pow(2, n)-1;               99.0 Just for comparison
for(i=0; i<n; i++) r|=1开发者_JAVA百科<<i;   123.7 My original solution = lame

I accepted

r =n?~0ull>>(64 - n):0ull;

as answer because it's in my opinion the most elegant solution. It was Christoph who came up with it at first, but unfortunately he only posted it in a comment. Jens Gustedt added a really nice rationale, so I accept his answer instead. Because I liked Aviral Dasgupta's lookup table solution it got 50 reputation points via a bounty.

Use a lookup table. (Generated by your present code.) This is ideal, since the number of values is small, and you know the results already.

/* lookup table: n -> 2^n-1 -- do not touch */
const static uint64_t N2MINUSONE_LUT[] = {
0x0,
0x1,
0x3,
0x7,
0xf,
0x1f,
0x3f,
0x7f,
0xff,
0x1ff,
0x3ff,
0x7ff,
0xfff,
0x1fff,
0x3fff,
0x7fff,
0xffff,
0x1ffff,
0x3ffff,
0x7ffff,
0xfffff,
0x1fffff,
0x3fffff,
0x7fffff,
0xffffff,
0x1ffffff,
0x3ffffff,
0x7ffffff,
0xfffffff,
0x1fffffff,
0x3fffffff,
0x7fffffff,
0xffffffff,
0x1ffffffff,
0x3ffffffff,
0x7ffffffff,
0xfffffffff,
0x1fffffffff,
0x3fffffffff,
0x7fffffffff,
0xffffffffff,
0x1ffffffffff,
0x3ffffffffff,
0x7ffffffffff,
0xfffffffffff,
0x1fffffffffff,
0x3fffffffffff,
0x7fffffffffff,
0xffffffffffff,
0x1ffffffffffff,
0x3ffffffffffff,
0x7ffffffffffff,
0xfffffffffffff,
0x1fffffffffffff,
0x3fffffffffffff,
0x7fffffffffffff,
0xffffffffffffff,
0x1ffffffffffffff,
0x3ffffffffffffff,
0x7ffffffffffffff,
0xfffffffffffffff,
0x1fffffffffffffff,
0x3fffffffffffffff,
0x7fffffffffffffff,
0xffffffffffffffff,
};

How about a simple r = (n == 64) ? -1 : (1ULL<<n)-1;?

If you want to get the max value just before overflow with a given number of bits, try

r=(1ULL << n-1)+((1ULL<<n-1)-1);

By splitting the shift into two parts (in this case, two 63 bit shifts, since 2^64=2*2^63), subtracting 1 and then adding the two results together, you should be able to do the calculation without overflowing the 64 bit data type.

if (n > 64 || n < 0)
  return undefined...

if (n == 64)
  return 0xFFFFFFFFFFFFFFFFULL;

return (1ULL << n) - 1;

I like aviraldg answer best. Just to get rid of the `ULL' stuff etc in C99 I would do

static inline uint64_t n2minusone(unsigned n) {
   return n ? (~(uint64_t)0) >> (64u - n) : 0;
}

To see that this is valid

an uint64_t is guaranteed to have a width of exactly 64 bit
the bit negation of that `zero of type uint64_t' has thus exactly 64 one bits
right shift of an unsigned value is guaranteed to be a logical shift, so everything is filled with zeros from the left
shift with a value equal or greater to the width is undefined, so yes you have to do at least one conditional to be sure of your result
an inline function (or alternatively a cast to uint64_t if you prefer) makes this type safe; an unsigned long long may well be an 128 bit wide value in the future
a static inline function should be seamlessly inlined in the caller without any overhead

The only problem is that your expression isn't defined for n=64? Then special-case that one value.

(n == 64 ? 0ULL : (1ULL << n)) - 1ULL

Shifting 1 << 64 in a 64 bit integer yields 0, so no need to compute anything for n > 63; shifting should be enough fast

r = n < 64 ? (1ULL << n) - 1 : 0;

But if you are trying this way to know the max value a N bit unsigned integer can have, you change 0 into the known value treating n == 64 as a special case (and you are not able to give a result for n > 64 on hardware with 64bit integer unless you use a multiprecision/bignumber library).

Another approach with bit tricks

~-(1ULL << (n-1) ) | (1ULL << (n-1))

check if it can be semplified... of course, n>0

EDIT

Tests I've done

__attribute__((regparm(0))) unsigned int calcn(int n)
{
  register unsigned int res;
  asm(
    "  cmpl $32, %%eax\n"
    "  jg   mmno\n"
    "  movl $1, %%ebx\n"      // ebx = 1
    "  subl $1, %%eax\n"      // eax = n - 1
    "  movb %%al, %%cl\n"     // because of only possible shll reg mode
    "  shll %%cl, %%ebx\n"    // ebx = ebx << eax
    "  movl %%ebx, %%eax\n"   // eax = ebx
    "  negl %%ebx\n"          // -ebx
    "  notl %%ebx\n"          // ~-ebx
    "  orl  %%ebx, %%eax\n"   // ~-ebx | ebx
    "  jmp  mmyes\n"
    "mmno:\n"
    "  xor %%eax, %%eax\n"
    "mmyes:\n"
    :
    "=eax" (res):
    "eax" (n):
    "ebx", "ecx", "cc"
    );
  return res;
}

#define BMASK(X) (~-(1ULL << ((X)-1) ) | (1ULL << ((X)-1)))
int main()
{
  int n = 32; //...
  printf("%08X\n", BMASK(n));
  printf("%08X %d %08X\n", calcn(n), n&31, BMASK(n&31));
  return 0;
}

Output with n = 32 is -1 and -1, while n = 52 yields "-1" and 0xFFFFF, casually 52&31 = 20 and of course n = 20 gives 0xFFFFF...

EDIT2 now the asm code produces 0 for n > 32 (since I am on a 32 bit machine), but at this point the a ? b : 0 solution with the BMASK is clearer and I doubt the asm solution is too much faster (if speed is a so big concern the table idea could be the faster).

Since you've asked for an elegant way to do it:

const uint64_t MAX_UINT64 = 0xffffffffffffffffULL;
#define N2MINUSONE(n) ((MAX_UINT64>>(64-(n))))

I hate it that (a) n << 64 is undefined and (b) on the popular Intel hardware shifting by word size is a no-op.

You have three ways to go here:

Lookup table. I recommend against this because of the memory traffic, plus you will write a lot of code to maintain the memory traffic.
Conditional branch. Check if n is equal to the word size (8 * sizeof(unsigned long long)), if so, return ~(unsigned long long)0, otherwise shift and subtract as usual.
Try to get clever with arithmetic. For example, in real numbers 2^n = 2^(n-1) + 2^(n-1), and you can exploit this identity to make sure you never use a power equal to the word size. But you had better be very sure that n is never zero, because if it is, this identity cannot be expressed in the integers, and shifting left by -1 is likely to bite you in the ass.

I personally would go with the conditional branch—it is the hardest to screw up, manifestly handles all reasonable cases of n, and with modern hardware the likelihood of a branch misprediction is small. Here's what I do in my real code:

/* What makes things hellish is that C does not define the effects of
   a 64-bit shift on a 64-bit value, and the Intel hardware computes
   shifts mod 64, so that a 64-bit shift has the same effect as a
   0-bit shift.  The obvious workaround is to define new shift functions 
   that can shift by 64 bits. */

static inline uint64_t shl(uint64_t word, unsigned bits) {
  assert(bits <= 64);
  if (bits == 64)
    return 0;
  else
    return word << bits;
}

I think the issue you are seeing is caused because (1<<n)-1 is evaluated as (1<<(n%64))-1 on some chips. Especially if n is or can be optimized as a constant.

Given that, there are many minor variations you can do. For example:

((1ULL<<(n/2))<<((n+1)/2))-1;

You will have to measure to see if that is faster then special casing 64:

(n<64)?(1ULL<<n)-1:~0ULL;

It is true that in C each bit-shifting operation has to shift by less bits than there are bits in the operand (otherwise, the behavior is undefined). However, nobody prohibits you from doing the shift in two consecutive steps

r = ((1ULL << (n - 1)) << 1) - 1;

I.e. shift by n - 1 bits first and then make an extra 1 bit shift. In this case, of course, you have to handle n == 0 situation in a special way, if that is a valid input in your case.

In any case, it is better than your for cycle. The latter is basically the same idea but taken to the extreme for some reason.

Ub = universe in bits = lg(U):
high(v) = v >> (Ub / 2)
low(v) = v & ((~0) >> (Ub - Ub / 2))  // Deal with overflow and with Ub even or odd

You can exploit integer division inaccuracy and use the modulo of the exponent to ensure you always shift in the range [0, (sizeof(uintmax_t) * CHAR_BIT) - 1] to create a universal pow2i function for integers of the largest supported native word size, however, this can easily be tweaked to support arbitrary word sizes.

I honestly don't get why this isn't just the implementation in hardware for bit shift overflows.

#include <limits.h>
static inline uintmax_t pow2i(uintmax_t exponent) {
    #define WORD_BITS  ( sizeof(uintmax_t) * CHAR_BIT )
    return ((uintmax_t) 1) << (exponent / WORD_BITS) << (exponent % WORD_BITS);
    #undef WORD_BITS
}

From there, you can calculate pow2i(n) - 1.