-->

Fastest way to count consecutive 1 bits. C++

2020-07-27 04:01发布

站内文章 / C++

142 0

叛逆

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Basicly I just need to know the position of the highest 1 bit inside of an int or unsigned int. Like:

00001111=4;
00011111=5;
11111111=8;

Because I am sure that any number I will get will have consecutive 1 bits. 0...0000011...1 There will be no ..00010011... or something. So method can find the highest 1 or just count 1s. No matter.

This is the best I managed to do:

Uint32 number;
int shift=16; int segment=8;
while (segment) 
{
if (number>>shift!=0) shift+=segment; 
else shift-=segment;
segment>>1; // /2
}

回答1:

Copy/paste of my function for it:

size_t FirstSetBit(unsigned int v) const
{
#if defined(_MSC_VER)
    unsigned long ul;
    // Just 10% faster than MultiplyDeBruijnBitPosition method, on Core i7
    _BitScanForward(&ul, v);
    return ul;
#elif defined(__GNUC__) || defined(__clang__)
    return 31 - __builtin_clz(v);
#else // integer fallback for non-x64
    #warning You may be able to optimise this code for your compiler/processor

    int r;
    static const int MultiplyDeBruijnBitPosition[32] =
    {
        0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
        31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
    };

    r = MultiplyDeBruijnBitPosition[(uint32_t((v & -int(v)) * 0x077CB531U)) >> 27];
return r;
#endif
}

回答2:

Assuming you know the size of the int in question (e.g., 32 bits) you can pretty easily use a binary search for the highest bit that's set:

int bit_pos(unsigned value) { 
    static const std::vector<unsigned> masks = {
        0xffff0000, 
        0xff00ff00, 
        0xf0f0f0f0, 
        0xcccccccc, 
        0xaaaaaaaa
    };

    if (!value)
        return 0;

    int position = 0;

    int val = 16;

    for (unsigned mask : masks) {
        if (value & mask) {
            position += val;
            value &= mask;
        }
        val /= 2;
    }

    return position + 1;
}

For (probably) a little extra speed at the expense of even greater obscurity, you can do a little bit extra fiddling up front to get only one bit set, then find its position:

unsigned bit_pos2(unsigned int value) {
    unsigned int position = 32;
    value = ~value;
    value &= -signed(value);
    if (value) --position;
    if (value & 0x0000ffff) position -= 16;
    if (value & 0x00ff00ff) position -= 8;
    if (value & 0x0f0f0f0f) position -= 4;
    if (value & 0x33333333) position -= 2;
    if (value & 0x55555555) position -= 1;
    return position;
}

For 64-bit integers, the numbers get larger, but we need to add only one more iteration:

unsigned bit_pos64(unsigned long long value) {
    value = ~value;
    unsigned position = 64;
    value &= -(long long)value;
    if (value) --position;
    if (value & 0x00000000ffffffff) position -= 32;
    if (value & 0x0000ffff0000ffff) position -= 16;
    if (value & 0x00ff00ff00ff00ff) position -= 8;
    if (value & 0x0f0f0f0f0f0f0f0f) position -= 4;
    if (value & 0x3333333333333333) position -= 2;
    if (value & 0x5555555555555555) position -= 1;
    return position;
}

By having only one bit set, we avoid dependencies between the loop iterations, so the iterations can be executed in parallel. Manually unrolling the loop (as above) may help the chances of that happening, at least slightly. This also requires only one operation per iteration instead of 2, so it may be faster even without any parallel execution.

回答3:

Others have given a variety of answers, but it's probably worth mentioning that there is an entire book full of these kinds of things — Hacker's Delight by Henry S. Warren Jr., ISBN 978-0-201-91465-8, which also has a web page.

It's also worth highlighting the fact that some microprocessors have special support for some of these kinds of things. Lasse Reinhold's answer, above, makes use of that fact, but doesn't call the reader's attention to it. Generally speaking, except for very simple cases (like bit rotate instructions), compilers can't optimise “bit twiddling” algorithms to a single machine instruction, so if you know you’re on a machine that has e.g. a bit-scan-forward or population count instruction or similar, and you're in a position to use it (via either compiler intrinsics or asm statements), you might want to do so.

Finally, since the question started by saying that the number was known to be of the form 0...000111...1, I'll add another option, based on computing (in parallel) the sums of the groups of bits:

uint32_t count_set_bits(uint32_t x) {
  x = ((x >> 1) & 0x55555555) + (x & 0x55555555);
  x = ((x >> 2) & 0x33333333) + (x & 0x33333333);
  x = ((x >> 4) & 0x0f0f0f0f) + (x & 0x0f0f0f0f);
  x = ((x >> 8) + x) & 0x00ff00ff);
  return (x >> 16) + x) & 0x0000ffff;
}

回答4:

What you doing is called a bit scan. In your case it's called a bit scan reverse. It's also the floor of the log base two algorithm.

The x86 instruction set has a instruction for this, bsr = bit scan reverse, since the Intel 386. So you should try use a function which calls the instruction when you can. For MSVC you want to use _BitScanReverse, with GCC 31 -__builtin_clz(x), and with ICC _bit_scan_reverse.

I looked at the assembly output of these intrinsics/builtins from MSVC and GCC and they both generated the bsr instruction.

The Intel Haswell processor added the lzcnt instruction (and AMD added it much early in Barcelona). This counts the leading zeros. It is the same as 31 - bsr (or equal to bsr - see the warning below). You can call it with __lzcnt with MSVC. However, you should be warned that if you do this on a processor which does not support lzcnt it will not crash and instead will use the bsr instruction

The encoding of lzcnt is similar enough to bsr that if lzcnt is performed on a CPU not supporting it such as Intel CPU's prior to Haswell, it will perform the bsr operation instead of raising an invalid instruction error.

If you want to do BitScanReverse in software there are several different ways. See the section "Find the log base 2 of an N-bit integer" http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn and http://chessprogramming.wikispaces.com/BitScan

回答5:

int getHighestOne(unsigned int num) {
    int count = 0;
    while(num >>= 1) ++count;
    return count;
}

Returns the position of the highest one starting with 0, or -1 if there is no one.

getHighestOne(0) would return -1
getHighestOne(1) would return 0
getHighestOne(10) would return 3

Edit: Here is a link to some fast log methods.

回答6:

(1)

you could count how many times you would have to bitshift your unsigned int until it was zero?

see What are bitwise shift (bit-shift) operators and how do they work?

(2)

example

number: 0111

bitshift one to the right: 0011, use bitwise x-or with original number 0111 ^ 0011 = 0100

in cpp:

unsigned int num = 3;

unsigned int answer = ((num >> 1) ^ (num)); 

cout << answer << '\n';

回答7:

Below code might help to count consecutive 1 bits.

int count_consecutive_ones(int in) {
    int count = 0;
    while (in) {
        in = (in & (in << 1));
        count++;
    }
    return count;
}

回答8:

int count_bit(unsigned int num) {
    // Assume 0<= num < 256, i.e. 8 bit
    // It can be easily support larger word size by successive calling.
    if ( num >= 256 ) return -1;
    return count[ num ];
}

Yes, the fastest way is lookup table.

回答9:

you can try something like

    std::bitset<32> b1; 
    int i_num;
    std::cin>> i_num;
    b1=i_num;
   if(b1[31]==1)
      return 31;
   else if(b1[30]==1)
      return 30;
     ---------
  else if(b1[0]==1)
 return 0;
 else
 reutrn -1;