use of the bitwise operators to pack multiple values in one int

Developer https://www.devze.com 2023-03-17 07:29 Source: Internet
Low-level bit manipulation has never been my strong point. I would appreciate some help in understanding the following use case of bitwise operators. Consider...

int age, gender, height, packed_info;

. . .   // Assign values 

// Pack as AAAAAAA G HHHHHHH using shifts and "or"
packed_info = (age << 8) | (gender << 7) | height;

// Unpack with shifts and masking using "and"
height = packed_info & 0x7F;   // This constant is binary ...01111111
gender = (packed_info >> 7) & 1;
age    = (packed_info >> 8);

I am not sure what this code is accomplishing and how? Why use the magic number 0x7F ? How is the packing and unpacking accomplished?


As the comment says, we're going to pack the age, gender and height into 15 bits, of the format:

AAAAAAAGHHHHHHH

Let's start with this part:

(age << 8)

To start with, age has this format:

age           = 00000000AAAAAAA

where each A can be 0 or 1.

<< 8 moves the bits 8 places to the left, and fills in the gaps with zeroes. So you get:

(age << 8)    = AAAAAAA00000000

Similarly:

gender        = 00000000000000G
(gender << 7) = 0000000G0000000
height        = 00000000HHHHHHH

Now we want to combine these into one variable. The | operator works by looking at each bit, and returning 1 if the bit is 1 in either of the inputs. So:

0011 | 0101 = 0111

If a bit is 0 in one input, then you get the bit from the other input. Looking at (age << 8), (gender << 7) and height, you'll see that, if a bit is 1 for one of these, it's 0 for the others. So:

packed_info = (age << 8) | (gender << 7) | height = AAAAAAAGHHHHHHH

Now we want to unpack the bits. Let's start with the height. We want to get the last 7 bits, and ignore the first 8. To do this, we use the & operator, which returns 1 only if both of the input bits are 1. So:

0011 & 0101 = 0001

So:

packed_info          = AAAAAAAGHHHHHHH
0x7F                 = 000000001111111
(packed_info & 0x7F) = 00000000HHHHHHH = height

To get the age, we can just push everything 8 places to the right, and we're left with 00000000AAAAAAA. So age = (packed_info >> 8).

Finally, to get the gender, we push everything 7 places to the right to get rid of the height. We then only care about the last bit:

packed_info            = AAAAAAAGHHHHHHH
(packed_info >> 7)     = 0000000AAAAAAAG
1                      = 000000000000001
(packed_info >> 7) & 1 = 00000000000000G
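The whole round trip can be sketched as a few small C functions. This is only an illustration: the function names are made up here, and unsigned is used instead of the question's int so that right shifts never have to deal with a sign bit.

```c
#include <assert.h>

/* Hypothetical helper: packs the three fields as AAAAAAA G HHHHHHH. */
unsigned pack_info(unsigned age, unsigned gender, unsigned height)
{
    /* Each field must fit its slot, otherwise its bits would spill
       into a neighbouring field. */
    assert(age <= 0x7F && gender <= 1 && height <= 0x7F);
    return (age << 8) | (gender << 7) | height;
}

/* Keep only the lowest 7 bits. */
unsigned unpack_height(unsigned packed) { return packed & 0x7F; }

/* Drop the 7 height bits, then keep only the lowest bit. */
unsigned unpack_gender(unsigned packed) { return (packed >> 7) & 1; }

/* Drop the height and gender bits; only the age is left. */
unsigned unpack_age(unsigned packed)    { return packed >> 8; }
```

For example, pack_info(30, 1, 70) gives 7878 (7680 + 128 + 70), and the three unpack functions return 30, 1 and 70 again.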


This could be a rather long lesson in bit manipulation, but first let me point you to the bit masking article on Wikipedia.

packed_info = (age << 8) | (gender << 7) | height;

Take age and move its value over 8 bits, then take gender and move it over 7 bits; height will occupy the last bits.

age    = 0b101
gender = 0b1
height = 0b1100
packed_info = 0b10100000000
            | 0b00010000000
            | 0b00000001100
/* which is */
packed_info = 0b10110001100

Unpacking does the reverse but uses masks like 0x7F (which is 0b01111111) to trim out the other values in the field.

gender = (packed_info >> 7) & 1;

Would work like...

gender = 0b1011 /* shifted 7 here but still has age on the other side */
       & 0b0001
/* which is */
gender = 0b1

Note that ANDing a bit with 1 is the same as "keeping" that bit, and ANDing it with 0 is the same as "ignoring" that bit.
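The numbers in this walkthrough can be re-checked with a short C sketch. The names below are invented for the check, and the values are written in decimal because the 0b prefix is not available in every C compiler.

```c
#include <assert.h>

/* Values from the example: age = 0b101, gender = 0b1, height = 0b1100. */
enum { EX_AGE = 5, EX_GENDER = 1, EX_HEIGHT = 12 };

unsigned example_packed(void)
{
    return ((unsigned)EX_AGE << 8) | ((unsigned)EX_GENDER << 7)
           | (unsigned)EX_HEIGHT;
}

/* Re-checks each intermediate value from the walkthrough. */
void verify_example(void)
{
    unsigned packed = example_packed();
    assert(packed == 1420);            /* 0b10110001100 */
    assert((packed >> 7) == 11);       /* 0b1011, age bits still present */
    assert(((packed >> 7) & 1) == 1);  /* gender */
    assert((packed & 0x7F) == 12);     /* height */
    assert((packed >> 8) == 5);        /* age */
}
```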


If you were going to store a date as a number, maybe you would accomplish it by multiplying the year by 10000, the month by 100 and adding the day. A date such as July 2, 2011 would be encoded as the number 20110702:

    year * 10000 + month * 100 + day -> yyyymmdd
    2011 * 10000 + 7 * 100 + 2 -> 20110702

We can say that we encoded the date in a yyyymmdd mask. We could describe this operation as

  • Shift the year 4 positions to the left,
  • shift the month 2 positions to the left and
  • leave the day as is.
  • Then combine the three values together.

This is the same thing that is happening with the age, gender and height encoding, only that the author is thinking in binary.
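The decimal version might look like this in C. It is a sketch with made-up function names; the decoding functions are an addition here, showing that a decimal "shift right" is a division and a decimal "mask" is a remainder.

```c
#include <assert.h>

/* Encode a date as the decimal number yyyymmdd. */
long encode_date(long year, long month, long day)
{
    /* Month and day each get two decimal digits. */
    assert(month >= 0 && month <= 99 && day >= 0 && day <= 99);
    return year * 10000 + month * 100 + day;
}

/* Dividing by a power of ten drops the digits to the right. */
long date_year(long v)  { return v / 10000; }

/* The remainder keeps only the two rightmost digits. */
long date_month(long v) { return (v / 100) % 100; }
long date_day(long v)   { return v % 100; }
```

encode_date(2011, 7, 2) returns 20110702, and the three decoders give back 2011, 7 and 2. long is used because the result does not fit in a 16-bit int.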

See the ranges that those values may have:

    age: 0 to 127 years
    gender: M or F
    height: 0 to 127 inches

If we translate those values to binary, we would have this:

    age: 0 to 1111111b (7 binary digits, or bits)
    gender: 0 or 1 (1 bit)
    height: 0 to 1111111b (7 bits also)

With this in mind, we can encode the age-gender-height data with the mask aaaaaaaghhhhhhh, only that here we are talking about binary digits, not decimal digits.

So,

  • Shift the age 8 bits to the left,
  • shift the gender 7 bits to the left and
  • leave the height as is.
  • Then combine all three values together.

In binary, the Shift-Left operator (<<) moves a value n positions to the left. The "Or" operator ("|" in many languages) combines values together. Therefore:

    (age << 8) | (gender << 7) | height

Now, how to "decode" those values?

It's easier in binary than in decimal:

  • You "mask away" the height,
  • shift the gender 7 bits to the right and mask that away also, and finally
  • shift the age 8 bits to the right.

The Shift-Right operator (>>) moves a value n positions to the right (whatever digits shifted "out" of the rightmost position are lost). The "And" binary operator ("&" in many languages) masks bits. To do that it needs a mask, indicating which bits to preserve and which bits to destroy (1 bits are preserved). Therefore:

    height = value & 1111111b (preserve the 7 rightmost bits)
    gender = (value >> 7) & 1 (preserve just one bit)
    age = (value >> 8)

In most languages 1111111b is written as 0x7f in hex, which is the reason for that magic number. You would get the same effect by using 127 (which is 1111111b in decimal).


I have faced the same requirement many times. It is easy with the help of the bitwise AND operator. Assign each value an increasing power of two. To store multiple values, add their assigned numbers to get a sum; this sum represents your selected values. How?

Bitwise AND the sum with each value: the result is zero (0) for values that were not selected and non-zero for those that were.

Here is the explanation:

1) Values ( YES, NO, MAYBE )

2) Assignment to power of two(2)

YES   =    2^0    =    1    =    00000001
NO    =    2^1    =    2    =    00000010
MAYBE =    2^2    =    4    =    00000100

3) I choose YES and MAYBE hence SUM:

SUM    =    1    +    4    =    5

SUM    =    00000001    +    00000100    =    00000101 

This value will store both YES as well as MAYBE. HOW?

1    &    5    =    1    ( non zero )

2    &    5    =    0    ( zero )

4    &    5    =    4    ( non zero )

Hence SUM consists of

1    =    2^0    =    YES
4    =    2^2    =    MAYBE.
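In C this pattern is usually written with an enum of single-bit constants. The sketch below uses invented names; it combines the flags with | rather than +, which gives the same result as long as no flag is added twice.

```c
#include <assert.h>

enum answer_flags {
    ANSWER_YES   = 1 << 0,  /* 00000001 */
    ANSWER_NO    = 1 << 1,  /* 00000010 */
    ANSWER_MAYBE = 1 << 2   /* 00000100 */
};

/* Returns nonzero when `flag` is set in `selection`. */
int has_flag(unsigned selection, unsigned flag)
{
    /* A valid flag is a power of two, i.e. exactly one bit is set. */
    assert(flag != 0 && (flag & (flag - 1)) == 0);
    return (selection & flag) != 0;
}

/* YES and MAYBE selected: 1 | 4 == 5, the same as 1 + 4 here
   because the two bit patterns do not overlap. */
unsigned example_selection(void)
{
    return ANSWER_YES | ANSWER_MAYBE;
}
```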


A more condensed answer:

AAAAAAA G HHHHHHH

Packing:

packed = age << 8 | gender << 7 | height

Alternatively, you can sum the components, e.g. when using the MySQL SUM aggregate function (the parentheses are needed because + binds tighter than <<):

packed = (age << 8) + (gender << 7) + height

Unpacking:

age = packed >> 8 // no mask required
gender = packed >> 7 & ((1 << 1) - 1) // applying mask (for gender it is just 1)
height = packed & ((1 << 7) - 1) // applying mask
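The (1 << n) - 1 masks generalize to any field width. A possible C sketch follows; low_mask, get_field and pack_by_sum are names made up for this example, and the 16-bit limits in the assertions are just a conservative choice that is safe for any unsigned.

```c
#include <assert.h>

/* Mask with the lowest `width` bits set, e.g. width 7 -> 0x7F. */
unsigned low_mask(unsigned width)
{
    assert(width < 16);
    return (1u << width) - 1u;
}

/* Extract `width` bits starting at bit `offset`. */
unsigned get_field(unsigned packed, unsigned offset, unsigned width)
{
    assert(offset < 16);
    return (packed >> offset) & low_mask(width);
}

/* Packing by addition. '+' binds tighter than '<<' in C, so the
   parentheses around each shift are required. */
unsigned pack_by_sum(unsigned age, unsigned gender, unsigned height)
{
    return (age << 8) + (gender << 7) + height;
}
```

With these, get_field(packed, 8, 7) is the age, get_field(packed, 7, 1) the gender and get_field(packed, 0, 7) the height.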


Another (longer) example:

Say you have an IP address you want to pack; however, it is a fictional IP address, e.g. 132.513.151.319. Note that some components are greater than 255, which requires more than 8 bits, unlike real IP addresses.

First we need to figure out how many bits each component needs to store the maximum value. Let's say that in our fictional IPs no component can be bigger than 999; that means we need 10 bits of storage per component (which allows numbers up to 1023).

packed = (comp1 << 0 * 10) | (comp2 << 1 * 10) | (comp3 << 2 * 10) | (comp4 << 3 * 10)

This gives decimal 342682502276 or binary 100111111001001011110000000010010000100.

Now let's unpack the value:

comp1 = (packed >> 0 * 10) & ((1 << 10) - 1) // 132
comp2 = (packed >> 1 * 10) & ((1 << 10) - 1) // 513
comp3 = (packed >> 2 * 10) & ((1 << 10) - 1) // 151
comp4 = (packed >> 3 * 10) & ((1 << 10) - 1) // 319

Here (1 << 10) - 1 is a binary mask we use to hide the bits to the left of the 10 rightmost bits we are interested in.
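In C the same example needs a 64-bit type, because four 10-bit components take 40 bits and would not fit in a 32-bit int. A minimal sketch, with names chosen here for illustration:

```c
#include <assert.h>
#include <stdint.h>

enum { COMPONENT_BITS = 10, COMPONENT_COUNT = 4 };

/* Mask for one component: the 10 lowest bits set (1023). */
static const uint64_t COMPONENT_MASK = (UINT64_C(1) << COMPONENT_BITS) - 1;

/* comp[0] lands in the lowest 10 bits, comp[3] in the highest. */
uint64_t pack_components(const uint64_t comp[COMPONENT_COUNT])
{
    uint64_t packed = 0;
    for (int i = 0; i < COMPONENT_COUNT; i++) {
        /* Masking first keeps an oversized input out of its neighbours. */
        packed |= (comp[i] & COMPONENT_MASK) << (i * COMPONENT_BITS);
    }
    return packed;
}

uint64_t unpack_component(uint64_t packed, int index)
{
    assert(index >= 0 && index < COMPONENT_COUNT);
    return (packed >> (index * COMPONENT_BITS)) & COMPONENT_MASK;
}
```

Packing {132, 513, 151, 319} this way gives 342682502276, matching the value above, and unpack_component returns the four components again for index 0 to 3.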

Same example using MySQL query

SELECT

(@offset := 10) AS `No of bits required for each component`,
(@packed := (132 << 0 * @offset) | 
            (513 << 1 * @offset) | 
            (151 << 2 * @offset) | 
            (319 << 3 * @offset)) AS `Packed value (132.513.151.319)`,

BIN(@packed) AS `Packed value (bin)`,

(@packed >> 0 * @offset) & ((1 << @offset) - 1) `Component 1`,
(@packed >> 1 * @offset) & ((1 << @offset) - 1) `Component 2`,
(@packed >> 2 * @offset) & ((1 << @offset) - 1) `Component 3`,
(@packed >> 3 * @offset) & ((1 << @offset) - 1) `Component 4`;


The left shift operator means "multiply by two, this many times". In binary, multiplying a number by two is the same as adding a zero to the right side.

The right shift operator is the reverse of the left shift operator.

The pipe operator is "or", meaning overlay two binary numbers on top of each other, and where there is a 1 in either number the result in that column is a 1.
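These three descriptions can be tried out with a few one-line C functions. The function names are invented for this sketch, and unsigned values are used so that the "divide by two" reading of the right shift holds exactly.

```c
#include <assert.h>

/* x << n multiplies x by 2^n, provided no set bits fall off the top. */
unsigned shift_left(unsigned x, unsigned n)
{
    assert(n < 16);  /* unsigned is guaranteed to have at least 16 bits */
    return x << n;
}

/* For unsigned values, x >> n divides x by 2^n and discards the remainder. */
unsigned shift_right(unsigned x, unsigned n)
{
    assert(n < 16);
    return x >> n;
}

/* "Or" overlays the two bit patterns: 0011 | 0101 == 0111. */
unsigned overlay(unsigned a, unsigned b)
{
    return a | b;
}
```

For example, shift_left(5, 8) is 5 * 256 = 1280, shift_right(1280, 8) is 5 again, and overlay(3, 5) is 7.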

So, let's extract the operation for packed_info:

// Create age, shifted left 8 times:
//     AAAAAAA00000000
age_shifted = age << 8;

// Create gender, shifted left 7 times:
//     0000000G0000000
gender_shifted = gender << 7;

// "Or" them all together:
//     AAAAAAA00000000
//     0000000G0000000
//     00000000HHHHHHH
//     ---------------
//     AAAAAAAGHHHHHHH
packed_info = age_shifted | gender_shifted | height;

And the unpacking is the reverse.

// Grab the lowest 7 bits:
//     AAAAAAAGHHHHHHH &
//     000000001111111 =
//     00000000HHHHHHH
height = packed_info & 0x7F;

// right shift the 'height' bits into the bit bucket, and grab the lowest 1 bit:
//     AAAAAAAGHHHHHHH 
//   >> 7 
//     0000000AAAAAAAG &
//     000000000000001 =
//     00000000000000G
gender = (packed_info >> 7) & 1;

// right shift the 'height' and 'gender' bits into the bit bucket, and grab the result:
//     AAAAAAAGHHHHHHH 
//   >> 8
//     00000000AAAAAAA
age    = (packed_info >> 8);


You can see the expression x & mask as an operation that removes from x the bits that are not present (i.e., have value 0) in mask. That means, packed_info & 0x7F removes from packed_info all the bits that are above the seventh bit.

Example: if packed_info is 1110010100101010 in binary, then packed_info & 0x7f will be

1110010100101010
0000000001111111
----------------
0000000000101010

So, in height we get the lower 7 bits of packed_info.

Next, we shift the whole packed_info right by 7, which removes the information we have already read out. So we get (for the value from the previous example) 111001010. The gender is stored in the next bit, so with the same trick, & 1, we extract only that bit. The rest of the information starts at offset 8.

Packing is not complicated either: you take the age and shift it left by 8 bits (so you get 1110010100000000 from 11100101), shift the gender left by 7 (so you get 00000000), and take the height (assuming it fits in the lower 7 bits). Then you combine all of them with "or":

1110010100000000
0000000000000000
0000000000101010
----------------
1110010100101010
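The "read a field, then shift it away" idea can be written as one small C helper. take_bits is a hypothetical name, and the sketch assumes the fields are read from the lowest bits upwards.

```c
#include <assert.h>

/* Returns the lowest `width` bits of *value and then shifts them out,
   so the next call sees the next field. */
unsigned take_bits(unsigned *value, unsigned width)
{
    assert(width > 0 && width < 16);
    unsigned field = *value & ((1u << width) - 1u);
    *value >>= width;
    return field;
}
```

Starting from v = 0xE52A (1110010100101010 in binary), take_bits(&v, 7) returns 42 (the height, 0101010), take_bits(&v, 1) returns 0 (the gender), and v is then 229 (11100101, the remaining upper bits).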