Recently on one of the IRC channels we've been discussing simple functions that make a string with the binary representation of an 8-bit unsigned integer. For some reason I decided to go beyond the standard "make a loop, read a bit, and put either '0' or '1' in the output char array", and try to create a "loopless", more-math-looking function.



(Disclaimer: It's not faster. It's not better. It's not production-quality code. It just is, and it was fun to make ;>)



UPDATE (next morning): I've simplified the function a little, so that it uses the "c" variable only once. The previous version can be found at the end of the post. I've also updated the description.

UPDATE 2 (next morning): If you take the constants (both in decimal and hexadecimal form) and google them, you'll find some prior art (as expected of course) and some interesting variations of the idea :)



The exact assumptions were:

• always 8 digits (so 0x00 should be "00000000", and not "0")

• don't add \0 - it will be added elsewhere in the code

• has to work on x86



This is what I ended up with:



void to_bin(unsigned char c, char *out) {
    *(unsigned long long*)out = 3472328296227680304ULL +
        (((c * 9241421688590303745ULL) / 128) & 72340172838076673ULL);
}



And yes, it works:

00: 00000000
01: 00000001
02: 00000010
03: 00000011
04: 00000100
...
5d: 01011101
5e: 01011110
5f: 01011111
60: 01100000
...
fd: 11111101
fe: 11111110
ff: 11111111
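To check that it works for all 256 inputs rather than a sample, a quick harness can compare the function against the classic loop. The helper names to_bin_loop and count_mismatches are made up for this sketch, and like the original it assumes x86 (little endian). The output buffer is an unsigned long long, so the 64-bit store is aligned and doesn't trip C's aliasing rules:

```c
#include <string.h>

/* The function from the post (relies on x86: little-endian byte order). */
void to_bin(unsigned char c, char *out) {
    *(unsigned long long*)out = 3472328296227680304ULL +
        (((c * 9241421688590303745ULL) / 128) & 72340172838076673ULL);
}

/* Hypothetical reference: the classic loop, one bit per character,
   most significant bit first. */
static void to_bin_loop(unsigned char c, char *out) {
    for (int i = 0; i < 8; i++)
        out[i] = ((c >> (7 - i)) & 1) ? '1' : '0';
}

/* Returns how many inputs (0..255) give different output in the two versions. */
int count_mismatches(void) {
    int bad = 0;
    for (int c = 0; c < 256; c++) {
        unsigned long long storage;      /* aligned 8-byte buffer for to_bin */
        char *fast = (char *)&storage;
        char slow[8];
        to_bin((unsigned char)c, fast);
        to_bin_loop((unsigned char)c, slow);
        if (memcmp(fast, slow, 8) != 0)
            bad++;
    }
    return bad;
}
```

On x86, count_mismatches() should return 0.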



How does it work?

I think it's easier to start from the beginning and explain my thought process and how I got to the above function, than to try to backtrack from the above monstrosity.



The starting point for me was the string "00000000". Having that string, you just have to increment by one each byte whose corresponding bit is set. To put it another way: if you add the bit from the Nth position to the Nth byte of the "00000000" string, you get the binary representation of the number as a string (btw, I'm indexing the bytes starting from the last one, so the byte indexes match the bit indexes: 76543210).
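The "add bit N to byte N of "00000000"" idea can first be sketched with an ordinary loop. This is only an illustration (to_bin_add is a hypothetical name, and it still uses a loop); it shows the arithmetic that the loopless version has to reproduce:

```c
#include <string.h>

/* Start from "00000000" and add bit n of c to character n, counting
   characters from the end, so bit n lands in out[7 - n].
   Like the original, no '\0' is written. */
void to_bin_add(unsigned char c, char *out) {
    memcpy(out, "00000000", 8);
    for (int n = 0; n < 8; n++)
        out[7 - n] += (c >> n) & 1;   /* '0' + 1 == '1' */
}
```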



So what I ended up needing here was a way to "unpack" the bits of the input byte into a 64-bit (8-byte) unsigned integer (so I can add that integer to "00000000"):



Input bits:

b₇ b₆ b₅ b₄ b₃ b₂ b₁ b₀



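Output bytes, written as one 64-bit hex number in which every byte has 0 in the high nibble and a single bit in the low nibble (note: x86 is little endian, which is why this looks backwards vs what I said above... so in memory it's actually forward):

0x 0b₀ 0b₁ 0b₂ 0b₃ 0b₄ 0b₅ 0b₆ 0b₇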
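As a reference point, here is a hypothetical loop-based unpack_ref that builds exactly the 64-bit value described above, with b₀ in the most significant byte and b₇ in the least significant one:

```c
#include <stdint.h>

/* Bit n of c becomes the lowest bit of numeric byte 7 - n
   (byte 0 = least significant), which on a little-endian
   machine is the character out[7 - n]. */
uint64_t unpack_ref(unsigned char c) {
    uint64_t r = 0;
    for (int n = 0; n < 8; n++)
        r |= (uint64_t)((c >> n) & 1) << (8 * (7 - n));
    return r;
}
```

For example, 0x5d (01011101) unpacks to 0x0100010101000100.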



Now the question was: how to create such an "unpack" function? I decided to start with something simple: just make copies of the input byte (let's call it c) in all 8 bytes, then do some unknown magic, and in the end apply the mask 0x0101010101010101. Let's focus on the byte duplication for now.



c * 0x0101010101010101ULL

After this operation you basically get: cccccccc (each c being a byte).
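The duplication works because multiplying by 0x0101010101010101 is the same as adding c << 0, c << 8, ..., c << 56; each copy fits in its own byte, so nothing carries. A tiny sketch (dup8 is just an illustrative name):

```c
#include <stdint.h>

/* Every byte of the result equals c: the eight shifted copies of c
   don't overlap, so the multiplication produces no carries. */
uint64_t dup8(unsigned char c) {
    return c * 0x0101010101010101ULL;
}
```

For example, dup8(0x5d) gives 0x5d5d5d5d5d5d5d5d.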



But that is of course not what I needed. The best outcome for me would be to have the Nth bit of the input at bit 0 of the Nth byte of the output, and I-don't-care-what everywhere else, since I plan to apply the 0x0101010101010101 mask at the end anyway.



I started by doing this bit by bit, planning to combine the pieces later:

So how to put b₀ at the highest byte? That's still simple: you just do c * 0x0100000000000000 and then apply the aforementioned binary mask.

What about b₁ at the second highest byte? Well, you can do the same, c * 0x0001000000000000, and then shift the result right by one. Actually, you can just shift the multiply-constant (0x0001000000000000) right by one instead, and multiply by that (0x0000800000000000).

The next bit, b₂, would go to the 3rd highest byte and be shifted right by 2 bits, so the constant would be 0x0000004000000000.

And so on...
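The partial constants follow a simple pattern: bit n needs a copy of c at byte 7 - n (a shift of 56 - 8n) moved right by n bits, i.e. a partial constant of 1 << (56 - 9n). A short sketch that generates them (build_multiplier is a hypothetical helper; OR-ing them together is the same as adding here, since no two partial constants share a bit):

```c
#include <stdint.h>

/* Combines the partial constants for bits 0..6:
   bit positions 56, 47, 38, 29, 20, 11 and 2. */
uint64_t build_multiplier(void) {
    uint64_t k = 0;
    for (int n = 0; n < 7; n++)
        k |= 1ULL << (56 - 9 * n);
    return k;
}
```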



To combine them, just sum up the constants. The final form (including the mask) looks like this:



(c * 0x0100804020100804) & 0x0101010101010101

But wait! There are 8 bits in an 8-bit number, but only 7 bits are set in this constant we use! What happened to the highest bit?



Well, since the set bits are 9 apart and I start at bit 56 going down (56, 47, 38, 29, 20, 11, 2), the 8th one would have to sit at bit -7, which doesn't exist... there is not enough space to handle the last bit, so you can only unpack 7 bits using this method. But no worries, you can always add the last bit after the mask is applied (c >> 7 is 1 exactly when b₇ is set, and it lands in the lowest byte):



((c * 0x0100804020100804) & 0x0101010101010101) + (c >> 7)

And this is the final unpacking function.
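A quick exhaustive check of this 7-bits-plus-one version against a loop-based reference could look like this (unpack_v1, unpack_ref and unpack_v1_mismatches are hypothetical names for this sketch):

```c
#include <stdint.h>

/* Pre-update unpacking: 7 bits via one multiply and mask,
   bit 7 added separately into the lowest byte. */
uint64_t unpack_v1(unsigned char c) {
    return ((c * 0x0100804020100804ULL) & 0x0101010101010101ULL) + (c >> 7);
}

/* Loop-based reference: bit n goes to the lowest bit of byte 7 - n. */
static uint64_t unpack_ref(unsigned char c) {
    uint64_t r = 0;
    for (int n = 0; n < 8; n++)
        r |= (uint64_t)((c >> n) & 1) << (8 * (7 - n));
    return r;
}

/* Counts inputs where the multiply-and-mask version disagrees with the loop. */
int unpack_v1_mismatches(void) {
    int bad = 0;
    for (int c = 0; c < 256; c++)
        if (unpack_v1((unsigned char)c) != unpack_ref((unsigned char)c))
            bad++;
    return bad;
}
```

Dropping the + (c >> 7) term breaks every input with b₇ set; for example, 0x80 would unpack to 0.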



UPDATE

Actually, the above is true only if you start from bit 56 and go down. If you shift the constant by 7 bits to the left (so in the case of b₀ the partial constant is 0x0100000000000000 << 7 → 0x8000000000000000), there is just enough space to fit all 8 bits. You just have to remember to shift the result right by 7 afterwards (or divide it by 128):



(((c * 0x8040201008040201) / 128) & 0x0101010101010101)

Note that the copy meant for the least significant bit (the 0x8000000000000000 part of the constant) is truncated in such a way that only its least significant bit actually fits in the 64-bit number (at bit 63). The rest of it lands on bits 64 to 70, which are sliced off, of course.

END OF UPDATE
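The updated unpacking can be checked the same way. The sketch relies on unsigned multiplication in C wrapping modulo 2^64, so the copy of c at bit 63 keeps only b₀ (unpack_v2, unpack_ref and unpack_v2_mismatches are hypothetical names):

```c
#include <stdint.h>

/* Updated unpacking: the old multiplier shifted left by 7, plus a 1 at
   bit 0 for b7; dividing by 128 (unsigned, so a right shift by 7)
   moves everything back into place before masking. */
uint64_t unpack_v2(unsigned char c) {
    return ((c * 0x8040201008040201ULL) / 128) & 0x0101010101010101ULL;
}

/* Loop-based reference: bit n goes to the lowest bit of byte 7 - n. */
static uint64_t unpack_ref(unsigned char c) {
    uint64_t r = 0;
    for (int n = 0; n < 8; n++)
        r |= (uint64_t)((c >> n) & 1) << (8 * (7 - n));
    return r;
}

/* Counts inputs where the single-multiply version disagrees with the loop. */
int unpack_v2_mismatches(void) {
    int bad = 0;
    for (int c = 0; c < 256; c++)
        if (unpack_v2((unsigned char)c) != unpack_ref((unsigned char)c))
            bad++;
    return bad;
}
```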



Once I had it, I just needed to add the result to "00000000" interpreted as an unsigned 64-bit integer (0x3030303030303030) and store the result in the output char array, and that's it.
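For comparison, the same final step can be written without casting the char pointer: build the value in a uint64_t and memcpy it into the output. This is only a sketch (to_bin_memcpy is a made-up name). It avoids the strict-aliasing and alignment questions of the pointer cast, but it still assumes a little-endian machine like x86. Adding 0 or 1 to a 0x30 byte can never carry into the next byte, so the eight characters stay independent:

```c
#include <stdint.h>
#include <string.h>

/* Same arithmetic as to_bin, stored with memcpy instead of a cast.
   Assumes little-endian byte order; does not write a '\0'. */
void to_bin_memcpy(unsigned char c, char *out) {
    uint64_t v = 0x3030303030303030ULL              /* "00000000" */
               + (((c * 0x8040201008040201ULL) / 128)
                  & 0x0101010101010101ULL);         /* one bit per byte */
    memcpy(out, &v, sizeof v);
}
```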



Of course, the last thing I did was changing the constants to decimal - they look way more scary this way ;>
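If the decimal versions look too scary, a C11 compile-time check confirms that they are the same numbers as the hex constants used in the explanation:

```c
/* Decimal constants from the post vs. their hex forms (C11 _Static_assert). */
_Static_assert(3472328296227680304ULL == 0x3030303030303030ULL, "\"00000000\"");
_Static_assert(9241421688590303745ULL == 0x8040201008040201ULL, "multiplier");
_Static_assert(72340172838076673ULL   == 0x0101010101010101ULL, "mask");
_Static_assert(72198606942111748ULL   == 0x0100804020100804ULL, "pre-update multiplier");
```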



Pre-update version:



void to_bin(unsigned char c, char *out) {
    *(unsigned long long*)out = 3472328296227680304ULL +
        ((c * 72198606942111748ULL) & 72340172838076673ULL) + c / 128;
}
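Both versions should produce identical output. A quick sketch comparing them over all 256 inputs (to_bin_old, to_bin_new and versions_differ are hypothetical names; x86 assumed, as in the post):

```c
/* Pre-update version, as in the post. */
void to_bin_old(unsigned char c, char *out) {
    *(unsigned long long*)out = 3472328296227680304ULL +
        ((c * 72198606942111748ULL) & 72340172838076673ULL) + c / 128;
}

/* Updated version, as in the post. */
void to_bin_new(unsigned char c, char *out) {
    *(unsigned long long*)out = 3472328296227680304ULL +
        (((c * 9241421688590303745ULL) / 128) & 72340172838076673ULL);
}

/* Counts inputs where the two versions store different 8-byte values.
   The buffers are unsigned long long, so the stores are aligned. */
int versions_differ(void) {
    int bad = 0;
    for (int c = 0; c < 256; c++) {
        unsigned long long a, b;
        to_bin_old((unsigned char)c, (char *)&a);
        to_bin_new((unsigned char)c, (char *)&b);
        if (a != b)
            bad++;
    }
    return bad;
}
```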

