Sometimes you need tricky operations on your binary data like counting bits, leading or trailing zeros and so on (for example, when you want to implement HAMT or sparse arrays). In Rust this is achieved by using methods like .count_ones() and .trailing_zeros() . The problem with those methods is that they are usually expanded in a huge pile of assembler code. But x86 (and some other architectures) have instructions to perform these counts (specifically, popcnt and tzcnt ) and they are really fast (1 cycle for execution and latency of 3 cycles). Today we will learn how to force Rust compiler to use those instructions and what are the possible pitfalls.

Originally published on my website.

Let's start with an example. Here's the code that finds population counts of random integers (on Rust Playground):



use rand :: prelude :: * ; // this is only to have this function separately in the asm output #[inline(never)] fn count ( a : u32 ) -> u32 { a .count_ones () } fn main () { let a : u32 = random (); println! ( "{}" , count ( a )); println! ( "{}" , count ( a + 1 )); }

And the count function looks like that:



playground::count: mov eax, edi shr eax and eax, 1431655765 sub edi, eax mov eax, edi and eax, 858993459 shr edi, 2 and edi, 858993459 add edi, eax mov eax, edi shr eax, 4 add eax, edi and eax, 252645135 imul eax, eax, 16843009 shr eax, 24 ret

Wow. A huge pile of instructions and magic numbers. This is clearly not something we would like to see, especially in performance-critical places. Let's make this a little bit better (on Rust Playground):



use rand :: prelude :: * ; // this is only to have this function separately in the asm output #[inline(never)] #[cfg_attr(target_arch = "x86_64" , target_feature(enable = "popcnt" ))] unsafe fn count ( a : u32 ) -> u32 { a .count_ones () } fn main () { let a : u32 = random (); println! ( "{}" , unsafe { count ( a ) }); println! ( "{}" , unsafe { count ( a + 1 ) }); }

And here is the assembly code of count :



playground::count: popcnt eax, edi ret

Only a single very fast instruction! And that this will work with all integer types. There is a pitfall though: Rust requires us to mark functions as unsafe when we use target_feature so it makes sense to make functions using those features as small as possible.

Also, you can do something similar with .trailing_zeros() or .leading_zeros() by using target_feature(enable = "bmi1") .

To find feature names you can refer to architecture-specific intrinsics

documentation.

That's all, hope you find it useful!