For code related to this question, I need to compute the following as fast as possible:

Given a 32 bit integer i, compute the position of the n-th least significant set bit. Both n and the result should be 0-indexed.

For example, given the number i = 11010110101 2 and n = 4, the desired number is 7 as the fourth set bit is at position 7: 11010110101.

Using the pdep instruction from the BMI2 instruction set extension for x86 and the commonly available __builtin_ctz() intrinsic function, this can be computed easily:

j = _pdep_u32(1 << n, i); return (__builtin_ctz(j));

However, many computers do not have the pdep instruction (and it's slow on AMD CPUs that do have it), rendering this approach non-portable. You can also compute such bit positions without pdep like this:

j = i; for (k = 0; k < n; k++) j &= j - 1; return (__builtin_ctz(j));

However, this is pretty slow.