June 11, 2010 at 06:17 Tags C & C++

C and C++ frequently coax you into using an unsigned type for iteration. Standard functions like strlen and the size method of containers (in C++) return size_t, which is an unsigned type, so to avoid conversion warnings you comply and iterate with a variable of the appropriate type. For example:

```c
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i < len; ++i) {
    /* Do stuff with each char of some_c_str */
}
```

I've long been aware of one painful gotcha of using size_t for iteration: using it to iterate backwards. The following code will fail:

```c
/* Warning: buggy code! */
size_t len = strlen(some_c_str);
size_t i;
for (i = len - 1; i >= 0; --i) {
    /* Do stuff with each char of some_c_str, backwards */
}
```

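The condition i >= 0 is always true for an unsigned type, so it can never end the loop. When i reaches 0 it's still within bounds, so it is decremented once more and wraps around to a huge positive number: SIZE_MAX, which is 2^N - 1 for an N-bit size_t, because unsigned arithmetic is performed modulo 2^N. Congratulations, we have an infinite loop (in practice, one that quickly starts reading far outside the string).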
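One common way around the backwards case, offered here as a sketch rather than the only fix, is to test i before decrementing it, so the body never sees the wrapped value. The function name reverse_copy and its contract (dst must have room for strlen(src) + 1 bytes) are hypothetical, chosen only to give the loop something to do:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dst in reverse order.
   dst must have room for strlen(src) + 1 bytes. */
void reverse_copy(char *dst, const char *src)
{
    size_t len = strlen(src);
    size_t out = 0;
    size_t i;

    /* "Test, then decrement": i-- > 0 compares the old value of i with 0
       and only then decrements it, so the body sees len-1, len-2, ..., 0.
       On the final, failing test i does wrap to SIZE_MAX, but that value
       is never used inside the body. For len == 0 the body never runs. */
    for (i = len; i-- > 0; ) {
        dst[out++] = src[i];
    }
    dst[out] = '\0';
}
```

The same idiom works unchanged for empty strings, which is exactly the input that breaks the naive version.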

Today I ran into another manifestation of this problem. This one is more insidious, because it happens only for some kinds of input. I wrote the following code because the operation had to consider each character in the string and the character after it:

```c
/* Warning: buggy code! */
size_t len = strlen(some_c_str);
size_t i;
for (i = 0; i < len - 1; ++i) {
    /* Do stuff with some_c_str[i] and some_c_str[i+1]. */
}
```

Can you spot the bug?

When some_c_str is empty, len is 0, so len - 1 wraps around to that same huge positive number, SIZE_MAX. What chance does poor i have against such a giant? It will just keep chugging along, well beyond the length of my string.
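One way to defuse this particular loop is to move the arithmetic to the side that cannot wrap: compare i + 1 < len instead of i < len - 1. In the sketch below, the hypothetical function count_adjacent_equal (counting positions where a character equals the next one) stands in for "do stuff with some_c_str[i] and some_c_str[i+1]":

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count positions i where s[i] == s[i + 1]. */
size_t count_adjacent_equal(const char *s)
{
    size_t len = strlen(s);
    size_t count = 0;
    size_t i;

    /* i + 1 < len instead of i < len - 1: for len == 0 the condition is
       simply 1 < 0, which is false, so nothing wraps. i + 1 itself cannot
       overflow, since the body only runs while i + 1 < len. */
    for (i = 0; i + 1 < len; ++i) {
        if (s[i] == s[i + 1]) {
            ++count;
        }
    }
    return count;
}
```

An explicit guard such as `if (len < 2) return 0;` before the original loop would also work; rewriting the condition just keeps the corner case from existing in the first place.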

As I see it, to avoid the problem we can either:

- Use an int variable and cast the return value of strlen to int. This feels a bit dirty, especially in C++, where you'd have to use static_cast<int>.
- Keep using unsigned types for iteration, but be extra careful and use various hacks to avoid the problematic corner cases.
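For comparison, here is a minimal sketch of the first option, with both problem loops rewritten around a signed int index. The function names are hypothetical, and the cast assumes strings shorter than INT_MAX characters; in C++ the (int) cast would be spelled static_cast<int>:

```c
#include <string.h>

/* Hypothetical helper: reverse src into dst using a signed index.
   Assumes strlen(src) <= INT_MAX; dst needs strlen(src) + 1 bytes. */
void reverse_copy_signed(char *dst, const char *src)
{
    int len = (int)strlen(src);
    int out = 0;
    int i;

    /* With a signed i, i >= 0 really does become false after i == 0. */
    for (i = len - 1; i >= 0; --i) {
        dst[out++] = src[i];
    }
    dst[out] = '\0';
}

/* Hypothetical helper: count positions i where s[i] == s[i + 1].
   Assumes strlen(s) <= INT_MAX. */
int count_adjacent_equal_signed(const char *s)
{
    int len = (int)strlen(s);
    int count = 0;
    int i;

    /* For an empty string len - 1 is -1, so 0 < -1 is false and the
       loop body never runs. */
    for (i = 0; i < len - 1; ++i) {
        if (s[i] == s[i + 1]) {
            ++count;
        }
    }
    return count;
}
```

The loops keep their natural shape; the price is the cast and the silent assumption about string length.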

Neither option is ideal, so if you have a better idea, let me know.

Thanks, everyone, for the excellent comments! Clearly, creative ways exist to overcome this problem with unsigned types. Still, it remains a gotcha that even seasoned programmers stumble upon from time to time. It's not surprising that many C/C++ style guides recommend reserving unsigned types for bitfields and using plain ints for everything else.