An example of a subtle bug that you can run into in C and C++: accidentally overwriting another local variable. Such an issue can manifest only on certain systems, and can be hard to debug if you have not seen it before. An oldie but goodie.



The Misbehaving Code

Consider the following, artificially made-up piece of code:

#include "thirdparty.h" #include <iostream> #include <memory> int main() { auto p = std::make_unique<int>(1); int res; thirdparty_process("whatever", &res); std::cout << "res: " << res << '

'; }

As its name suggests, the thirdparty_process() function comes from a third-party C library. It is supposed to do some work based on the first parameter, and store the result into res . Variable p is an arbitrary variable that, believe it or not, will play a role in our example.

If you compile the example and run it, it may or may not crash, depending on the used operating system, compiler, optimizations etc. For example, here is an output from a run on my 64b Arch Linux with GCC 9.2:

$ g++ -std=c++17 -pedantic -Wall -Wextra -g -o example example.cpp thirdparty.o $ ./example res: 1 Segmentation fault (core dumped)

Huh. In our main() , there is nothing after the print. Why does it segfault? There were not even any warnings. Let’s try to run the code via valgrind:

$ valgrind ./example ==290366== Memcheck, a memory error detector ==290366== res: 1 ==290366== [..] ==290366== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Hmm, the program runs fine. Let’s try to compile it with Clang and run it:

$ clang++ -std=c++17 -pedantic -Wall -Wextra -g -o example example.cpp thirdparty.o $ ./example res: 1

Runs just fine… Strange. Anyway, let’s go back to the GCC build and run it via gdb:

$ gdb -quiet ./example Reading symbols from ./example... (gdb) run res: 1 Program received signal SIGSEGV, Segmentation fault. 0x00007ffff7b08470 in free () from /usr/lib/libc.so.6

Alright, let’s see where we are:

(gdb) backtrace #0 0x00007ffff7b08470 in free () from /usr/lib/libc.so.6 #1 0x0000555555555412 in std::default_delete<int>::operator() (this=0x7fffffffdbf0, __ptr=0x555500000000) at /usr/include/c++/9.2.0/bits/unique_ptr.h:81 #2 0x0000555555555362 in std::unique_ptr<int, std::default_delete<int> >::~unique_ptr (this=0x7fffffffdbf0, __in_chrg=<optimized out>) at /usr/include/c++/9.2.0/bits/unique_ptr.h:284 #3 0x0000555555555231 in main () at example.cpp:7

The code crashes during the destruction of p , our unique pointer. More specifically, it crashes whilst trying to delete the allocated memory. However, we have not touched that unique pointer! Or have we?

Why Is the Program Crashing?

To see what is wrong with the code, we need to take a look at thirdparty_process() :

#include <stdarg.h> void thirdparty_process(const char* what, ...) { va_list argp; va_start(argp, what); // Here, the function should do what it is supposed to do based on the // value of `what`. For the purpose of illustration, assume that the // function only stores value 1 into a variable passed by a pointer. long *res = va_arg(argp, long*); *res = 1; va_end(argp); }

Do not worry if you have not seen a variadic function in C before. Basically, a variadic function allows you to pass a variable number (and types) of arguments into the function. Think of printf() . In C, variadic functions can sometimes be handy. However, they hinder the compiler, causing it not to warn you when there is a type mismatch.

The pros and cons of variadic functions aside, here is the relevant portion of the function:

long *res = va_arg(argp, long*); *res = 1;

We get the passed address and store it into a pointer to long , and then write 1 into it. However, if you remember correctly, we have passed an address of an int variable, not of a long . And this is the essential crux of the bug. The function expected the argument to be a pointer to long , but we have accidentally passed a pointer to int . The compiler was unable to warn us due to the indirect nature of parameter passing in variadic functions.

To elaborate, int and long are two different data types. On some systems (e.g. Windows), their size is identical (4 bytes). However, on other systems, such as Linux, int has 4 bytes while long has 8 bytes. If this is the case, *res = 1 will write 8 bytes to an address of a 4-byte variable. Half of the 8 bytes will thus overwrite memory that does not belong to the 4-byte variable. In our case, this overwrites a part of the unique pointer p . To refresh your memory, a unique pointer is essentially a class containing a pointer to the managed memory (8 bytes on a 64b architecture). If we overwrite a part of that pointer, it will point to a bogus location. An attempt of deleting the memory in std::unique_ptr ‘s destructor will cause the program to crash.

Here is an illustration:

Stack +--------+ | | +----> +--------+ | | 4B | p | +--------+ <----+ | | 4B | | +----> +--------+ | *(long*)(&res) res | | 4B | | +----> +--------+ <----+ | | +--------+

How To Fix the Bug?

By using correct types. If a function expects a pointer to long , we have to give it a pointer to long :

-int res; +long res; thirdparty_process("whatever", &res);

Concluding Remarks

The present blog post illustrates one of the reasons why C-like variadic functions are unfortunate. If we had a regular function taking a pointer to long and we passed it a pointer to int , the compiler would refuse to compile the program. However, with variadic functions, all bets are off. Luckily, in C++, we have variadic templates that are type-safe.

and we passed it a pointer to , the compiler would refuse to compile the program. However, with variadic functions, all bets are off. Luckily, in C++, we have variadic templates that are type-safe. If you encounter strange crashes without an apparent cause (e.g. as in our case), try to closely look around for type mismatches.

Such a bug may also manifest when accessing an out-of-bounds index of an array (buffer overflow).

One indicator of memory overwriting can sometimes be too-well-rounded memory addresses. Here is the gdb output that we saw earlier: #1 0x0000555555555412 in std::default_delete<int>::operator() (this=0x7fffffffdbf0, __ptr=0x555500000000) at /usr/include/c++/9.2.0/bits/unique_ptr.h:81 The value of __ptr is the address of the dynamically allocated memory. In our case, the address is 0x555500000000 . The trailing zeros were put there in thirdparty_process() when overwriting the 4-byte int variable with 8 bytes. If it were a correct run, the address might have been e.g. 0x555500ea7eb4 . Note that gdb has stripped leading zeros from the address, so the full 8-byte address is 0x0000555500000000 (half of the bytes were overwritten by zeros).

output that we saw earlier: Such bugs are very hard to discover during code review. When reviewing code that calls variadic functions, pay special attention to the used types to check if they correspond to the API documentation of the called function.

This bug can be considered to be an example of a heisenbug. In programming jargon, a heisenbug is a bug that seems to disappear or alter its behavior when one attempts to study it. For example, if we modify our code in an attempt to debug the issue (e.g. by adding another local variable), the program may overwrite different memory and may not crash.

Complete Source Code

The complete source code is available on GitHub. However, remember that the behavior depends on the used operating system, compiler, optimizations etc.

Discussion

Apart from comments below, this blog post has made its way to Hacker News and /r/cpp.