Eight C++ programming mistakes the compiler won’t catch



C++ is a complex language, full of subtle traps for the unwary. There is an almost infinite number of ways to screw things up. Fortunately, modern compilers are pretty good at detecting a large number of these cases and notifying the programmer via compile errors or warnings. Ultimately, any error that is compiler-detectable becomes a non-issue if properly handled, as it will be caught and fixed before the program leaves development. At worst, a compiler-detectable error results in lost time while the programmer searches for a solution or workaround.

The dangerous errors are the ones that compilers are unable to detect. These errors are much less likely to be noticed, and can have severe consequences such as incorrect output, corrupted data, and program crashes. As the size of a programming project increases, the complexity of the logic and the large number of execution paths can help obscure these bugs, causing them to appear only intermittently and making them particularly hard to track down and debug. Although this list may be mostly review to experienced programmers, the consequences of making one of these errors are also amplified by the scale and commercial nature of the projects experienced programmers tend to work on.

These examples were all tested using Visual Studio 2005 Express at the default warning level. Your results may vary on other compilers. I highly recommend all programmers use the highest warning level possible! Some items that are not flagged as potential problems at the default warning level may be caught at the highest warning level!

(Note: This article is part 1 in an intended series of articles)





1) Uninitialized variables

Uninitialized variables are one of the most devious mistakes commonly made in C++. Memory allocated to a variable in C++ is generally not cleared or zeroed upon allocation (variables with static storage duration are the main exception). Consequently, an uninitialized variable will have some value, but there is no way to predict what that value will actually be. Furthermore, the value of the variable may change each time the program is executed. This can result in intermittent problems, which are particularly hard to track down. Consider the following snippet:

    if (bValue)
        // do A
    else
        // do B

If bValue is uninitialized, it could evaluate to either true or false, and either branch could be taken.

In some basic cases, the compiler will be able to inform you of an uninitialized variable. The following causes a compiler warning on most compilers:

    int foo()
    {
        int nX;
        return nX;
    }

However, other simple cases generally do not produce warnings:

    void increment(int &nValue)
    {
        ++nValue;
    }

    int foo()
    {
        int nX;
        increment(nX);
        return nX;
    }

The above may not produce a warning because the compiler typically doesn’t keep track of whether increment() assigns a value to nValue.

Uninitialized variables are even more likely to appear in classes, where member declarations are generally separated from the constructor implementation:

    class Foo
    {
    private:
        int m_nValue;

    public:
        Foo();
        int GetValue() { return m_nValue; }
    };

    Foo::Foo()
    {
        // Oops, we forget to initialize m_nValue
    }

    int main()
    {
        Foo cFoo;
        if (cFoo.GetValue() > 0)
            // do something
        else
            // do something else
    }

Note that m_nValue is never initialized. Consequently, GetValue() returns a junk value, and either branch may be executed.

New programmers often make the following mistake when declaring multiple variables:

int nValue1, nValue2 = 5;

The assumption being made here is that 5 is assigned to both nValue1 and nValue2, when in fact the value of 5 is only assigned to nValue2, and nValue1 is uninitialized.

Because uninitialized variables can evaluate to any value, which can cause the program to exhibit different behavior each time it is run, problems caused by uninitialized variables are particularly hard to find. On one run, the program may work fine. The next time, it may crash. The time after that, it may produce the wrong output.

To compound the problem of finding uninitialized variables, variables declared while running the program in a debugger or debug build are often set to a consistent value (typically zero, or a fixed fill pattern). This means your program may work fine every time when run in a debugger, but crash intermittently in release mode! If this is the case, an uninitialized variable is often the root of your problem.
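The fix for both the class and the multiple-declaration cases above is to give every variable a value at the point where it is created. The following is only a minimal sketch of that habit: InitializedFoo and SumOfInitializedPair are hypothetical names invented for illustration, not part of the original examples.

```cpp
#include <cassert>

// Hypothetical counterpart to Foo: every constructor gives m_nValue a value
// through the member initializer list, before the constructor body runs.
class InitializedFoo
{
private:
    int m_nValue;

public:
    InitializedFoo() : m_nValue(0) {}
    explicit InitializedFoo(int nValue) : m_nValue(nValue) {}

    int GetValue() const { return m_nValue; }
};

// Hypothetical helper: each variable in the declaration has its own
// initializer, so neither one is left holding a junk value.
int SumOfInitializedPair()
{
    int nValue1 = 5, nValue2 = 5;
    assert(nValue1 == nValue2); // holds only because both were initialized
    return nValue1 + nValue2;
}
```

With this version, a default-constructed InitializedFoo always reports 0 from GetValue(), so any branch that depends on it behaves the same way on every run.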

2) Integer division

Most binary operators in C++ require both operands to be the same type. If the operands are of different types, one of the operands is promoted to match the type of the other.

In C++, the division operator can be thought of as two different operators: one that works on integer operands, and one that works on floating point operands. If the operands are of a floating point type, the division operator will return a floating point value:

    float fX = 7;
    float fY = 2;
    float fValue = fX / fY; // fValue = 3.5

If the operands are of an integer type, the division operator will drop any fraction and return an integer value:

    int nX = 7;
    int nY = 2;
    int nValue = nX / nY; // nValue = 3

If one operand is an integer, and the other is a floating point value, the integer value will be promoted to a floating point type:

    float fX = 7.0;
    int nY = 2;
    float fValue = fX / nY; // nY is promoted to float, floating point division used
    // fValue = 3.5

Many new programmers attempt to do the following:

    int nX = 7;
    int nY = 2;
    float fValue = nX / nY; // fValue = 3 (not 3.5!)

The underlying assumption here is that nX / nY will result in a floating point division because the result is being assigned to a floating point value. However, this is not the case. nX / nY is evaluated first, resulting in an integer value, which is then promoted to a float and assigned to fValue. However, by that point, the fraction has already been lost.

In order to force two integers to use floating point division, one of the values should be cast to a floating point value:

    int nX = 7;
    int nY = 2;
    float fValue = static_cast<float>(nX) / nY; // fValue = 3.5

Because nX is being explicitly cast to a float, nY will be implicitly promoted to a float, which will cause the division operator to perform floating point division, resulting in a value of 3.5.
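One way to keep the cast from being forgotten is to wrap it in a small helper. The sketch below assumes two hypothetical functions, DivideAsDouble and DivideTruncated, written only to contrast the two behaviors; it also assumes the caller never passes a zero denominator.

```cpp
#include <cassert>

// Hypothetical helper: casts the numerator first, so the denominator is
// promoted as well and floating point division is performed.
double DivideAsDouble(int nNumerator, int nDenominator)
{
    assert(nDenominator != 0); // assumption: the caller never passes zero
    return static_cast<double>(nNumerator) / nDenominator;
}

// For contrast: integer division happens first, and only the already
// truncated result is converted to double.
double DivideTruncated(int nNumerator, int nDenominator)
{
    assert(nDenominator != 0); // assumption: the caller never passes zero
    return nNumerator / nDenominator;
}
```

Calling DivideAsDouble(7, 2) yields 3.5, while DivideTruncated(7, 2) yields 3.0, even though both functions return a double.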

It is often hard to tell at a glance whether a given division operation is performing integer or floating point division:

z = x / y; // is this integer or floating point division?

However, using Hungarian Notation can help disambiguate the case and help prevent mistakes:

    int nZ = nX / nY; // integer division
    double dZ = dX / dY; // floating point division

One other interesting issue with integer division involves negative operands. Before C++11, the language did not specify which way the result was rounded when one operand was negative, so -5 / 2 could evaluate to either -3 or -2, depending on whether the compiler rounded down or rounded toward 0. Since C++11, the result is required to truncate toward 0, so -5 / 2 evaluates to -2. Most older compilers already rounded toward 0.
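When rounding down is what you actually want, it is safer to ask for it explicitly than to rely on the division operator. The sketch below assumes a compiler that follows C++11 or later, where integer division truncates toward 0; TruncatedQuotient and FloorDivide are hypothetical helper names.

```cpp
#include <cassert>

// Plain integer division: truncates toward 0 on C++11 and later compilers.
int TruncatedQuotient(int nX, int nY)
{
    assert(nY != 0); // assumption: the caller never passes zero
    return nX / nY;
}

// Hypothetical helper: integer division that always rounds toward
// negative infinity, regardless of the signs of the operands.
int FloorDivide(int nX, int nY)
{
    assert(nY != 0); // assumption: the caller never passes zero
    int nQuotient = nX / nY;
    int nRemainder = nX % nY;

    // A nonzero remainder whose sign differs from the divisor's sign means
    // the truncated quotient was rounded up, so step it down by one.
    if (nRemainder != 0 && ((nRemainder < 0) != (nY < 0)))
        --nQuotient;

    return nQuotient;
}
```

Here TruncatedQuotient(-5, 2) gives -2, while FloorDivide(-5, 2) gives -3. For operands of the same sign the two functions agree.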

3) = vs ==

This one is an oldie but a goodie. Many beginning C++ programmers confuse the meaning of the assignment operator (=) with the equality operator (==). But even programmers who know the difference can make a typo that will have unintended results:

    #include <iostream>

    // if nValue is 0, return 1, otherwise return nValue
    int foo(int nValue)
    {
        if (nValue = 0) // TYPO!
            return 1;
        else
            return nValue;
    }

    int main()
    {
        std::cout << foo(0) << std::endl;
        std::cout << foo(1) << std::endl;
        std::cout << foo(2) << std::endl;
        return 0;
    }

Function foo() is intended to return 1 if nValue is 0, otherwise return nValue. But due to inadvertently using the assignment operator instead of the equality operator, the program produces an unexpected result:

    0
    0
    0

When the if statement in foo() is evaluated, nValue is assigned the value of 0. Writing if (nValue = 0) behaves the same as writing nValue = 0; followed by if (nValue). Consequently, the if condition is false, which causes the else statement to return nValue, which was just assigned the value of 0!

Consequently, this function always returns 0.

Running a modern compiler at the highest warning level will cause it to issue a warning when an assignment is used in a conditional statement, or a note that the statement does nothing when an equality test is used instead of an assignment outside of a conditional. This is one issue that is essentially fixable -- if you use the higher warning levels.
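Besides raising the warning level, one defensive habit is to declare parameters that should never change as const, which turns the typo into a compile error rather than a silent bug. The following is a minimal sketch of that idea; ZeroToOne is a hypothetical rewrite of foo() and not part of the original program.

```cpp
#include <cassert>

// Hypothetical rewrite of foo(): because nValue is const, accidentally
// writing "nValue = 0" in the condition would fail to compile.
int ZeroToOne(const int nValue)
{
    int nResult = nValue;
    if (nValue == 0)
        nResult = 1;

    assert(nResult != 0); // postcondition: this function never returns 0
    return nResult;
}
```

With the equality operator in place, ZeroToOne(0) returns 1, and any other argument is returned unchanged.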

4) Mixing signed and unsigned values

As mentioned in the section on integer division, most binary operators in C++ require both operands to be the same type. If the operands are of different types, one of the operands is promoted to match the type of the other.

This can lead to some very unexpected results when mixing signed and unsigned values! Consider the following case:

cout << 10 - 15u; // 15u is unsigned

One would expect the answer to be -5. However, because 10 is a signed integer and 15 is an unsigned integer, the type promotion rules come into effect here. The hierarchy used for type promotion in C++ looks like this:

long double (highest)

double

float

unsigned long int

long int

unsigned int

int (lowest)

Because the int operand is considered lower than the unsigned int operand, the int is promoted to an unsigned int. Fortunately, 10 is already a positive number, so the promotion does not cause our number to be interpreted any differently.

Thus, we effectively have:

cout << 10u - 15u;

Here’s where the tricky part happens. Because both operands are unsigned integers, the result of the operation is also an unsigned integer! Mathematically, 10 - 15 = -5, but an unsigned integer cannot hold a negative number, so the result wraps around and -5 is interpreted as 4,294,967,291 (assuming 32-bit integers).

Consequently, the following program:

cout << 10 - 15u; // 15u is unsigned

prints 4294967291 (that is, 4,294,967,291), not -5.

This situation can come up in more obscure forms:

    int nX;
    unsigned int nY;
    if (nX - nY < 0)
        // do something

Due to the type conversion, this if statement will always evaluate to false, which is clearly not what the programmer intends!
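A comparison like this can usually be rewritten so that no subtraction between mixed types takes place. The sketch below shows one possible approach, not the only one; WrappedDifference and IsLess are hypothetical helper names.

```cpp
#include <cassert>
#include <limits>

// Shows what the compiler effectively computes for nX - nY: the int is
// converted to unsigned int, and the subtraction wraps around.
unsigned int WrappedDifference(int nX, unsigned int nY)
{
    return static_cast<unsigned int>(nX) - nY;
}

// Hypothetical helper: answers "is nX less than nY?" without ever
// producing a wrapped-around difference.
bool IsLess(int nX, unsigned int nY)
{
    if (nX < 0)
        return true; // a negative value is less than any unsigned value

    assert(nX >= 0); // the cast below is value-preserving from here on
    return static_cast<unsigned int>(nX) < nY;
}
```

WrappedDifference(10, 15u) produces a value just below the maximum unsigned int, whereas IsLess(10, 15u) correctly returns true.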

5) delete vs. delete[]

Many C++ programmers forget that there are actually two forms of both the new and delete operators: a scalar version, and an array version.

Operator new is used to allocate scalar (non-array) data on the heap. If the object being allocated is a class type, the object’s constructor is called.

Foo *pScalar = new Foo;

The delete operator is used to destroy a scalar object that has been allocated using the new operator. If the object being destroyed is a class type, the object’s destructor is called.

delete pScalar;

Now consider the following snippet:

Foo *pArray = new Foo[10];

This snippet allocates an array of 10 Foo. Because the subscript [10] is placed after the type name rather than next to the new keyword, many C++ programmers do not realize that operator new[] is being called to do the array allocation instead of operator new. Operator new[] ensures that the constructor is called for each object being constructed.

Conversely, to delete an array, the delete[] operator should be used:

delete[] pArray;

This ensures that the destructor for each object in the array is called.

If the delete operator is used on an array, the behavior is undefined. In practice, often only the first object is destructed, and heap corruption can result!
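The effect of a matched new[] and delete[] can be observed by counting live objects. The following is a rough sketch under stated assumptions: Tracked is a hypothetical class that counts its own instances, and the std::vector variant is included as one common way to avoid writing delete[] by hand at all.

```cpp
#include <cassert>
#include <vector>

// Hypothetical class that keeps a running count of live instances.
struct Tracked
{
    static int s_nLive;

    Tracked() { ++s_nLive; }
    Tracked(const Tracked &) { ++s_nLive; }
    ~Tracked() { --s_nLive; }
};

int Tracked::s_nLive = 0;

// Allocates with new[] and releases with the matching delete[];
// returns how many objects are still alive afterwards.
int LiveAfterArrayRoundTrip()
{
    Tracked *pArray = new Tracked[10];
    assert(Tracked::s_nLive == 10); // one constructor call per element
    delete[] pArray;                // one destructor call per element
    return Tracked::s_nLive;
}

// Same idea with std::vector, which destroys its elements automatically
// when it goes out of scope.
int LiveAfterVectorRoundTrip()
{
    {
        std::vector<Tracked> vecTracked(10);
        assert(Tracked::s_nLive == 10);
    }
    return Tracked::s_nLive;
}
```

Both functions return 0, meaning every element's destructor ran.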

6) Side effects in compound expressions or function calls

A side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.

Side effects can often be useful:

x = 5;

The assignment operator has the side effect of changing the value of x permanently. Other C++ operators with useful side effects include *=, /=, %=, +=, -=, <<=, >>=, &=, |=, ^=, and the infamous ++ and -- operators.

However, there are several places in C++ where the order of evaluation is not specified by the language, and these can lead to inconsistent behavior. For example:

    #include <iostream>

    int multiply(int x, int y)
    {
        return x * y;
    }

    int main()
    {
        int x = 5;
        std::cout << multiply(x, ++x);
    }

Because the order in which the arguments to multiply() are evaluated is not specified, this could print 30 or 36, depending on whether x or ++x is evaluated first. (Before C++17, the behavior is formally undefined, so no particular result is guaranteed.)

A slightly stranger example involving operators:

    #include <iostream>

    int foo(int x)
    {
        return x;
    }

    int main()
    {
        int x = 5;
        std::cout << foo(x) * foo(++x);
    }

Because the order of evaluation of the operands of most C++ operators is not specified (there are a few exceptions), this could also print 30 or 36, depending on whether the left or right operand is evaluated first. Formally, reading and modifying x in the same unsequenced expression is undefined behavior, so no particular result is guaranteed.

Also consider the following compound expression:

    if (x == 1 && ++y == 2)
        // do something

The intent of the programmer is probably to say “if x is 1 and the pre-incremented value of y is 2, then do something”. However, if x does not equal 1, C++ uses short-circuit evaluation, which means that ++y never gets evaluated! Thus, y will only be incremented if x evaluates to 1, which is probably not what the programmer intended!

A good rule of thumb is to put any operator that causes a side effect in its own statement.
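Applied to the examples in this section, that rule of thumb might look like the sketch below. The function names are hypothetical, and the choice to multiply the original value by the incremented one is just one of the two possible intentions behind the earlier multiply(x, ++x) call.

```cpp
#include <cassert>
#include <limits>

int Multiply(int x, int y)
{
    return x * y;
}

// Hypothetical rewrite of multiply(x, ++x): the increment is its own
// statement, so the values passed to Multiply() are no longer ambiguous.
int MultiplyByIncremented(int x)
{
    assert(x < std::numeric_limits<int>::max()); // ++x must not overflow
    const int nOriginal = x;
    ++x;
    return Multiply(nOriginal, x);
}

// Hypothetical rewrite of the short-circuit example: y is always
// incremented, whether or not x equals 1.
bool IncrementThenTest(int x, int &y)
{
    ++y;
    return x == 1 && y == 2;
}
```

MultiplyByIncremented(5) always returns 30, and IncrementThenTest() increments y exactly once on every call.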

7) Switch statements without break

Another classic mistake that new programmers make is forgetting to use break to end a switch block:

    switch (nValue)
    {
        case 1:
            eColor = Color::BLUE;
        case 2:
            eColor = Color::PURPLE;
        case 3:
            eColor = Color::GREEN;
        default:
            eColor = Color::RED;
    }

When the switch expression evaluates to the same value as the case label expression, execution starts at the matching case statement. Execution then continues until either the end of the switch block is reached, or a return, goto, or break statement is executed. Any other labels are ignored!

Consider what happens if nValue is 1 in the above program. Case 1 matches, so eColor is set to Color::BLUE. Evaluation proceeds to the next statement, which sets eColor to Color::PURPLE. The next statement sets it to Color::GREEN. And finally, it gets set to Color::RED.

In fact, this snippet ends up setting eColor to Color::RED no matter what the value of nValue is!

The correct way to write the above program is:

    switch (nValue)
    {
        case 1:
            eColor = Color::BLUE;
            break;
        case 2:
            eColor = Color::PURPLE;
            break;
        case 3:
            eColor = Color::GREEN;
            break;
        default:
            eColor = Color::RED;
            break;
    }

The break terminates the case statement, thus causing eColor to retain the value that the programmer intended.

Although this is very basic switch/case logic, it is very easy to miss a break statement and end up with inadvertent fall-through.
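One way to make fall-through harder to introduce by accident is to move the switch into a function and return directly from each case. The sketch below assumes a C++11 compiler (for enum class) and a Color enumeration with the four values used above; ColorFromValue is a hypothetical function name.

```cpp
#include <cassert>

// Assumed definition of the Color type used in the snippets above.
enum class Color { BLUE, PURPLE, GREEN, RED };

// Hypothetical helper: every case returns immediately, so execution
// cannot continue into the next label.
Color ColorFromValue(int nValue)
{
    switch (nValue)
    {
        case 1:
            return Color::BLUE;
        case 2:
            return Color::PURPLE;
        case 3:
            return Color::GREEN;
        default:
            // Only values outside 1..3 can reach this label.
            assert(nValue < 1 || nValue > 3);
            return Color::RED;
    }
}
```

Newer versions of GCC and Clang can also warn about unintended fall-through through the -Wimplicit-fallthrough option, and C++17 adds a [[fallthrough]] attribute for marking the cases where it is intentional.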

8) Calling virtual functions in constructors

Consider the following program:

    #include <iostream>

    class Base
    {
    private:
        int m_nID;

    public:
        Base() { m_nID = ClassID(); }

        // ClassID returns a class-specific ID number
        virtual int ClassID() { return 1; }

        int GetID() { return m_nID; }
    };

    class Derived: public Base
    {
    public:
        Derived() { }
        virtual int ClassID() { return 2; }
    };

    int main()
    {
        Derived cDerived;
        std::cout << cDerived.GetID(); // prints 1, not 2!
        return 0;
    }

In this program, the programmer has called a virtual function inside the constructor of a base class, expecting it to resolve to Derived::ClassID(). It doesn’t -- and consequently, the program prints 1 instead of 2.

When a class that has been derived from a base class is instantiated, the base class object is constructed before the derived class object. This is done because the derived class members may be dependent upon members of the base class already being initialized. Consequently, when the base object constructor is being executed, there is no derived object! It hasn’t been created yet. Thus, any call to a virtual function can only resolve to the level of the base class, not the derived class.

As pertains to this example, when the Base portion of cDerived is being constructed, the Derived portion does not exist yet. Thus, the function call to ClassID() resolves to Base::ClassID() (not Derived::ClassID()), which sets m_nID to 1.

Once the Derived portion of cDerived has been constructed, any calls to ClassID() on this object will resolve to Derived::ClassID() as anticipated.

Note that some other programming languages (such as C# and Java) will resolve virtual function calls to the most derived class even if the derived class has not been initialized yet! C++ differs in this regard, and does so for the programmer’s safety. That is not to say one way is necessarily better than the other, but merely to denote that different languages may have different behaviors.
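A common way around this limitation is to have the derived class hand the value to the base class constructor instead of relying on a virtual call. The following is a sketch of that idea only, reusing the Base and Derived names from the example above; the protected constructor and the assumption that IDs are positive are inventions for illustration.

```cpp
#include <cassert>

class Base
{
private:
    int m_nID;

protected:
    // Hypothetical constructor: derived classes pass their ID up explicitly,
    // so no virtual call is needed during construction.
    explicit Base(int nID) : m_nID(nID)
    {
        assert(nID > 0); // assumption: class IDs are positive
    }

public:
    Base() : m_nID(1) {}
    virtual ~Base() {}

    int GetID() const { return m_nID; }
};

class Derived : public Base
{
public:
    Derived() : Base(2) {}
};
```

With this arrangement, a Derived object reports an ID of 2 and a plain Base object reports 1, because the value is fixed before either constructor body runs.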

Conclusion

As this is the first article in this series, I thought it appropriate to start with some of the more basic issues that new programmers will encounter. Future articles in this series will tackle programming mistakes of an increasingly complex nature.

Regardless of a programmer’s experience level, mistakes happen, whether through lack of knowledge, a typo, or general carelessness. Being aware of which issues are most likely to cause trouble can help reduce the probability that they will cause trouble. While there is no substitute for experience and knowledge, good unit testing can help catch many of these before they get buried under layers of other code!
