Undefined behavior

A bit of theory at first.

Undefined behavior is a property of certain programming languages (most prominent in C and C++) to produce a result in certain situations that depends on compiler implementation or specified optimization switches. In other words, the specification does not define the language's behavior in any possible situations but says: "at A condition, the result of B operation is undefined". It is considered a mistake to allow such a situation in your program even if it is executed well at some particular compiler. Such a program will not be crossplatform and may cause failures on a different computer, operating system and even at different compiler's settings.

A sequence point in programming is any point in a program where it is guaranteed that the side effects of all the previous calculations have already emerged while there are no side effects of the following calculations yet. To learn more about sequence points and cases of undefined behavior related to sequence points, see this post: http://www.viva64.com/en/t/0065/.

Example 1. Chromium project. Incorrect use of smart pointer.

void AccessibleContainsAccessible(...) { ... auto_ptr<VARIANT> child_array(new VARIANT[child_count]); ... }

The error was found through the V554 diagnostic: Incorrect use of auto_ptr. The memory allocated with 'new []' will be cleaned using 'delete'. interactive_ui_tests accessibility_win_browsertest.cc 171

This example demonstrates the case when using a smart pointer can cause undefined behavior. It may be expressed through heap damage, program crash, incomplete object destruction or any other failure. The error is this: memory is allocated by the new [] operator and released by the delete operator in the 'auto_ptr' class' destructor:

~auto_ptr() { delete _Myptr; }

To fix these issues, you should use a more appropriate class, for instance, boost::scoped_array.

Example 2. IPP Samples project. Classic Undefined behavior.

template<typename T, Ipp32s size> void HadamardFwdFast(...) { Ipp32s *pTemp; ... for(j=0;j<4;j++) { a[0] = pTemp[0*4] + pTemp[1*4]; a[1] = pTemp[0*4] - pTemp[1*4]; a[2] = pTemp[2*4] + pTemp[3*4]; a[3] = pTemp[2*4] - pTemp[3*4]; pTemp = pTemp++; ... } ... }

The error was found through the V567 diagnostic: Undefined behavior. The 'pTemp' variable is modified while being used twice between sequence points. me umc_me_cost_func.h 168

This is a classic example of undefined program behavior. It is this construct which is used to demonstrate Undefined behavior in various articles. It is unknown whether 'pTemp' will be incremented by one or not. Two actions of changing pTemp variable's value are located in one sequence point. It means that the compiler may create the following code:

pTemp = pTemp + 1;

pTemp = pTemp;

Or it may create another version of the code:

TMP = pTemp;

pTemp = pTemp + 1;

pTemp = TMP;

Which of the two code versions will be created depends on the compiler and optimization switches.

Example 3. Fennec Media Project project. Complex expression.

uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) { ... while (!(m_pBitArray[m_nCurrentBitIndex >> 5] & Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {} ... }

The error was found through the V567 diagnostic: Undefined behavior. The 'm_nCurrentBitIndex' variable is modified while being used twice at single sequence point. MACLib unbitarrayold.cpp 78

There are no sequence points between two instances of using the 'm_nCurrentBitIndex' variable. It means that the standard does not specify the moment when this variable is incremented. Correspondingly, this code may work differently depending on the compiler and optimization switches.

Example 4. Miranda IM project. Complex expression.

short ezxml_internal_dtd(ezxml_root_t root, char *s, size_t len) { ... while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') { ... }

The error was found through the V567 diagnostic: Undefined behavior. The 's' variable is modified while being used twice between sequence points.msne zxml.c 371

Prefix increment of the variable is used here. But it does not mean anything: it cannot be guaranteed that the 's' variable will be incremented before calling the strspn() function.

Errors relating to operation priorities.

To make understanding of examples easier, let's recall the operation priorities table.

[image7.png]

Figure 2 - Operation priorities in C/C++

Example 1. MySQL project. Priorities of ! and & operations.

int ha_innobase::create(...) { ... if (srv_file_per_table && !mysqld_embedded && (!create_info->options & HA_LEX_CREATE_TMP_TABLE)) { ... }

The error was found through the V564 diagnostic: The '&' operator is applied to bool type value. You've probably forgotten to include parentheses or intended to use the '&&' operator. innobase ha_innodb.cc 6789

The programmer wanted a part of the expression to check that a certain bit in the 'create_info->options' variable is equal to zero. But the priority of the '!' operation is higher than that of the '&' operation, that's why the expression works by this algorithm:

((!create_info->options) & HA_LEX_CREATE_TMP_TABLE) We should use additional parentheses if we want the code to work properly: (!(create_info->options & HA_LEX_CREATE_TMP_TABLE))

Or, what we find nicer, write the code in the following way:

((create_info->options & HA_LEX_CREATE_TMP_TABLE) == 0)

Example 2. Emule project. Priorities of * and ++ operations.

STDMETHODIMP CCustomAutoComplete::Next(..., ULONG *pceltFetched) { ... if (pceltFetched != NULL) *pceltFetched++; ... }

The error was found through the V532 diagnostic: Consider inspecting the statement of '*pointer++' pattern. Probably meant: '(*pointer)++'. emule customautocomplete.cpp 277

If 'pceltFetched' is not a null pointer, the function must increment the variable of the ULONG type this pointer refers to. The error is this: the priority of the '++' operation is higher than that of '*' operation (pointer dereferencing). The "*pceltFetched++;" line is identical to the following code:

TMP = pceltFetched + 1; *pceltFetched; pceltFetched = TMP;

Virtually it is just increment of the pointer. To make the code correct, we must add parentheses: "(*pceltFetched)++;".

Example 3. Chromium project. Priorities of & and != operations.

#define FILE_ATTRIBUTE_DIRECTORY 0x00000010 bool GetPlatformFileInfo(PlatformFile file, PlatformFileInfo* info) { ... info->is_directory = file_info.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY != 0; ... }

The error was found through the V564 diagnostic: The '&' operator is applied to bool type value. You've probably forgotten to include parentheses or intended to use the '&&' operator. base platform_file_win.cc 216

Programmers easily forget that the priority of the '!=' operation is higher than that of '&'. This is what happened in our case. As a result, we have the following expression:

info->is_directory = file_info.dwFileAttributes & (0x00000010 != 0);

Let's simplify the expression:

info->is_directory = file_info.dwFileAttributes & (true);

Let's simplify it once again:

info->is_directory = file_info.dwFileAttributes & 1;

It turns out that we have tested the first bit instead of the fifth bit. To fix this, we need to add parentheses.

Example 4. BCmenu project. IF and ELSE mixed up.

void BCMenu::InsertSpaces(void) { if(IsLunaMenuStyle()) if(!xp_space_accelerators) return; else if(!original_space_accelerators) return; ... }

The error was found through the V563 diagnostic: It is possible that this 'else' branch must apply to the previous 'if' statement. fire bcmenu.cpp 1853

This is not an error of operation priorities, but one relative to it. The programmer does not take into account that the 'else' branch refers to the nearest 'if' operator. We can see that the code justification as if it works by the following algorithm:

if(IsLunaMenuStyle()) { if(!xp_space_accelerators) return; } else { if(!original_space_accelerators) return; }

But actually it is equivalent to the following construct:

if(IsLunaMenuStyle()) { if(!xp_space_accelerators) { return; } else { if(!original_space_accelerators) return; } }

Example 5. IPP Samples project. Priorities of ?: and | operations.

vm_file* vm_file_fopen(...) { ... mds[3] = FILE_ATTRIBUTE_NORMAL | (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING; ... }

The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '|' operator. vm vm_file_win.c 393

Depending on the 'islog' variable's value, the expression must be either equal to "FILE_ATTRIBUTE_NORMAL" or "FILE_ATTRIBUTE_NORMAL | FILE_FLAG_NO_BUFFERING". But it does not happen. Priority of the '?:' operation is lower than that of '|'. As a result, the code acts as follows:

mds[3] = (FILE_ATTRIBUTE_NORMAL | (islog == 0)) ? 0 : FILE_FLAG_NO_BUFFERING;

Let's simplify the expression:

mds[3] = (0x00000080 | ...) ? 0 : FILE_FLAG_NO_BUFFERING;

Since FILE_ATTRIBUTE_NORMAL equals 0x00000080, the condition is always true. It means that 0 will always be written into mds[3].

Example 6. Newton Game Dynamics project. Priorities of ?: and * operations.

dgInt32 CalculateConvexShapeIntersection (...) { ... den = dgFloat32 (1.0e-24f) * (den > dgFloat32 (0.0f)) ? dgFloat32 (1.0f) : dgFloat32 (-1.0f); ... }

The error was found through the V502 diagnostic: Perhaps the '?:' operator works in a different way than it was expected. The '?:' operator has a lower priority than the '*' operator. physics dgminkowskiconv.cpp 1061

The error in this code again relates to the low priority of the '?:' operation. The condition for the '?:' operator is expressed by a meaningless subexpression "dgFloat32 (1.0e-24f) * (den > dgFloat32 (0.0f))". Adding parentheses will solve the issue.

By the way, programmers often forget how cunning the '?:' operator is. Here is a post on this topic: "How to make fewer errors at the stage of code writing. Part N2".

Formatted output errors

Examples of these errors are boring and alike, so we will examine only a few samples. The point is that functions with a variable number of arguments accept actual arguments incompatible with the format string. Any programmer who uses such functions as printf() is familiar with this type of errors.

Example 1. ReactOS project. Incorrect printing of a WCHAR-character.

static void REGPROC_unescape_string(WCHAR* str) { ... default: fprintf(stderr, "Warning! Unrecognized escape sequence: \\%c'

", str[str_idx]); ... }

The error was found through the V576 diagnostic: Incorrect format. Consider checking the third actual argument of the 'fprintf' function. The char type argument is expected. regedit regproc.c 293

The fprinf() function must print a character of the char type. But the third argument is a character of the WCHAR type. The user will get an incorrectly generated message. To fix the code, we should replace '%c' with '%C' in the format string.

Example 2. Intel AMT SDK project. Character '%' missing.

void addAttribute(...) { ... int index = _snprintf(temp, 1023, "%02x%02x:%02x%02x:%02x%02x:%02x%02x:" "%02x%02x:02x%02x:%02x%02x:%02x%02x", value[0],value[1],value[2],value[3],value[4], value[5],value[6],value[7],value[8], value[9],value[10],value[11],value[12], value[13],value[14],value[15]); ... }

The error was found through the V576 diagnostic: Incorrect format. A different number of actual arguments is expected while calling '_snprintf' function. Expected: 18. Present: 19. mod_pvs mod_pvs.cpp 308

It is not easy to find an error here at first sight. However, the PVS-Studio analyzer does not get tired and notices that the function takes more actual arguments than specified in the format string. The reason is that the '%' character is missing in one place. Let's single out this fragment:

"%02x%02x:[HERE]02x%02x:%02x%02x:%02x%02x",

Example 3. Intel AMT SDK project. Unused argument.

bool GetUserValues(...) { ... printf("Error: illegal value. Aborting.

", tmp); return false; }

The error was found through the V576 diagnostic: Incorrect format. A different number of actual arguments is expected while calling 'printf' function. Expected: 1. Present: 2. RemoteControlSample remotecontrolsample.cpp 792

The error is this: the 'tmp' variable is not used in any way when printing the information message.

Example 4. G3D Content Pak project. Printing of meaningless data.

class Matrix3 { ... inline float* operator[] (int iRow) { ... }; void AnyVal::serialize(G3D::TextOutput& t) const { ... const Matrix3& m = *(Matrix3*)m_value; ... t.printf("%10.5f, %10.5f, %10.5f,

%10.5f, %10.5f, %10.5f,

%10.5f, %10.5f, %10.5f)", m[0, 0], m[0, 1], m[0, 2], m[1, 0], m[1, 1], m[1, 2], m[2, 0], m[2, 1], m[2, 2]); ... }

The error was found through the V520 diagnostic: The comma operator ',' in array index expression '[0, 0]'. graphics3D anyval.cpp 275

The program prints meaningless values instead of the matrix. You may write such a code when you work with different programming languages and sometimes forget how to access an item in a two-dimensional array in the C language.

Let's see how the 'm[0, 1]' expression works. At first, expression"0, 1" is calculated. The result of this expression is 1. Then the 'operator[]' function is called in the Matrix3 class. The function takes the actual argument 1 and returns the pointer to the first string in the matrix. It is the value of this pointer that will be printed by the 'printf()' function though it expects a value of the float-type.

This is the correct code:

t.printf("%10.5f, %10.5f, %10.5f,

%10.5f, %10.5f, %10.5f,

%10.5f, %10.5f, %10.5f)", m[0][0], m[0][1], m[0][2], m[1][0], m[1][1], m[1][2], m[2][0], m[2][1], m[2][2]);

Examples of misprints found in code

A lot of programming errors are caused by misprints. Most of these errors are quickly detected at the early stages of testing. But there are some defects of this kind that remain in code for a long time causing troubles both to programmers and users.

You can make these errors much fewer using the PVS-Studio analyzer. It will find them before testing starts, which will significantly reduce the cost of defect detection and elimination.

Example 1. Miranda IM project. Assignment inside IF.

void CIcqProto::handleUserOffline(BYTE *buf, WORD wLen) { ... else if (wTLVType = 0x29 && wTLVLen == sizeof(DWORD)) ... }

The error was found through the V560 diagnostic: A part of conditional expression is always true: 0x29. icqoscar8 fam_03buddy.cpp 632

Because of a misprint, there is an assignment taking place inside the condition of the 'if' operator. This is the correct condition: "if (wTLVType == 0x29 && wTLVLen == sizeof(DWORD))".

Example 2. ReactOS project. Assignment error.

BOOL WINAPI GetMenuItemInfoA(...) { ... mii->cch = mii->cch; ... }

The error was found through the V570 diagnostic: The 'mii->cch' variable is assigned to itself. user32 menu.c 4347

The value of the variable is assigned to itself. The programmer apparently intended to write it in this way: "mii->cch = miiW->cch;".

Example 3. Clang project. Object name misprinted.

static Value *SimplifyICmpInst(...) { ... case Instruction::Shl: { bool NUW = LBO->hasNoUnsignedWrap() && LBO->hasNoUnsignedWrap(); bool NSW = LBO->hasNoSignedWrap() && RBO->hasNoSignedWrap(); ... }

The error was found through the V501 diagnostic: There are identical sub-expressions 'LBO->hasNoUnsignedWrap ()' to the left and to the right of the '&&' operator. LLVMAnalysis instructionsimplify.cpp 1891

There is a misprint when using variables with similar names. In the first line, both LBO and RBO variables must be used. This is the correct code:

bool NUW = LBO->hasNoUnsignedWrap() && RBO->hasNoUnsignedWrap();

Example 4. Notepad++ project. Incorrect state test.

bool _isPointXValid; bool _isPointYValid; ... bool isPointValid() { return _isPointXValid && _isPointXValid; };

The error was found through the V501 diagnostic: There are identical sub-expressions to the left and to the right of the '&&' operator. _isPointXValid && _isPointXValid

The name '_isPointXValid' is used twice. The function must actually return this code: "_isPointXValid && _isPointYValid".

Example 5. StrongDC++ project. Unsuccessful check of \r

.

static void getContentLengthAndHeaderLength(...) { ... while(line[linelen] != '\r' && line[linelen] != '\r') ... }

The error was found through the V501 diagnostic: There are identical sub-expressions 'line [linelen] != '\r'' to the left and to the right of the '&&' operator. miniupnpc miniupnpc.c 153

Because of a misprint, presence of the '\r' character is checked twice. Actually presence of the '

' character must be checked too.

Example 6. G3D Content Pak project. A closing parenthesis in a wrong place.

bool Matrix4::operator==(const Matrix4& other) const { if (memcmp(this, &other, sizeof(Matrix4) == 0)) { return true; } ... }

The error was found through the V575 diagnostic: The 'memcmp' function processes '0' elements. Inspect the 'third' argument. graphics3D matrix4.cpp 269

One closing parenthesis is in a wrong place. It turns out that the size of the memory area being compared is calculated by the "sizeof(Matrix4) == 0" expression. This expression always has the 'false' result. Then 'false' turns into an integer value equal to 0. This is the correct code:

if (memcmp(this, &other, sizeof(Matrix4)) == 0) {

Example 7. QT project. Error of structure member copying.

PassRefPtr<Structure> Structure::getterSetterTransition(Structure* structure) { ... transition->m_propertyStorageCapacity = structure->m_propertyStorageCapacity; transition->m_hasGetterSetterProperties = transition->m_hasGetterSetterProperties; transition->m_hasNonEnumerableProperties = structure->m_hasNonEnumerableProperties; transition->m_specificFunctionThrashCount = structure->m_specificFunctionThrashCount; ... }

The error was found through the V570 diagnostic: The 'transition->m_hasGetterSetterProperties' variable is assigned to itself. QtScript structure.cpp 512

It is not easy to find an error looking at this code. But it is there. The field 'm_hasGetterSetterProperties' is copied into itself. This is the correct code:

transition->m_hasGetterSetterProperties = structure->m_hasGetterSetterProperties;

Example 8. Apache HTTP Server project. Extra sizeof operator.

PSECURITY_ATTRIBUTES GetNullACL(void) { PSECURITY_ATTRIBUTES sa; sa = (PSECURITY_ATTRIBUTES) LocalAlloc(LPTR, sizeof(SECURITY_ATTRIBUTES)); sa->nLength = sizeof(sizeof(SECURITY_ATTRIBUTES)); ... }

The error was found through the V568 diagnostic: It's odd that the argument of sizeof() operator is the 'sizeof (SECURITY_ATTRIBUTES)' expression. libhttpd util_win32.c 115

The field 'nLength' must contain the size of the 'SECURITY_ATTRIBUTES' structure. There is a misprint in the code: the 'sizeof' operator is used twice. As a result, the field 'nLength' stores a size of the 'size_t' type. This is the correct code:

sa->nLength = sizeof(SECURITY_ATTRIBUTES);

Example 9. FCE Ultra project. Double variable declaration.

int iNesSaveAs(char* name) { ... fp = fopen(name,"wb"); int x = 0; if (!fp) int x = 1; ... }

The error was found through the V561 diagnostic: It's probably better to assign value to 'x' variable than to declare it anew. Previous daclaration: ines.cpp, line 960. fceuxines.cpp 962

The 'x' variable must store information whether or not a file was opened successfully. Because of a misprint, a new variable named 'x' is created and initialized instead of assigning 1 to the existing variable. This is how the correct code must look:

if (!fp) x = 1;

Example 10. Notepad++ project. Using && operator instead of &.

TCHAR GetASCII(WPARAM wParam, LPARAM lParam) { ... result=ToAscii(wParam, (lParam >> 16) && 0xff, keys,&dwReturnedValue,0); ... }

The error was found through the V560 diagnostic: A part of conditional expression is always true: 0xff. notepadPlus babygrid.cpp 694

The "(lParam >> 16) && 0xff" expression is meaningless and is always equal to 1 (true). A misprint here is in using the '&&' operator instead of '&'.

Example 11. WinDjView project. Incomplete condition.

inline bool IsValidChar(int c) { return c == 0x9 || 0xA || c == 0xD || c >= 0x20 && c <= 0xD7FF || c >= 0xE000 && c <= 0xFFFD || c >= 0x10000 && c <= 0x10FFFF; }

The error was found through the V560 diagnostic: A part of conditional expression is always true: 0xA. WinDjView xmlparser.cpp 45 False

The IsValidChar function always returns 'true'. Comparison is missing in one place because of a misprint: "... || 0xA || ...".

Example 12. Fennec Media Project project. Extra semicolon.

int settings_default(void) { ... for(i=0; i<16; i++); for(j=0; j<32; j++) { settings.conversion.equalizer_bands.boost[i][j] = 0.0; settings.conversion.equalizer_bands.preamp[i] = 0.0; } }

The error was found through the V529 diagnostic: Odd semicolon ';' after 'for' operator. settings.c 483

All the C and C++ programmers know how dangerous an extra semicolon ';' is. Unfortunately, this knowledge does not prevent them from making such misprints. There is an extra semicolon after the first 'for' operator, which makes this program fragment unable to execute.

Example 13. QT project. Missing break operator.

int QCleanlooksStyle::pixelMetric(...) { ... case PM_SpinBoxFrameWidth: ret = 3; break; case PM_MenuBarItemSpacing: ret = 6; case PM_MenuBarHMargin: ret = 0; break; ... }

The error was found through the V519 diagnostic: The 'ret' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 3765, 3767. QtGui qcleanlooksstyle.cpp 3767

This is a classic error - 'break' is missing inside the 'switch' operator. I think you do not need any further comments here.

Example 14. Miranda IM project. Assignment instead of comparison.

int FindItem(...) { ... int ret; ret=FindItem(hwnd,dat,hItem, (struct ClcContact ** )&z, (struct ClcGroup ** )&isv,NULL); if (ret=0) {return (0);} ... }

The error was found through the V559 diagnostic: Suspicious assignment inside the condition expression of 'if' operator: ret = 0. clist_mw clcidents.c 179

There is a misprint inside the condition of the 'if' operator: '=' is written instead of '=='. The function will handle the situation incorrectly when a certain item is not found.

Example 15. IPP Samples project. Incorrect index.

struct AVS_MB_INFO { ... Ipp8u refIdx[AVS_DIRECTIONS][4]; ... }; void AVSCompressor::GetRefIndiciesBSlice(void){ ... if (m_pMbInfo->predType[0] & predType) { m_refIdx[iRefNum] = m_pMbInfo->refIdx[dir][0]; iRefNum += 1; } if (m_pMbInfo->predType[1] & predType) { m_refIdx[iRefNum] = m_pMbInfo->refIdx[dir][1]; iRefNum += 1; } if (m_pMbInfo->predType[2] & predType) { m_refIdx[iRefNum] = m_pMbInfo->refIdx[dir][2]; iRefNum += 1; } if (m_pMbInfo->predType[3] & predType) { m_refIdx[iRefNum] = m_pMbInfo->refIdx[dir][30]; iRefNum += 1; } ... }

The error was found through the V557 diagnostic: Array overrun is possible. The '30' index is pointing beyond array bound. avs_enc umc_avs_enc_compressor_enc_b.cpp 495

Consider this fragment: "m_pMbInfo->refIdx[dir][30]". Because of a misprint, number 30 is written instead of index 3. By the way, this sample shows well how relative our division of errors into categories is. This error might well be referred to the category "Errors of array and string handling". The division is relative and is made to show diversity of errors the PVS-Studio analyzer can detect.

Example 16. ReactOS project. Misprint in a macro.

#define SWAP(a,b,c) c = a;\ a = b;\ a = c

The error was found through the V519 diagnostic: The 'v2' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 343, 343. win32k gradient.c 343

It is a rather funny misprint in a macro intended to swap values in two variables. Look closely at the code and you will see what I mean. This is the correct code:

#define SWAP(a,b,c) c = a;\ a = b;\ b = c

This time we did not manage to stop at the 13-th example: so many errors in software are caused by misprints. There are much more errors of this kind than programmers think. We could go on and on in this section but we decide to stop at the 16-th example at last.