Introduction

Hello folks! I’m here again, this time to talk about static analysis. If you are a developer with little to no knowledge of the subject, this is the right article for you. Static analysis is the process of analyzing the code of a program without actually running it, as opposed to dynamic analysis, where code is analysed at run time. This process helps developers identify potential design issues and bugs, improve performance and ensure conformance to coding guidelines. Running static analysis regularly is a good practice, and doing so helps identify issues at an early phase of the software development life cycle. It is mainly used on strongly-typed, static languages like C/C++, C# or Java, and there are plenty of products out there, both free and commercial. Have a look at this Wikipedia article.

The aim of this article is to test the effectiveness of static analysis on a very famous game physics engine, Bullet Physics, and eventually create a pull request if any fixes arise. Bullet Physics is an open source SDK for real-time collision detection; it has been used in 3D authoring tools like Blender, in games like Grand Theft Auto IV, Grand Theft Auto V and Rocket League, and also in movies. You can find out more in this Wikipedia article. For this task I used a commercial tool called PVS-Studio from Viva64, for which I received a free temporary license. PVS-Studio is a static analyzer for C++ and C#. A demo version can be downloaded for free; the demo has limitations on the number of “clicks” the user can perform on the warnings it generates.

Static Analysis vs Dynamic Analysis

Static and dynamic analysis are two sides of the same coin: while the former is performed on the code at ‘compile time’, the latter is done at run time, usually on given input data. An example of dynamic analysis is code coverage, which measures to what degree code is covered by tests. Running unit or integration tests is also a form of dynamic analysis. Dynamic analysis is a very widespread practice; test-driven development, for example, has dynamic analysis at its core. Nowadays it is odd to think of an open source or commercial project where code is not covered by tests, isn’t it? We all know the benefits of good test coverage. Static analysis takes a different approach, but unfortunately it is not as widespread as dynamic analysis.

Static analysis can be very effective because it can unearth bugs and errors early in the software development life cycle which would not emerge in a dynamic test. On the other side, static analysis is not a panacea that identifies all issues, and it has its cons. One of them is accuracy, which may lead to false positives. A false positive is a ‘mistake’ in reporting that a rule was violated. Moreover, the process of going through hundreds of warnings, especially in large projects or big teams where changes happen on a daily basis, can be frustrating and time-consuming for developers. However, the effectiveness of static analysis grows with the project size, because the error density in small projects is lower than in larger ones. You can find a nice explanation here.

A Test Bench: Bullet Physics



I love game physics. In the past I used Bullet Physics in a lot of my projects (especially college projects 😛 ), and I’m really curious to see what I can find out with static analysis. At the time of writing this article I’m using Bullet 2.86, commit hash 5b789ed.

At first I used the tool available in Visual Studio. If you want to learn the basics of code analysis with the Microsoft tools, you can find a lot of material on Microsoft’s code analysis documentation page.

My first run of Microsoft’s static analyzer generated more than 800 warnings, so I decided to create a smaller solution using only the main core libraries. In CMake, when you generate the solution you need to un-check the following options under the BUILD category:

BUILD_BULLET2_DEMOS

BUILD_CPU_DEMOS

BUILD_OPENGL3_DEMOS

BUILD_UNIT_TESTS

This time only 118 warnings were generated. Let’s have a look at a typical output, so that you have an idea of what kind of hint a static analyzer gives you. Let’s pick the btConvexHull.cpp file in the LinearMath project.

Visual Studio Analyzer Output

Code: C6011
Description: Dereferencing NULL pointer ‘t’.
File: btconvexhull.cpp
Line: 476

The warning above is composed of a code (which can be clicked to bring you to a page with a thorough explanation and an example of the warning itself), a description of the issue, and where it happens. In the image below the offending code is highlighted on the left, with a side window on the right containing a full explanation of the issue. In the extrudable function the pointer btHullTriangle* t is initially set to NULL; if the loop is skipped because the array m_tris is empty, the pointer t will never be set, and thus at line 476 t is dereferenced while still being null.

Extrudable function – btConvexHull.cpp
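The pattern that C6011 flags can be distilled into a short sketch. The types and names here are hypothetical, not Bullet’s actual code: the point is that the pointer is only assigned inside the loop, so an empty container leaves it null.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Tri { double rise; };

// 't' starts as nullptr and is only assigned inside the loop; if 'tris'
// is empty, 't' stays null. Guarding the dereference both silences the
// warning and fixes the latent bug.
Tri* pickBest(const std::vector<Tri*>& tris)
{
    Tri* t = nullptr;
    double best = 0.0;
    for (Tri* candidate : tris) {
        if (candidate && candidate->rise > best) {
            best = candidate->rise;
            t = candidate;
        }
    }
    if (!t)               // the check the analyzer wants before t->...
        return nullptr;
    return t;
}
```

With the null check in place, callers that pass an empty collection get a well-defined nullptr back instead of undefined behavior.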

As you see, the job of a static analyzer is to predict the execution flow and evaluate branches; it then provides an explanation of the warning and what specifically causes it. You may now grasp how static analysis differs from dynamic analysis. In this case the bug can only be triggered if the collection m_tris is empty, which is a specific case that depends on the data, and it might never happen. The class btConvexHull calculates the convex hull of a given input polygon, and m_tris represents the triangles of a generated simplex. The convex hull of a set of points (i.e. vertices) of a polygon in a Euclidean space is the smallest convex set that contains the polygon. Thus it’s highly improbable that m_tris will be empty when extrudable is called, unless an empty set of vertices is passed. In general, if all call sites of the function (or of the class) guarantee that it is only passed valid data, then the warning can be ignored; if that can’t be guaranteed, the warning should not be ignored. Let’s see what PVS-Studio produces on this file:

PVS-Studio Output

I have to admit that the UI is much nicer and more efficient, as it groups the warnings by priority. There are two more sections, for optimizations and 64-bit errors, which are missing in VS. The warning template is similar to VS: each warning comes with a code, a link that redirects to a page with detailed documentation on the error, a message and the line number. The careful reader might have already noted that no warning similar to the one discussed earlier (btConvexHull.cpp line 476) is present in the list above. This is interesting, but I’ll get back to it later. The high and medium priority warnings tell us that we should initialize some members of the btConvexHull class (error V730). Those members are all of primitive types, and uninitialized members in C++ can indeed lead to errors or undefined behavior at runtime. It has to be said, though, that these warnings refer to parameterless constructors which are not used anywhere. An immediate solution would be to remove these empty constructors. However, keep in mind that this is a library and it’s hard to predict how it will be used, so it’s almost mandatory to prevent any possible misuse that can generate bugs.
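As an illustration of what V730 warns about (a hypothetical class, not Bullet’s actual code), a parameterless constructor that does not touch primitive members leaves them holding indeterminate values:

```cpp
#include <cassert>

// Hypothetical config struct. Without the initializer list below, a
// default-constructed instance would have indeterminate maxVertices and
// tolerance, and reading them would be undefined behavior.
struct HullConfig {
    int   maxVertices;
    float tolerance;

    HullConfig() : maxVertices(0), tolerance(0.0f) {} // fix: initialize everything
};
```

Initializing every member in the constructor (or with in-class default initializers) is the cheap, defensive fix the analyzer is asking for.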

Low priority warnings suggest avoiding precise comparison of floating point numbers (error V550) and recommend instead comparing the absolute value of the difference against an epsilon value (e.g. 0.00001). Floating point math is not exact: simple values like 1.6 or 0.2 cannot be precisely represented using binary floating point numbers, so comparing two floats for equality is usually not a good idea. Compilers have warnings for this too (GCC, for example): “warning: comparing floating point with == or != is unsafe“.
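A minimal sketch of the comparison V550 recommends (the helper name and the epsilon value are my own choices, not from Bullet or PVS-Studio):

```cpp
#include <cassert>
#include <cmath>

// Compare |a - b| against a small tolerance instead of using == directly.
bool nearlyEqual(double a, double b, double epsilon = 1e-9)
{
    return std::fabs(a - b) < epsilon;
}
```

The classic example: 0.1 + 0.2 compared to 0.3 with == yields false in binary floating point, while the epsilon comparison behaves as intended.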

Running PVS-Studio on the whole solution of the core libraries returned a longer list of warnings, look at the picture below:

It’s not feasible to go through the whole list for just one article (for the both of us!), so I cherry-picked a few warnings both from the General and the Optimization section.

Code: V512
Description: A call of the ‘memcpy’ function will lead to the ‘m_headerString’ buffer becoming out of range.
File: b3file.cpp
Line: 142

Code: V773
Description: The ‘srcFileHandle’ pointer was assigned values twice without releasing the memory. A memory leak is possible.
File: b3openclutils.cpp
Line: 691

Code: V773
Description: The function was exited without releasing the ‘bvh’ pointer. A memory leak is possible.
File: b3gpunarrowphase.cpp
Line: 568

Code: V802
Description: On 32-bit platform, structure size can be reduced from 32 to 24 bytes by rearranging the fields according to their sizes in decreasing order.
File: btsoftbody.h
Line: 187

The first warning from the top, V512, is related to memory buffer filling. The class bFile is part of the Bullet serialization system, which is used to save/load Bullet objects. Look at the snippet below:

[code language=”cpp”]
#define B3_SIZEOFBLENDERHEADER 12

char m_headerString[7];

void bFile::parseHeader()
{
    // ...
    char header[B3_SIZEOFBLENDERHEADER+1];

    if (strncmp(header, m_headerString, 6) != 0)
    {
        memcpy(header, m_headerString, B3_SIZEOFBLENDERHEADER); // buffer overflow!!!
        return;
    }
    // ...
}
[/code]

The define B3_SIZEOFBLENDERHEADER is set to 12, but if you look carefully at the m_headerString char array definition you’ll notice that it’s only 7 bytes long. The memcpy call is copying/reading 12 bytes out of a 7-byte array. Needless to say, this is a bug, and the consequences of this type of error are buffer overflows and dirty data reads. It would be even worse if m_headerString were part of a struct, where the memory allocated for its members is contiguous. For instance:

[code language=”cpp”]
#define B3_SIZEOFBLENDERHEADER 12

struct MyStruct {
    char m_headerString[7];
    char m_header2[5];
};

// ...
char header[12];
MyStruct myStruct;
memset(myStruct.m_headerString, 0, 7);
memset(myStruct.m_header2, 1, 5);
memcpy(header, myStruct.m_headerString, B3_SIZEOFBLENDERHEADER);
[/code]

What do you think header will contain after memcpy is executed? The first 7 bytes will contain 0s, but the other 5 bytes will contain 1s! The out-of-bounds read silently spills over into the adjacent member.

Let’s move on to the second warning from the list. Code V773 is a memory leak warning. Personally, I think memory leak bugs are the worst kind, especially in game development. This issue was spotted in the Bullet OpenCL module. OpenCL (Open Computing Language) is a framework based on the C99 language, similar to CUDA, that provides a standard interface for parallel computing. This module is still experimental in Bullet and is used to run the physics simulation on the GPU.

In the b3openclutils.cpp file, the pointer srcFileHandle is assigned twice. PVS-Studio is telling us that assigning the same pointer twice without releasing the memory could lead to a memory leak.

Looking carefully at the code, it can be noticed that srcFileHandle is in fact assigned not twice but six times! It is first assigned at line 682; if the handle is invalid, the pointer is assigned again within the loop (lines 687-692) for at most 5 attempts (i < 5). Before panicking and jumping to conclusions, let’s analyze the CreateFileA function and what it does. CreateFileA is the ANSI variant of CreateFile, a Win32 function that creates or opens a file or I/O device. When the function succeeds, the return value is an open handle to the specified file; if it fails, the return value is INVALID_HANDLE_VALUE. Obviously a failed handle does not need to be closed, i.e. released to free memory. Thus there is no memory leak as long as the function fails: this is in fact a false positive. However, if the function were different and returned a new resource every time it was invoked, the warning would be totally correct.
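The retry pattern can be sketched portably. Here fopen stands in for the Win32 CreateFileA (an assumption for the sake of a runnable example): both return a sentinel on failure rather than a resource that must be released, which is exactly why reassigning the pointer in the loop leaks nothing.

```cpp
#include <cstdio>

// On failure fopen returns NULL, so reassigning 'handle' in the retry
// loop releases nothing and leaks nothing -- there is no resource yet.
// This is the shape of the false positive V773 reported.
FILE* openWithRetries(const char* path, int maxAttempts)
{
    FILE* handle = std::fopen(path, "rb");
    for (int i = 0; i < maxAttempts && handle == nullptr; ++i) {
        // Nothing to fclose() here: the previous open failed.
        handle = std::fopen(path, "rb");
    }
    return handle; // caller owns a successful handle and must fclose() it
}
```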

The second memory leak warning was generated in file b3gpunarrowphase.cpp. This time the issue is slightly different: a pointer allocated within the function is not released before the function exits. After looking carefully at the registerCompoundShape function, I noticed that the pointer is not released anywhere within the function or the class. This is a real memory leak bug, and PVS-Studio is correct.
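A sketch of the general fix for this kind of V773 (hypothetical types, not the actual registerCompoundShape code): an object allocated with new must be released on every exit path, and a smart pointer makes that automatic.

```cpp
#include <memory>
#include <vector>

// Hypothetical stand-in for the leaked BVH object.
struct Bvh { std::vector<int> nodes; };

// With unique_ptr, the allocation is freed on *every* exit path,
// including the early return -- the leak the analyzer flagged is gone.
int countNodes(bool earlyExit)
{
    std::unique_ptr<Bvh> bvh(new Bvh{{1, 2, 3}});
    if (earlyExit)
        return 0;  // bvh is destroyed here too
    return static_cast<int>(bvh->nodes.size());
}
```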

The last warning, V802, belongs to the optimization section. I found this type of suggestion extremely interesting and helpful, especially for structs that are used in big arrays. The V802 warning suggests that the size of a structure can be reduced if the fields are arranged according to their sizes in decreasing order. The struct in question is this one:

[code language=”cpp”]
/* sCti is Softbody contact info */
struct sCti
{
    const btCollisionObject* m_colObj; /* 4 bytes -- 8 bytes reserved */
    btVector3 m_normal;                /* 16 bytes */
    btScalar m_offset;                 /* 4 bytes -- 8 bytes reserved */
}; // Total struct size = 32 bytes in 32-bit code
[/code]

This struct represents the soft body contact info and is used to store the collision response force. Assuming there are 10^6 contact points at a given time, this struct alone would take up 32 megabytes of memory. We can save 8 bytes for each contact by rearranging the fields as follows:

[code language=”cpp”]
/* sCti is Softbody contact info */
struct sCti
{
    btVector3 m_normal;                /* 16 bytes */
    const btCollisionObject* m_colObj; /* 4 bytes */
    btScalar m_offset;                 /* 4 bytes */
}; // Total struct size = 24 bytes in 32-bit code
[/code]

That rearrangement saves up to 8 · 10^6 = 8,000,000 bytes (8 megabytes). Clearly there might not be a million contact points in a simulation, but I hope you understand the importance of reducing structure size if you intend to use hundreds of thousands or even millions of instances of a struct.
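The same effect can be shown with a generic pair of structs (not Bullet’s real types; exact sizes depend on the platform ABI, so the comment figures assume a typical 64-bit target):

```cpp
#include <cassert>
#include <cstddef>

// A small member before a larger, more strictly aligned one forces the
// compiler to insert padding before 'value' and after 'flag'.
struct Padded {
    char   tag;     // 1 byte + padding up to alignof(double)
    double value;   // 8 bytes
    char   flag;    // 1 byte + tail padding
};

// Largest / most strictly aligned member first: the two chars now fit
// inside what used to be padding.
struct Reordered {
    double value;
    char   tag;
    char   flag;
};
```

On a typical 64-bit ABI sizeof(Padded) is 24 while sizeof(Reordered) is 16, which is exactly the kind of saving V802 points out.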

Conclusions

The aim of this article was to explore the effectiveness of static analysis on a very well established open source library, Bullet Physics. Even though I took only a few warnings into account, a buffer overflow and a memory leak bug came out. Unfortunately it is not possible to eliminate false positives from static analysis; in fact, it took me some time to verify the veracity of the warnings. However, I am very happy with the outcome, and there is a lot more to be done on Bullet Physics, so I cannot wait to have enough material to create a pull request.

Static analysis is certainly a time-consuming task, in particular if done rarely, but it is definitely worth the time. If you stop for a moment to think how much effort it takes to design and correctly implement unit and integration tests, you’ll realise that testing is time-consuming too. This is why a process like TDD is usually adopted from the very early stages. For the same reason, static analysis should be part of the software development life cycle from the early stages, and it should also be integrated into the build environment.

Acknowledgments

A special thanks goes to Andrey Karpov, CTO at “Program Verification Systems” Co Ltd, for supporting this initiative and the help with PVS-Studio.