Virtual machines are important tools in the arsenal of a software developer. Being an active user of VirtualBox, and checking various open source projects with the help of it, I was personally interested in checking its source code. We did the first check of this project in 2014, and the description of 50 errors barely fit into two articles. With the release of Windows 10 and VirtualBox 5.0.XX the stability of the program got significantly worse, in my humble opinion. So, I decided to check the project again.

Introduction

VirtualBox (Oracle VM VirtualBox) is a general-purpose, full virtualizer for x86 hardware, targeted at server, desktop, and embedded use. It is supported by the following operating systems: Microsoft Windows, FreeBSD, Solaris/OpenSolaris, Linux, Mac OS X, DOS, ReactOS, and others.

You may find the previous articles about VirtualBox here:

These articles contain more than 50 dangerous fragments which were found using PVS-Studio 5.18. I haven't seen such warnings in the new analyzer report. It means that the developers had a look at the articles, and fixed all the places errors were spotted by the analyzer. Those who are willing may find these places in the latest version of the source code, and can have a look at the way the fixes of PVS-Studio warnings appear in a real project. But in another check, I've encountered a lot of other interesting messages.

I also would like to emphasize that only regular use of static analysis (not necessarily PVS-Studio) can maintain the high quality of the code. Our all experience of fixing analyzer warnings in the Unreal Engine code, showed that the quantity of errors constantly increases in a developing project, so after one-time checks the quality of the code will gradually get to the initial state, and new bugs will still be getting into the code. In the VirtualBox project we see a similar situation. The growth of the analyzer warnings after a one-time check looks something like this:

It is important to emphasize that by "regular" use of the analyzer, we mean daily checks. Many errors that are detected during the testing stage can be eliminated at the stage of writing the code.

Another advantage of the regular use of static analyzers, is the regular updates. Since the first time we checked VirtualBox, we have added more than 50 new diagnostic rules. The last section will be devoted specifically to the errors that were found using the new diagnostics.

Oracle VM VirtualBox source code was tested with the help of PVS-Studio version 6.02.

Perhaps someone will need the number of the verified revision.

Checked out external at revision 2796. Checked out revision 59777.

Stubborn errors

Before writing this article, I had a look at the bugs that were previously found by the analyzer, and found similar errors in the new code. I suppose the same person could be writing this code.

V521 Such expressions using the ',' operator are dangerous. Make sure the expression is correct. vboxmpwddm.cpp 1083

NTSTATUS DxgkDdiStartDevice(...) { .... if ( ARGUMENT_PRESENT(MiniportDeviceContext) && ARGUMENT_PRESENT(DxgkInterface) && ARGUMENT_PRESENT(DxgkStartInfo) && ARGUMENT_PRESENT(NumberOfVideoPresentSources), // <= ARGUMENT_PRESENT(NumberOfChildren) ) { .... } .... }

Similar code was described in the first article. The comma operator ',' evaluates the left and the right operand. The thing is, the left operand is no longer used, and the result of the operator is the value of the right operand. It is most likely that the programmer wanted to use '&&' operator, as in other strings.

V519 The 'pThis->aCSR[103]' variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 1230, 1231. devpcnet.cpp 1231

static void pcnetSoftReset(PPCNETSTATE pThis) { .... pThis->aCSR[94] = 0x0000; pThis->aCSR[100] = 0x0200; pThis->aCSR[103] = 0x0105; // <= pThis->aCSR[103] = 0x0105; // <= .... }

The code has duplicate strings. The developers fixed a similar fragment mentioned in the first article by deleting an extra string. What we have here - an error in the array index or an extra string - we'll see in the next versions of VirtualBox.

V501 There are identical sub-expressions 'mstrFormat.equalsIgnoreCase("text/plain")' to the left and to the right of the '||' operator. vboxdnddataobject.cpp 38

STDMETHODIMP VBoxDnDDataObject::GetData(....) { .... else if( mstrFormat.equalsIgnoreCase("text/plain") // <= || mstrFormat.equalsIgnoreCase("text/html") || mstrFormat.equalsIgnoreCase("text/plain;charset=utf-8") || mstrFormat.equalsIgnoreCase("text/plain;charset=utf-16") || mstrFormat.equalsIgnoreCase("text/plain") // <= || mstrFormat.equalsIgnoreCase("text/richtext") || mstrFormat.equalsIgnoreCase("UTF8_STRING") || mstrFormat.equalsIgnoreCase("TEXT") || mstrFormat.equalsIgnoreCase("STRING")) { .... }

Copy-paste programming will live forever. There are already two identical "text/plain" checks, but on top of that, the whole code clock was copied to another file:

V501 There are identical sub-expressions '!RTStrICmp(pszFormat, "text/plain")' to the left and to the right of the '||' operator. vboxdnd.cpp 834

define true false; //happy debugging!

No joking - such code in different variations can be found in real projects.

V547 Expression is always false. Unsigned type value is never < 0. dt_subr.c 715

int dt_printf(dtrace_hdl_t *dtp, FILE *fp, const char *format, ...) { .... if (vsnprintf(&dtp->dt_buffered_buf[dtp->dt_buffered_offs], // <= avail, format, ap) < 0) { rval = dt_set_errno(dtp, errno); va_end(ap); return (rval); } .... }

At first glance, there is nothing to pick on here, except the analyzer. In the documentation to the "vsnprintf" function it is quite clearly stated that in case of an error, it returns a negative number. I have even given this code fragment to one of the developers of the kernel of the C++ analyzer as an example of a false positive. But it turned out that the analyzer was right.

Who could think that among thousands of header files there would be, somewhere, a string like this:

#define vsnprintf RTStrPrintfV

In the preprocessed file the source fragment will be deployed as follows:

if (RTStrPrintfV(&dtp->dt_buffered_buf[dtp->dt_buffered_offs], avail, format, ap) < 0) { rval = dt_set_errno(dtp, (*_errno())); ( ap = (va_list)0 ); return (rval); }

The function RTStrPrintfV() returns the value of an unsigned type 'size_t', not the signed 'int' type, so this check will lead to a logical error, as in fact no checking is done.

Function prototypes for comparison:

size_t RTStrPrintfV(char *, size_t, const char *, va_list args); int vsnprintf (char *, size_t, const char *, va_list arg );

Suspicious "From-To" code

V570 The 'from->eval1D[i].u1' variable is assigned to itself. state_evaluators.c 1006

void crStateEvaluatorDiff(CREvaluatorBits *e, CRbitvalue *bitID, CRContext *fromCtx, CRContext *toCtx) { .... from->eval1D[i].order = to->eval1D[i].order; from->eval1D[i].u1 = from->eval1D[i].u1; // <= from->eval1D[i].u2 = from->eval1D[i].u2; // <= ... }

The analyzer detected suspicious assignments of variables to themselves. Most likely on the right side of the assignment operator, the programmer should write an object with the name 'to' rather than 'from'.

Five more fragments in this file:

V570 The 'from->eval1D[i].u2' variable is assigned to itself. state_evaluators.c 1007

V570 The 'from->eval2D[i].u1' variable is assigned to itself. state_evaluators.c 1042

V570 The 'from->eval2D[i].u2' variable is assigned to itself. state_evaluators.c 1043

V570 The 'from->eval2D[i].v1' variable is assigned to itself. state_evaluators.c 1044

V570 The 'from->eval2D[i].v2' variable is assigned to itself. state_evaluators.c 1045

V625 Consider inspecting the 'for' operator. Initial and final values of the iterator are the same. state_transform.c 1365

void crStateTransformDiff(...., CRContext *fromCtx, CRContext *toCtx ) { .... for (i = to->colorStack.depth; i <= to->colorStack.depth; i++) { LOADMATRIX(to->colorStack.stack + i); from->colorStack.stack[i] = to->colorStack.stack[i]; /* Don't want to push on the current matrix */ if (i != to->colorStack.depth) diff_api.PushMatrix(); } .... }

I've decided to give a description of such errors a separate section because of one more suspicious fragment containing the 'to' and 'from' names.

The initial and the final value of the loop counter are the same in this code fragment. As a result there is just one iteration in the loop. Again, it is most likely a typo in the name of the 'to' object.

About the priorities of operations

V564 The '&' operator is applied to bool type value. You've probably forgotten to include parentheses or intended to use the '&&' operator. glsl_shader.c 4102

static void generate_texcoord_assignment(....) { DWORD map; unsigned int i; char reg_mask[6]; if (!ps) return; for (i = 0, map = ps->baseShader.reg_maps.texcoord; map && i < min(8, MAX_REG_TEXCRD); map >>= 1, ++i) { if (!map & 1) // <= continue; .... } }

Because of the missing parentheses in the "!map & 1" condition, we see that the value of the 'map' variable is verified against null. Apparently the programmer intended to check if the lowest bit is set. Another sign of an error, is the fact that the verification of the 'map' against null is already present in the loop termination condition. Thus, this condition is always false, and the 'continue' operator will never be executed.

The condition should most likely be written like this:

if ( !(map & 1) ) continue;

V590 Consider inspecting this expression. The expression is excessive or contains a misprint. vboxdispcm.cpp 288

HRESULT vboxDispCmSessionCmdGet(....) { .... Assert(hr == S_OK || hr == S_FALSE); if (hr == S_OK || hr != S_FALSE) // <= { return hr; } .... }

The analyzer has detected a suspicious condition in which the subexpression "hr == S_OK" does not affect the result of the condition in any way.

We can make sure looking at the truth table of this conditional expression:

By the way, we can see suspicious Assert(), that has a modified conditional expression.

In general, this type of error is very common. For example, the FreeBSD kernel was no exception.

The full list of suspicious fragments from VirtualBox:

V590 Consider inspecting the 'err == 0L || err != 1237L' expression. The expression is excessive or contains a misprint. vboxdisplay.cpp 656

V590 Consider inspecting the 'rc == 3209 || rc != (- 3210)' expression. The expression is excessive or contains a misprint. vd.cpp 10876

V590 Consider inspecting the 'rc == 3209 || rc != (- 3210)' expression. The expression is excessive or contains a misprint. vd.cpp 10947

V590 Consider inspecting the 'rc == 3209 || rc != (- 3210)' expression. The expression is excessive or contains a misprint. vd.cpp 11004

V590 Consider inspecting the 'rc == 3209 || rc != (- 3210)' expression. The expression is excessive or contains a misprint. vd.cpp 11060

Various warnings

V511 The sizeof() operator returns size of the pointer, and not of the array, in 'sizeof (plane)' expression. devvga-svga3d-win.cpp 4650

int vmsvga3dSetClipPlane(...., float plane[4]) // <= { .... /* Store for vm state save/restore. */ pContext->state.aClipPlane[index].fValid = true; memcpy(pContext->state.aClipPlane[....], plane, sizeof(plane)); .... }

The 'plane' variable is just a pointer to the array of 'float' type. The value of "sizeof(plane)" will be 4 or 8, depending on the bitness of the program. The number '[4]' in the parameters of the function, gives a hint to the programmer that an array of a 'float' type containing 4 elements will be passed to the function. Thus, the memcpy() function copies a wrong number of bytes.

V517 The use of 'if (A) {...} else if (A) {...}' pattern was detected. There is a probability of logical error presence. Check lines: 411, 418. mp-r0drv-nt.cpp 411

static int rtMpCallUsingDpcs(....) { .... if (enmCpuid == RT_NT_CPUID_SPECIFIC) // <= { KeInitializeDpc(&paExecCpuDpcs[0], rtmpNtDPCWrapper, pArgs); KeSetImportanceDpc(&paExecCpuDpcs[0], HighImportance); KeSetTargetProcessorDpc(&paExecCpuDpcs[0], (int)idCpu); pArgs->idCpu = idCpu; } else if (enmCpuid == RT_NT_CPUID_SPECIFIC) // <= { KeInitializeDpc(&paExecCpuDpcs[0], rtmpNtDPCWrapper, pArgs); KeSetImportanceDpc(&paExecCpuDpcs[0], HighImportance); KeSetTargetProcessorDpc(&paExecCpuDpcs[0], (int)idCpu); pArgs->idCpu = idCpu; KeInitializeDpc(&paExecCpuDpcs[1], rtmpNtDPCWrapper, pArgs); KeSetImportanceDpc(&paExecCpuDpcs[1], HighImportance); KeSetTargetProcessorDpc(&paExecCpuDpcs[1], (int)idCpu2); pArgs->idCpu2 = idCpu2; } .... }

A part of the code in the second condition never gets control because of two identical expressions in the cascade of conditions.

V531 It is odd that a sizeof() operator is multiplied by sizeof(). tstrtfileaio.cpp 61

void tstFileAioTestReadWriteBasic(...., uint32_t cMaxReqsInFlight) { /* Allocate request array. */ RTFILEAIOREQ *paReqs; paReqs = (...., cMaxReqsInFlight * sizeof(RTFILEAIOREQ)); RTTESTI_CHECK_RETV(paReqs); RT_BZERO(..., sizeof(cMaxReqsInFlight) * sizeof(RTFILEAIOREQ)); /* Allocate array holding pointer to data buffers. */ void **papvBuf = (...., cMaxReqsInFlight * sizeof(void *)); .... }

The analyzer detected a suspicious product of two sizeof() operators. If we have a look at 'RT_BZERO' macro, we may have a question: "Why do we get a size of a variable that has the 'uint32_t' type and multiply it by the size of a different type?" In adjoining sections of code the size of the array is evaluated as "cMaxReqsInFlight * sizeof(RTFILEAIOREQ)". Perhaps it is an error; the same size should be used in the string with 'RT_BZERO'.

V547 Expression 'sd >= 0' is always true. Unsigned type value is always >= 0. vboxservicevminfo.cpp 1086

static int vgsvcVMInfoWriteNetwork(void) { .... SOCKET sd = WSASocket(AF_INET, SOCK_DGRAM, 0, 0, 0, 0); .... if (pAdpInfo) RTMemFree(pAdpInfo); if (sd >= 0) // <= closesocket(sd); .... }

The SOCKET type (in Visual C++) is unsigned, so the check "sd >=0" is meaningless. The reason for such code is clear: the project is built on different operating systems, and in the UNIX systems the socket values are stored in the 'int' variable of signed type. In general the code for working with sockets is written correctly: to check the states, the programmer uses constants from the system header files. But cross-platform code contains a lot of conditional preprocessor directives, so in one place a check wasn't noticed, that is always true for Windows.

V560 A part of conditional expression is always true: 0x1fbe. tstiprtministring.cpp 442

static void test2(RTTEST hTest) { .... for (RTUNICP uc = 1; uc <= 0x10fffd; uc++) { if (uc == 0x131 || uc == 0x130 || uc == 0x17f || 0x1fbe)// <= continue; //^^^^^^ if (RTUniCpIsLower(uc)) { RTTESTI_CHECK_MSG(....), ("%#x

", uc)); strLower.appendCodePoint(uc); } if (RTUniCpIsUpper(uc)) { RTTESTI_CHECK_MSG(....), ("%#x

", uc)); strUpper.appendCodePoint(uc); } } .... }

Usually, we don't write about the warnings issued for the test files in the articles. By the way, it's very easy to exclude messages received for all the files in the specified directory. Still, I've decided to write about one of them here. It's quite peculiar because of the fact that the test doesn't actually test anything, because of a typo. The 'continue' operator is executed during every iteration of the for() loop. The value '0x1fbe' will always be true, because an expression "uc ==" is missing in the condition. This is a good example of how static analysis complements unit testing.

The correct version:

if (uc == 0x131 || uc == 0x130 || uc == 0x17f || uc == 0x1fbe) continue;

V610 Undefined behavior. Check the shift operator '<<'. The left operand '(- 2)' is negative. translate.c 2708

static void gen_push_T1(DisasContext *s) { .... if (s->ss32 && !s->addseg) gen_op_mov_reg_A0(1, R_ESP); else gen_stack_update(s, (-2) << s->dflag); .... }

According to the latest standards of the C++ language, the shift of a negative number results in undefined behavior.

Two more similar fragments:

V610 Undefined behavior. Check the shift operator '<<'. The left operand is negative ('i64' = [-1..0]). tarvfs.cpp 234

V610 Undefined behavior. Check the shift operator '<<'. The left operand '-16' is negative. translate.c 2761

TODOs

V523 The 'then' statement is equivalent to the 'else' statement. state_evaluators.c 479

static void map2(G....) { .... if (g->extensions.NV_vertex_program) { /* XXX FIXME */ i = target - GL_MAP2_COLOR_4; } else { i = target - GL_MAP2_COLOR_4; } .... }

"FIXME" and "TODO" can live in the code for a very long time, but the static analyzer won't let you forget about code that was left unfinished.

V530 The return value of function 'e1kHandleRxPacket' is required to be utilized. deve1000.cpp 3913

static void e1kTransmitFrame(PE1KSTATE pThis, bool fOnWorkerThread) { .... /** @todo do we actually need to check that we're in loopback mode here? */ if (GET_BITS(RCTL, LBM) == RCTL_LBM_TCVR) { E1KRXDST status; RT_ZERO(status); status.fPIF = true; e1kHandleRxPacket(pThis, pSg->aSegs[0].pvSeg, ....); // <= rc = VINF_SUCCESS; // <= } e1kXmitFreeBuf(pThis); .... }

In other parts of the source code, the result of the function e1kHandleRxPacket () is usually saved to the 'rc' variable. But until the code is completed, the result of the function is not used and "VINF_SUCCESS" is always saved in the status.

New diagnostics

In this section I will describe the analyzer warnings that appeared in PVS-Studio, after the last check of the VirtualBox project.

V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function. vboxcredentialprovider.cpp 231

static HRESULT VBoxCredentialProviderRegisterSENS(void) { .... hr = pIEventSubscription->put_EventClassID( L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}"); .... }

The analyzer saw that the string of "wchar_t *" type is handled as a string of BSTR type.

BSTR (basic string or binary string), is a string data type that is used in COM, Automation, and Interop functions. A string of this type is composed of a 4 byte length prefix, a data string, and a delimiter of two null characters. The length prefix is specified before the first character of the string, and does not take into account the delimiter character. In this case the length prefix will be missing before the beginning of the string.

Corrected version using the SysAllocString() function:

static HRESULT VBoxCredentialProviderRegisterSENS(void) { .... hr = pIEventSubscription->put_EventClassID(SysAllocString( L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}")); .... }

More suspicious fragments:

V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function. vboxcredentialprovider.cpp 277

V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function. vboxcredentialprovider.cpp 344

V745 A 'wchar_t *' type string is incorrectly converted to 'BSTR' type string. Consider using 'SysAllocString' function. string.cpp 31

V746 Type slicing. An exception should be caught by reference rather than by value. extpackutil.cpp 257

RTCString *VBoxExtPackLoadDesc(....) { .... xml::XmlFileParser Parser; try { Parser.read(szFilePath, Doc); } catch (xml::XmlError Err) // <= { return new RTCString(Err.what()); } .... }

The analyzer detected a potential error, related to catching the exception by value. It means that a new 'Err' object of xml::XmlError type, will be constructed with the help of a copy constructor. At the same time some of the code will lose some data about the exception that was stored in the classes, inherited from xml::XmlError.

Another suspicious fragment:

V746 Type slicing. An exception should be caught by reference rather than by value. extpackutil.cpp 330

Conclusion:

VirtualBox project is a good example of just how important it is to do static analysis regularly on a developing project. It prevents the growth of potential bugs during the development stage, and allows fresh updates of the analysis tool.

I would also gladly check MS Word, which froze several times for 7-10 minutes fully loading the processor, when I was writing the article. But there is no such possibility. We have done some archaeological research on MS Word 1.1a, but that is a another story.

So, feel free to download PVS-Studio without filling in any forms, and find bugs in your project. Think about the users, and the time the programmers could save.