All software has bugs, but there is something ironic about a crashing bug in the /analyze portion of Microsoft’s VC++ 2012 compiler.

The investigation of this bug also shows yet another reason you should test your software using Microsoft’s Application Verifier: you don’t want your customers using Application Verifier to effortlessly find bugs that you missed.

This is part of a series of articles investigating the performance or stability of the software that I use. If you enjoy this article then check out the investigative reporting category.

/analyze for bug finding

Valve has been running the /analyze feature of VC++ on many projects for over a year. Programmers tend to make every coding mistake that is syntactically possible, and static code analysis helps us find some of these inevitable bugs quickly and easily. If you’re not using some form of static code analysis then you should correct that soon.

We have been using the VC++ 2010 version of /analyze. It finds lots of bugs, but it also reports some bizarre and confusing false positives which I reported on last year. Microsoft took these reports seriously and most of my complaints have been addressed in VC++ 2012 so I decided it was time to upgrade our /analyze build machines to the Release To Manufacturing (RTM) version of VC++ 2012.

I’d previously done tests with the Release Candidate (RC) version of VC++ 2012 and had seen good results, so I wasn’t expecting any major difficulties. I was surprised when I started hitting internal compiler errors (ICE, error C1001) on the RTM version. An ICE is generally a crash in the compiler which is caught and turned into an error message and the really frustrating thing about these ICEs was that they were random. Each build would fail on just a few source files, and the set of source files was different every time. Without a repro it was going to be hard to report this bug and get it fixed.

I enabled /errorReport:send on our builder to ensure that crash reports were sent to Microsoft, but I wasn’t sure that would be enough. So I turned to App Verifier. I enabled it for cl.exe and started compiling. I immediately started getting crashes because of memory leaks at shutdown but those aren’t interesting problems so I disabled leak checking in App Verifier. I then started getting frequent access violations.

Bingo!

For full details of how to use App Verifier you should look at the post where I discuss it, but here’s the basics. App Verifier puts each allocation on a separate page, aligned near the end of the page. If running under App Verifier causes access violations near the beginning of a page then you have a buffer overrun. If it causes access violations near the end of a page then you have a use-after-free. The crashes I was seeing were on attempts to access addresses like 0x5F430FE4 and 0x43210FF8, which is the classic signature for a use-after-free bug.

At this point I had a call stack for the bad memory access. It showed (thanks to Microsoft sharing symbols for the compiler) that the use-after-free happened in code that is parsing spec strings, the annotations that help code analysis work. That explains why the crash only happened when building with /analyze. The (slightly simplified) call stack is here:

> mspft110.dll!CSALAnnotation::raw_SetSourceInfo()

mspft110.dll!ISALAnnotation::SetSourceInfo()

mspft110.dll!…FuncSpecStringParserWorker()

mspft110.dll!…FuncSpecStringParserWorker()

mspft110.dll!…FuncSpecStringParserWorker()

mspft110.dll!…ParseSpecstring()

mspft110.dll!UpdateDeclspecModel()

mspft110.dll!UpdateModels()

mspft110.dll!CSALModelManagerEx::internal_UpdateSALForest()

With the call stack, and crash dumps, and a way to reproduce the crashes about a hundred times more frequently I finally had something that I could pass along to Microsoft. They’re looking into it.

Is it worth it?

The /analyze option in VC++ 2012 finds more bugs, has more and better annotations, it is available in the professional version, and it lets you run /analyze on 64-bit code.It also has far fewer false positives. Overall VS 2012 is a solid /analyze upgrade, despite this one unfortunate (and, I have to say, avoidable) misstep.

For now I’ve got my /analyze build machines configured to filter out the reports of the internal compiler errors so that the build process thinks that it succeeded. This seems to work well enough, but it is only an option because I’m not using the compiled code – I just need the compiler’s warning output.

What can we conclude?

Thank you Microsoft for supplying the tools to find bugs in the tools that you supply to find bugs in our code.

The investigation was hindered because the VC++ compiler catches access violations and doesn’t create a crash dump that developers can easily examine. However it was helped because App Verifier, in addition to putting allocations on separate pages, also translates access violations to breakpoint instructions. This is normally an inconvenience but in this case it helped the just-in-time debugger see the crashes. However it also (see below) turned a non-crash into a crash. It feels a bit like everybody involved is trying to be a bit too clever. VC++ should crash normally after reporting the error, and App Verifier should let applications crash normally on access violations, and then my job would be much easier.

Epilogue

Update: a programmer who worked on the VC++ compiler told me that the behavior describe below is actually by design. Address space is reserved and then committed on demand, so what appears like a buffer overrun is just an auto-expanding buffer – that fails when App Verifier is running. I’m leaving the description in for now.

After doing these tests I accidentally left App Verifier enabled for cl.exe. I noticed this when I compiled some code in VS 2010 and I suddenly had dozens of crashed compilers. This time the address for the access violations ended in 0x000 – the classic signature for a buffer overrun.

The debugger state after the VS 2010 crash is shown below – note the text data that walks right to the edge of a page:

The code that tried writing past the end of a page:

The registers with that suspicious page-aligned address in ECX:

And here, for completeness, is the call stack:

> c1xx.dll!tl_getid() + 0x73 bytes

c1xx.dll!can_expand() + 0x205 bytes

c1xx.dll!GetTokenFromCurrentChar() + 0x6fb bytes

c1xx.dll!FileTokenStream::getTokenWorker_impl() + 0xf bytes

c1xx.dll!TokenStream::getToken() + 0xd bytes

c1xx.dll!TokenStreamStack::getToken() + 0x1e bytes

c1xx.dll!yylex() + 0x1a bytes

c1xx.dll!PrimaryParser::Parse() + 0x19d bytes

c1xx.dll!CallPrimaryParser() + 0xb4 bytes

c1xx.dll!main_compile() + 0x15f bytes

c1xx.dll!Trap_main_compile() + 0x15 bytes

As I mentioned at the top of this section, this VC++ 2010 ‘buffer overflow’ is apparently not a bug and would never cause crashes during normal usage. The VC++ 2012 bug is presumably real since it appears to be correlated to the crashes I was seeing. This shows the risks when debugging somebody else’s code.