Moving a big software project to a new compiler can be a lot of work, and few projects are bigger than Chromium. In addition to the main Chromium repository, which includes all of Blink, there are over a hundred other open-source projects which Chromium incorporates by reference, totaling more than 48,000 C/C++ files and 40,000 header files. Porting Chromium on Windows to VC++ 2015 requires getting all of these projects to build and run.

As of March 11th 2016 Chromium for Windows has switched from VC++ 2013 to VC++ 2015, and it doesn’t look like it’s switching back. This will give us more C++ 11 features, new security options, much faster LTCG/PGO builds, and other advantages.

The tracking bug for this project currently has over 330 comments on it, with contributions from dozens of developers. Writing about all of those changes would require an entire book. So I’m going to focus on my favorite part of this project – compiler bugs. In particular, bugs where the compiler silently generates incorrect code.

Update: another code-gen bug in VS 2015 was found on March 31st, and worked around about two weeks later. A VC++ bug was filed. And another silent-bad-code-gen bug was found in September 2016, and worked around a few days later.

If you throw a diverse enough set of code at a compiler, build it in lots of different ways, run lots of tests, and ship the code to customers, you’re going to find some bugs.

Because Chromium is open source I will include links to Chromium bugs and fixes where possible, in addition to links to the VC++ bug reports.

I didn’t find all of these bugs, but it was my job to investigate them, come up with a minimal repro, and report them to Microsoft. And I have to say that the Microsoft team was amazing. They were very supportive, and helpful, and it was clear that they really wanted VC++ 2015 to be as good as possible, and that includes building Chromium. We had an excellent symbiotic relationship: I fed them high quality bug reports, and they fixed them incredibly quickly.

Anyone can file VC++ bugs at the connect website. Getting your bugs noticed is easier if you work for Google, but anyone who files quality bugs can get help. The VC++ MVP program lets ordinary developers who participate in the VC++ community get privileged access to the VC++ team. Recommended – I enjoyed being a VC++ MVP before joining Google.

Chromium is open source so Microsoft can theoretically reproduce any of these bugs if I just tell them what source file to compile and where to look. But that can be a slow process. If the repro steps for your bug includes “Download 90,000 source files and our custom build system and compile for hours” then you can’t expect priority service.

This is especially true because most reports of bugs in compilers are erroneous – most ‘compiler bugs’ are actually bugs in the code being compiled, such as invoking undefined behavior. So, I always had to prove to myself that these were compiler bugs before reporting them, which usually meant simplifying the repro case to make it trivially obvious that the compiler was at fault. I’ve discussed techniques for doing this before.

Failed Chromium

After getting all of Chromium building the next step was to start running Chromium. It would run fine once, but on subsequent runs I’d hit an assert. I suspected a code-gen bug but it was not the case. The problem ultimately turned out to be this code:

HandleWrapper h(CreateMutex(…));

if (h.HandleIsInvalid() && GetLastError() == SOME_ERROR_CODE) …

It turns out that the HandleWrapper class allocates memory, and the debug version of the VC++ 2015 CRT allocations functions call the FlsGetValue function, and FlsGetValue calls SetLastError(0) because it is so incredibly proud that it succeeded. SetLastError(0) destroys the error code from CreateMutex, and madness ensues. I filed a bug and tweeted about this and it turned into a fun little twitter discussion with opinions about evenly split between “if memory allocation affects LastError then Armageddon is imminent” and “if you assume memory allocation won’t affect LastError then you are Satan”.

This way of using HandleWrapper is very convenient and is pervasive in Chromium so fixing all of the uses seemed impractical so instead the HandleWrapper constructor now preserves and then restores LastError. Some other misuses of GetLastError were also fixed.

This was not a compiler code-gen bug, but I wanted to include it because the twitter storm was wonderful.

Microsoft decided to fix the bug which I assumes means that they are choosing Satan over Armageddon.

Failed test

The next step was to start running tests. There are a lot of Chromium test executables and after a bit of experimenting I hit a failure – media_unittests failed on the AudioVideoMetadataExtractorTest.AudioWAV test, but only when built with VS 2015. This is suspicious but proves nothing by itself – it could well be a Chromium bug which the compiler exposed. However I eventually debugged the problem enough to find the failing function (ff_read_riff_info in riffdec.c), and then the failing line of code. It was this:

char key[5] = { 0 };

Simple enough – this is supposed to zero the entire array, but instead it only zeroed the first four bytes.

Clearly the compiler doesn’t make this mistake every time it is asked to zero a five-byte array, or else nothing would run. There was something about the surrounding code that triggered this odd behavior and I still don’t know exactly what it was. I enjoyed coming up with a minimized repro for this bug – just 38 lines including test code to drive it. There’s something satisfying about a small repro and it makes it much more likely that the compiler team will fix the bug – which they did.

In order to get past this bug I checked in a workaround which explicitly zeroes the last character of the array.

PGO only crash

Profile Guided Optimization uses profiling information in order to generate better code. It can give great performance improvements but it’s scary because minor changes in your training – the stage when profiling data is gathered – can affect the generated code. This means that your builds are not entirely deterministic. How exciting. When Google shipped a Chrome Dev Channel build that used PGO we hit a suspicious crash that wasn’t happening in normal builds. The crash was an access violation, which usually means a reference to an unmapped or inaccessible address, but the crash dump showed that there was memory there. The crashing instruction was:

movaps xmm0,xmmword ptr [r14+2398h] ; Don't do this at home

The movaps instruction requires a 16-byte aligned address and this is reading from 0x2398 + r14. So, unless r14 was guaranteed to be 8-bytes greater than a multiple of 16 this would fail. In fact, r14 was guaranteed to be 16-byte aligned, so this instruction would never work.

The code that triggered this bug was actually scalar math on four adjacent integers. The compiler’s vectorizer recognized that it could do the operations using SIMD instructions – it almost did an amazing job…

The problem with investigating this bug was that while a normal Chromium build ‘only’ takes about half an hour on a 24-core/48-thread machine, a PGO build (actually two link-time-code-generation builds with a training session in-between) takes hours. Not good for reporting a bug.

One of the main features of PGO is that it can decide whether to optimize-for-size or optimize-for-speed at very fine granularity. The source file that contained the crashing function was usually optimized-for-speed so I theorized that PGO was causing it to be optimized-for-size instead. I changed the optimization settings for this file, recompiled it, and examined the generated assembly language – bingo!

As discussed in my compiler-bug-reporting post, /FAcs makes investigating compiler bugs much faster.

After figuring out how to trigger this bug I preprocessed the file, reduced it a bit, and filed a bug with instructions to look for the movaps instruction with a bad offset – see the bug for details.

To get a preprocessed file I ran this command to compile just this file with verbose output and retaining the .rsp file (response file, containing arguments to the compiler): ninja -v -d keeprsp -C out\Release ..\..\third_party\libvpx\source\libvpx\vp8\encoder\mcomp.c^^ Then I edited the .rsp file to add /P, re-ran the command that had been verbosely printed, and had my repro.

After reporting this bug I then ran all of our tests on a version of Chromium that was optimized-for-size everywhere, to see if any other bugs were found. This reproduced the original crash, but nothing else.

The minimized repro for this bug was still pretty big and for some reason I could not attach it to the bug – I emailed it to Microsoft and later uploaded it to github. The repro code is also not runnable, but it’s pretty easy to examine the generated code to see whether the bug is there – grepping for movaps in the generated assembly language and looking at the offset is sufficient. See the bug for details.

The generic way of reporting a link-time problem, including LTCG or PGO code-gen bug is to create a LINK_REPRO. This is a good technique to know about but on Chromium it would probably be a many GB repro.

This bug was particularly annoying because it was difficult to come up with a way to work around it. The common strategy of “change the optimization settings” doesn’t work when PGO is overriding those based on profile data. Luckily the fix will arrive soon enough that no workaround is needed.

Bad 64-bit structure read

After talking to the vp8 team about the PGO only crash bug they mentioned another problem that they had hit. Some versions of one of their structures would cause incorrect code to be generated for a read of a 64-bit element from an array. The bug wasn’t currently happening but they had the git hash from a couple of times when it had. So, I synced their repo to that point, preprocessed the file, reproduced the bug, and then started reducing the file. I knew what the bad code-gen looked like so I could minimize the repro quickly. On 32-bit optimized-for-size C compiles (not C++) the compiler would read the first half of the 64-bit value from the correct offset but the second half would always be read from an offset of 4. Oops.

I filed the bug with a minimized repro that was just 22 lines, including test code to print the bad results. This bug was first noticed in VS 2013 but hadn’t previously been reported. It will be fixed in VS 2015 Update 2. No workaround was needed because the bug was not currently manifesting.

Pre-RTM

This bug actually came first chronologically, but since it was a bug in VS 2015 prior to its initial release I’m rating it as less interesting. The repro for the bug – all 44 lines of it, including test code to drive it – causes the call to a destructor to be omitted. I apologize for the confusing description – there’s no way to go back and fix typos or misunderstandings.

Other bugs

Probably my smallest compiler bug repro was an internal compiler error that I could trigger with a one line source file. I worked around that bug by not using /analyze on ffmpeg for a while.

Compared to a one-line repro my bloated 67 line source file to trigger an internal compiler error just seems weak.

One of the reasons to upgrade to VC++ 2015 is because LTCG/PGO links are up to five times faster than with VC++ 2013. Why? I know one reason.

We also hit several bugs where VC++ 2015 refused to compile or link our code:

There are other bugs but this is enough for today.

Once that last linker crash is fixed I can only assume that VC++ 2015 will be completely bug free. If it can build Chromium’s many configurations successfully then it must be!

It could be worse

I worked on one project where in addition to finding several compiler bugs I also found a flaw in the design of the processor, but that’s an NDA tale for another day…

Discussion of this article

There is a good discussion of this article on reddit, a discussion on hacker news, and a slashdot discussion.