You might have read that Mozilla recently switched Windows builds to clang-cl. More recently, those Windows builds have seen both PGO (Profile-Guided Optimization) and LTO (Link-Time Optimization) enabled.

As of next nightly (as of writing, obviously), all tier-1 platforms are now built with clang with LTO enabled. Yes, this means Linux, Mac and Android arm, aarch64 and x86. Linux builds also have PGO enabled.

Mac and Android builds were already using clang, so the only difference is LTO being enabled, which brought some performance improvements.

The most impressive difference, though, was on Linux, where we’re getting more than 5% performance improvements on most Talos tests (up to 18% (!) on some tests) compared to GCC 6.4 with PGO. I must say I wasn’t expecting switching from GCC to clang would make such a difference. And that is with clang 6. A quick test with upcoming clang 7 suggests we’d additionally get between 2 and 5% performance improvement from an upgrade, but our static analysis plugin doesn’t like it.

This doesn’t mean GCC is being unsupported. As a matter of fact, we still have automated jobs using GCC for some static analysis, and we also have jobs ensuring everything still builds with a baseline of GCC 6.x.

You might wonder if we tried LTO with GCC, or tried upgrading to GCC 8.x. As a matter of fact, I did. Enabling LTO turned up linker errors, and upgrading to GCC 7.x turned up breaking binary compatibility with older systems, and if I remember correctly had some problems with our test suite. GCC 8.1 was barely out when I was looking into this, and we all know to stay away from any new major GCC version until one or two minor updates. Considering the expected future advantages from using clang (cross-language inlining with Rust, consistency between platforms), it seemed a better deal to switch to clang than to try to address those issues.

Update: As there’s been some interest on reddit and HN, and I failed to mention it originally, it’s worth noting that comparing GCC+PGO vs. clang+LTO or GCC+PGO vs. clang+PGO was a win for clang overall in both cases, although GCC was winning on a few benchmarks. If I remember correctly, clang without PGO/LTO was also winning against GCC without PGO.

Anyways, what led me on this quest was a casual conversation at our last All Hands, where we were discussing possibly turning on LTO on Mac, and how that should roughly just be about turning a switch.

Famous last words.

At least, that’s a somehow reasonable assumption. But when you have a codebase the size of Firefox, you’re up for “interesting” discoveries.

This involved compiler bugs, linker bugs (with a special mention for a bug in ld64 that Apple has apparently fixed in Xcode 9 but hasn’t released the source of), build system problems, elfhack issues, crash report problems, clang plugin problems (would you have guessed that __attribute__((annotate("foo"))) can affect the generated machine code?), sccache issues, inline assembly bugs (getting inputs, outputs and clobbers correctly is hard), binutils bugs, and more.

I won’t bother you with all the details, but here we are, 3 months later with it all, finally, mostly done. Counting only the bugs assigned to me, there are 77 bugs on bugzilla (so, leaving out anything in other bug trackers, like LLVM’s). Some of them relied on work from other people (most notably, Nathan Froyd’s work to switch to clang and then non-NDK clang on Android). This spread over about 150 commits on mozilla-central, 20 of which were backouts. Not everything went according to plan, obviously, although some of those backouts were on purpose as a taskcluster trick.

Hopefully, this sticks, and Firefox 64 will ship built with clang with LTO on all tier-1 platforms as well as PGO on some. Downstreams are encouraged to do the same if they can. The build system will soon choose clang by default on all builds, but won’t enable PGO/LTO.

As a bonus, as of a few days ago, Linux builds are also finally using Position Independent Executables, which improves Address Space Layout Randomization for the few things that are in the executables instead of some library (most notably, mozglue and the allocator). This was actually necessary for LTO, because clang doesn’t build position independent code in executables that are not PIE (but GCC does), and that causes other problems.

Work is not entirely over, though, as more inline assembly bugs might be remaining only not causing visible problems by sheer luck, so I’m now working on a systematic analysis of inline assembly blocks with our clang plugin.

p.m.o

Responses are currently closed, but you can trackback from your own site.