One of the goals of LLVM is to offer as much GCC compatibility as is feasible, which makes migration from GCC easier. But just how compatible are they? From the user’s perspective, this comes down mainly to how closely the user interfaces match, which in practice means the command-line options, so I’ve carried out some experiments to see how close the two compilers are.

This is more than just curiosity: LLVM’s documentation is still relatively immature. While much of what LLVM provides online is very good, it is far from comprehensive. Many LLVM options are documented publicly only through their commit message in the source repository. A common piece of advice is to look up the option in the GCC user manual, since the two compilers have such a similar interface.

The statistics

I scraped all the generic options listed in the GCC user manual and all those reported by clang --help-hidden, then tried each one with both compilers. You can find the scripts I used on GitHub:

github.com/embecosm/llvm-extended-documentation/tree/master/support-code
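To give a flavour of the approach, here is a minimal Python sketch. It is not the code from the repository above: the function names, the regular expression, and the assumed layout of the help text (one option per indented line) are all illustrative assumptions, and "works" is reduced to "a trivial C file compiles with exit status 0".

```python
import os
import re
import subprocess
import tempfile

# Assumption: each option starts an indented line of the help text,
# e.g. "  -fsome-flag    Description". Anything after '=', '<' or ','
# (argument placeholders, aliases) is dropped.
OPTION_RE = re.compile(r"^\s+(--?[A-Za-z][^\s=<,]*)")


def options_from_help(help_text):
    """Collect the option names that start an indented line of help text."""
    options = set()
    for line in help_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            options.add(match.group(1))
    return options


def accepts_option(compiler, option, timeout=30):
    """Return True if `compiler` builds a trivial C file with `option`.

    Crude criterion: only the exit status is checked, so an option that
    merely triggers a warning still counts as accepted.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as handle:
            handle.write("int main(void) { return 0; }\n")
        try:
            result = subprocess.run(
                [compiler, option, "-c", src, "-o", obj],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout,
            )
        except (OSError, subprocess.TimeoutExpired):
            # Compiler missing, not executable, or hung: treat as "not accepted".
            return False
        return result.returncode == 0
```

With both compilers installed, one might call accepts_option("gcc", opt) and accepts_option("clang", opt) for every opt in the scraped list. Options that require an argument will fail when passed bare, so a real run needs to supply plausible values for them.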

Using top-of-tree versions of both compilers (as of 2 February 2016), running natively on an x86_64 machine with Fedora 22 Linux, I found the following:

397 options work in both GCC and LLVM

433 options work only in LLVM

598 options work only in GCC
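The three-way split above is plain set arithmetic once the options accepted by each compiler are known. A minimal sketch, using a hypothetical tally function and made-up option names rather than the real data:

```python
def tally(gcc_ok, clang_ok):
    """Split accepted options into both / LLVM-only / GCC-only sets."""
    gcc_ok, clang_ok = set(gcc_ok), set(clang_ok)
    return {
        "both": gcc_ok & clang_ok,       # accepted by both compilers
        "llvm_only": clang_ok - gcc_ok,  # accepted only by clang
        "gcc_only": gcc_ok - clang_ok,   # accepted only by gcc
    }


# Made-up option names, purely for illustration.
result = tally(
    gcc_ok={"-fshared-flag", "-fgcc-only-flag"},
    clang_ok={"-fshared-flag", "-fllvm-only-flag"},
)
for name in ("both", "llvm_only", "gcc_only"):
    print(name, len(result[name]))
```

Feeding in the full sets of accepted options from a real run would give counts of the kind listed above.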

The story is not quite as bad as it seems. Much of the difference can be put down to two causes:

LLVM and GCC have highly disjoint sets of options to control warning messages

LLVM deliberately exposes little internal detail through the main driver command, whereas GCC exposes everything

I strongly suspect that many warnings with different flag names are really the same, and that an exercise in rationalizing them would greatly improve the consistency of the two compilers. However, having different names for warning options is not generally going to break things too much, so I can see why it has not been a top priority.

The second issue is a matter of design philosophy: LLVM has chosen not to let the ordinary user control the details of compilation (although much more control is possible by bypassing the driver and calling the compiler components directly). However, the ability to control GCC in intricate detail is a key advantage in compiler research, and an area where LLVM’s philosophy, if taken too far, could place it at a disadvantage.

Errors in documentation

A side effect of testing all the options is that I found a fair number of documentation errors. GCC in particular has suffered from bit rot, mostly because its user manual is so much larger and older. Nothing serious: long-deprecated options that are still in the manual, options missing from the summary and/or index, and in a couple of cases options not documented at all. I shall submit patches to fix them.

LLVM also had some faults, although in this case they were in the output of --help-hidden. Several options are listed there, but won’t work from the driver. A small number of options are target specific, but not declared as such. Again, I shall submit patches to correct these faults.

What next?

Clearly the idea of using the GCC manual for LLVM has considerable limitations. The motivation for this analysis was the need to produce better documentation for LLVM, in particular documentation that is suitable for a commercially deployed compiler. This is the one area where some proprietary compilers run rings round both GCC and LLVM. This work is well under way and I’ll be talking about it more in a future post.