GCC unplugged


Many programs - free and proprietary - offer a plugin interface to make it easy to add new functionality. In many situations, the existence of a well-defined plugin interface has been a key driver for the success of the system as a whole; imagine Firefox, for example, without its extension mechanism. The GNU Compiler Collection (GCC) is an example of a complex system which could benefit from such an interface, but which currently lacks one. GCC developers have been talking about adding a plugin API, but it is far from clear that this will be done; how this decision goes may have major consequences for how GCC works with its wider development community and the free software community as a whole.

GCC is designed as an extended pipeline of cooperating modules. A language-specific front end parses source code and turns it into a generic, high-level internal representation. A series of optimization passes then operate on that representation at several levels of abstraction. At the back end, an architecture-specific module turns the optimized internal code into something which will run on the target processor. It's a long chain of modules; at each point in the chain, there is an opportunity to see the code at a different stage of analysis and processing.

There can be a lot of value in hooking into an arbitrary point in that chain. Static analysis tools need to examine a program at several levels to understand what is going on and to look for problems or opportunities for improvement. New types of optimization passes could be added at specific points, making the compiler perform better. Project-specific modules could look for problems (violations of locking rules, perhaps) tied to a given code base. Language-specific modules could provide tighter checking for certain constructs. And so on.
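To make the idea of hooking into the pipeline a little more concrete, here is a minimal sketch in C of a compiler pipeline with named hook points at which plugins can register callbacks. Everything in it is hypothetical: the hook names, register_hook(), compile(), and the toy struct ir (which merely counts "instructions") are illustrations of the general technique, not GCC's internals or any actual GCC interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook points, loosely mirroring the stages described above. */
enum hook_point { AFTER_PARSE, AFTER_OPTIMIZE, BEFORE_CODEGEN, NUM_HOOK_POINTS };

/* Toy stand-in for the compiler's internal representation. */
struct ir { int insn_count; };

typedef void (*hook_fn)(struct ir *ir, void *data);

#define MAX_HOOKS 8

struct hook { hook_fn fn; void *data; };

static struct hook hooks[NUM_HOOK_POINTS][MAX_HOOKS];
static int hook_count[NUM_HOOK_POINTS];

/* Register a plugin callback at a hook point; returns 0 on success,
   -1 if that point already has MAX_HOOKS callbacks. */
int register_hook(enum hook_point point, hook_fn fn, void *data)
{
    assert((unsigned)point < NUM_HOOK_POINTS);
    if (hook_count[point] == MAX_HOOKS)
        return -1;
    hooks[point][hook_count[point]++] = (struct hook){ fn, data };
    return 0;
}

/* Run every callback registered at one point, in registration order. */
static void run_hooks(enum hook_point point, struct ir *ir)
{
    for (int i = 0; i < hook_count[point]; i++)
        hooks[point][i].fn(ir, hooks[point][i].data);
}

/* The built-in pipeline: each stage transforms the IR, then gives any
   registered plugins a chance to inspect or modify it. */
void compile(struct ir *ir)
{
    ir->insn_count = 10;            /* toy "front end" emits 10 insns */
    run_hooks(AFTER_PARSE, ir);
    ir->insn_count -= 2;            /* toy "optimizer" removes 2 */
    run_hooks(AFTER_OPTIMIZE, ir);
    run_hooks(BEFORE_CODEGEN, ir);  /* a back end would emit code here */
}

/* Example analysis plugin: records the insn count it observes. */
void record_count(struct ir *ir, void *data)
{
    *(int *)data = ir->insn_count;
}

/* Example optimization plugin: removes one more insn if any remain. */
void extra_pass(struct ir *ir, void *data)
{
    (void)data;
    if (ir->insn_count > 0)
        ir->insn_count--;
}
```

A caller would register record_count() or extra_pass() at the desired point before calling compile(). The static tables keep the sketch short; a real compiler would presumably load plugins from shared objects at run time and pass them far richer data than an instruction count.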

Currently, adding this sort of extension to GCC is not a task for the faint of heart. The GCC build system is known to be challenging, and GCC's internal documentation is, let us say, not quite as complete as one might like. Researcher Alexander Lamaison described it this way:

Out of the 6 months, 4 were spent learning the GCC internals and fighting the GCC build process, 1 was spent writing up leaving 1 month of actual productive research... I fully understand that this can seem strange to people who know GCC like the back of their hand, but to a newcomer it is a huge task just to write a single useful line of code. I'm sure many give up before ever reaching that point.

Once they have overcome these problems, developers adding extensions to GCC run into another problem: if they want to distribute their work, they end up in the business of shipping a whole new compiler. Brendon Costa, who works on the EDoc++ GCC extension, noted:

I approached the debian maintainers list with a debian package for this project to see if they would include it in the official repositories. It was not accepted and the reason for that is because it includes another patched version of GCC which takes up too much disk space. They don't want to accept these sorts of projects because they all effectively require duplicates of the same code(GCC)

Both of these problems could be addressed by adding a plugin mechanism to GCC. A well-defined API would make it relatively easy for developers to hook a new tool into the compiler without having to master all of its internals or fight with the build process. If an off-the-shelf GCC could accept plugins, distributors could ship those plugins without having to include multiple copies of the compiler. Given that we would all benefit from a more capable GCC, and given the many examples of how other systems have benefited from a plugin architecture, one would think that the addition of plugins to GCC would not be controversial.

It seems that one would be wrong, however. In a recent discussion on plugins, two concerns were raised:

Adding plugins to GCC would make it easy for people to create and distribute proprietary enhancements.

A plugin API would have to be maintained in a stable manner, possibly impeding further GCC development.

There were also some suggestions that, if the effort put into a plugin API were, instead, put into documentation of GCC internals, the overall benefit would be much higher.

The proprietary extensions concern is clearly the big stumbling block, though. Some participants stated that Richard Stallman has blocked any sort of GCC plugin mechanism for just this reason - though it should be noted that Mr. Stallman has not contributed directly to this discussion. But, given that GCC remains a GNU project, it is not hard to imagine that anything which could lead to proprietary versions of GCC would encounter strong opposition.

The attentive reader may have spied some similarities between this discussion and the interminable debate over kernel modules. The kernel's loadable module mechanism has certainly enabled the creation of proprietary extensions. In the GCC case, it has been suggested that any plugins would have to be derivative works and, thus, covered by the GPL. This, too, is an argument which has been heard in the kernel context. There, concerns over the copyright status of proprietary modules have kept them out of most distributions and, in general, cast a cloud over those modules. Something similar would probably happen to proprietary GCC plugins: they would not be widely distributed, would be the subject of constant criticism, and would be an impetus for others to replace them with free versions. It is hard to imagine a thriving market for proprietary GCC extensions, just as there is no real market for proprietary GIMP extensions - even though Photoshop has created exactly that kind of market.

It has also been pointed out that the status quo has not prevented the creation of proprietary GCC variants. As an example, consider GCCfss - GCC for Solaris systems. This compiler is a sort of Frankenstein-like grafting of the GCC front end onto Sun's proprietary SPARC code generator. Back when Coverity's static analysis tools were known as the "Stanford checker," they, too, were a proprietary tool built on top of GCC (the current version does not use GCC, though). People wanting to do proprietary work with GCC have been finding ways to do so even without a plugin mechanism.

The GCC developers could also look to the kernel for an approach to the API stability issue and simply declare that the plugin API can change at any time. That would make life harder for plugin developers and distributors, but it would make life harder still for any proprietary plugin vendors. An unstable API would not take away the value of the plugin architecture in general, but it would avoid putting extra demands on the core GCC developers.
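One simple way for a host program to live with an unstable plugin API is to refuse any plugin that was not built against exactly the same release. The following C sketch assumes a hypothetical struct plugin_info descriptor that each plugin would export, plus a placeholder host version string; none of these names come from GCC, and the sketch only shows the version check, not the loading itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Placeholder version of the (hypothetical) host compiler. */
#define HOST_PLUGIN_API_VERSION "1.2.3"

/* Hypothetical descriptor a plugin exports, recording the exact host
   version it was compiled against. */
struct plugin_info {
    const char *name;
    const char *built_for_version;
};

/* With no stability promise, only an exact match is safe: internal data
   structures may differ between any two releases. Returns 1 if the plugin
   may be loaded, 0 if it must be refused. */
int plugin_compatible(const struct plugin_info *p)
{
    assert(p != NULL && p->name != NULL && p->built_for_version != NULL);
    if (strcmp(p->built_for_version, HOST_PLUGIN_API_VERSION) != 0) {
        fprintf(stderr, "refusing plugin %s: built for %s, host is %s\n",
                p->name, p->built_for_version, HOST_PLUGIN_API_VERSION);
        return 0;
    }
    return 1;
}
```

The strict string comparison is the point of the design: with no compatibility promise, every compiler upgrade implies rebuilding the plugin, which is cheap for free plugins whose source is available and considerably more burdensome for a vendor shipping binaries only.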

In general, GCC is at a sort of crossroads. There are a number of competing compiler projects which are beginning to make some progress; they are a long way from rivaling GCC, but betting against the ability of a free software project to make rapid progress is almost never a good idea. There is a pressing need for better analysis tools - it is hard to see how we will make the next jump in code quality without them. Developers would like to work on other enhancements, such as advanced optimization techniques, but are finding that work hard to do. If GCC is unable to respond to these pressures, things could go badly for the project as a whole; GCC developer Ian Lance Taylor fears the worst in this regard:

I have a different fear: that gcc will become increasingly irrelevant, as more and more new programmers learn to work on alternative free compilers instead. That is neutral with regard to freedom, but it will tend to lose the many years of experience which have been put into gcc. In my view, if we can't even get ourselves together to permit something as simple as plugins with an unstable API, then we deserve to lose.

Back at the beginning of the GNU project, Richard Stallman understood that a solid compiler would be an important building block for his free system. In those days, even the creation of a C compiler looked like an overly ambitious project for volunteer developers, but he made GCC one of his first projects anyway (once the all-important extensible editor had been released). His vision and determination, combined with a large (for the times) testing community with a high tolerance for pain, got the job done. When Sun decided that a C compiler would no longer be bundled with SunOS, GCC was there to fill the gap. When Linus created his new kernel, GCC was there to compile it. It is hard to imagine how the free software explosion of the early 1990s could have happened without the GCC platform (and associated tool chain) to build our code with.

The vision and determination that brought us GCC have always been accompanied by a certain conservatism which has held the project back, though. In the late 1990s, frustration with the management of GCC led to the creation of the egcs compiler; that fork proved so successful that it eventually replaced the "official" version of GCC. If enough developers once again reach a critical level of frustration, they may decide to fork the project anew but, this time, there are other free compiler projects they could turn to instead. Perhaps, as some have suggested, better documentation is all that is really required. But, somehow, the GCC developers will want to ensure that all the energy which is going into improving GCC doesn't wander elsewhere. GCC needs that energy if it is to remain one of the cornerstones of our free system.

