CentOS grapples with its development process

CentOS, the "community enterprise" operating system rebuilt from the source packages of Red Hat Enterprise Linux (RHEL), recently started development on its next release, CentOS 6. The beginning of the process has not been smooth, however, with the development team talking about restructuring its package repository layout and installation offerings, combined with a heated discussion on the difficulty of recruiting new users into the all-important package review and testing phase.

Although CentOS is a community-developed distribution, its status as an explicitly source-compatible derivative of RHEL means that it has a substantially different QA and release process than, say, Fedora or Debian. Red Hat provides source RPMs for each RHEL release as part of its GPL compliance process. When a new release drops, CentOS volunteers begin systematically poring over the packages, looking for everything with a trademark that must be altered or removed before the package can be distributed by CentOS.

This includes graphical logos and written branding, plus anything else that might lead a user to think that the software comes from or is associated with Red Hat. Although there are some obvious places to begin, such as artwork packages, everything from menus to %description lines in RPM spec files must be sanitized. The fixes are not always simple search-and-replace, either: utilities like the Automatic Bug Reporting Tool (ABRT), which is functionally linked to Red Hat's Bugzilla, must also be patched. Only after this process is complete does the development team proceed to the build and packaging QA that eventually leads to a downloadable ISO installation image.
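As a rough illustration of the first pass of such an audit, the sketch below flags lines in a spec file that mention upstream branding so that a person can review them. It is not the CentOS project's tooling: the term list, the function name `find_branding`, and the sample spec text are all assumptions made for this example, and a match only marks a line for human review.

```python
import re

# Hypothetical branding terms; the article does not list the real audit criteria.
BRANDING_PATTERN = re.compile(r"\bred\s*hat\b|\brhel\b", re.IGNORECASE)


def find_branding(spec_text):
    """Return (line_number, stripped_line) pairs for lines mentioning a branding term."""
    hits = []
    for number, line in enumerate(spec_text.splitlines(), start=1):
        if BRANDING_PATTERN.search(line):
            hits.append((number, line.strip()))
    return hits


# Invented spec fragment, used only to exercise the function.
SAMPLE_SPEC = """\
Name: example-tool
Summary: An example utility
%description
This tool reports crashes to the Red Hat Bugzilla.
%changelog
"""

if __name__ == "__main__":
    for number, line in find_branding(SAMPLE_SPEC):
        print(f"{number}: {line}")
```

A scan like this only covers the search-and-replace end of the job. Cases such as ABRT, where the link to Red Hat is functional rather than textual, still need a patch written by hand.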

The new upstream source

The RHEL 6 sources were released on November 10th, more than three years after the last major revision, RHEL 5. The size of the base distribution has grown considerably, and now spans two DVD-sized ISO images. In addition, the company is now offering four versions of the release: server, workstation, high-performance computing (HPC) "compute node," and desktop client — up from the two (desktop and server) for RHEL 5.

These changes forced the CentOS developers to re-examine their own offerings, including the possibility of separate server and workstation editions (a change for CentOS) as well as a "light" installation ISO image that would allow administrators to set up a functional minimal server without installing the full, multi-gigabyte image. In the past, CentOS has striven for a single ISO image approach, but some developers on the mailing list expressed an interest in maintaining better compatibility with RHEL's offerings by splitting the different installation profiles into separate images.

At present, the consensus seems to favor retaining the single-profile workstation-and-server approach, although the sheer size of RHEL may force splitting the packages into "core" and "extra" DVDs. The minimal-install option is still under discussion, but seems likely to happen. The goal is to provide a smaller image suitable for use on USB media, in virtual machines, and for deployment in an un-networked environment, with the target size being small enough to fit onto a CD.

Still unresolved is the question of restructuring the package repositories. CentOS replaces Red Hat's subscription-based Red Hat Network update service with yum repositories used to push updated packages out to users. Because CentOS supports each release with updates for seven years, properly re-organizing the repositories is a critical decision. The main point of discussion is whether to split packages into "os" and "updates" repositories alone, or to also move some nonessential packages into "optional" and "optional-updates" repositories, which would mirror the approach that split the installation media into two DVD images.

Round 1 and new contributors

More contentious was a high-spirited debate over the "round 1" package-auditing process, the secondary QA process, and what many perceive as the high barrier to entry for new users wishing to get involved in development. Lead developer Karanbir Singh posted a "step one" call-for-help message to the CentOS development list on November 11, just after the RHEL 6 sources were released, in which he appealed for interested parties to get involved with the trademark-auditing process.

More than two weeks later, evidently very little progress had been made on that front. This is especially problematic for CentOS, which in the past had made its releases within a 6-to-8-week window of the RHEL source code drop. Many CentOS users begin by testing Red Hat's 30-day free trial of RHEL, so delaying the release of the source-compatible CentOS update by several weeks can make those users quite anxious.

A flame war between Singh and another packager, itself at best tangential, frustrated developer Florian La Roche enough that he accused Singh of not making the process open enough. Singh then wrote to the list again, this time complaining that there was a "level of fantasy that some of you guys seem to live with": he had publicly asked for volunteers, virtually none had stepped up, and as a result the "usual suspects" end up doing the work, and at a slower-than-ideal pace.

Lots of people will argue that open source works in a way where people do what they want to do, so you cant tell them what needs doing - and they will do what they want, when they want. Its what many imagine is the 'fun' in the open source way. Fortunately, or unfortunately we [don't] have that luxury. What comes down the pipe needs to be addressed, sometimes its what we want to do - and sometimes its what needs doing because that's the issue on hand.

Untangling a mailing list argument is always tricky, but in this case several factors seemed to converge to frustrate all involved. The first was Singh's perception that he had asked for volunteers, and none had stepped up to the plate. The second was the new users' perception that Singh had not provided meaningful instructions on how to participate — in particular, he pointed the list to an un-finished wiki page, and left several key steps of the audit process undocumented. For example, the wiki page left several resource URLs empty, but marked "coming soon", and there was no indication in the wiki or mailing list post where to find a list of audited-vs-unaudited packages, or how to submit a properly formatted issue to the bug tracker.

Third, several readers accused Singh of being "pedantic," picking apart the grammar and wording choices of other posters in the discussion, rather than responding to the meat of their questions. Fourth, some new users seemed to feel that however the audit and QA processes work technically, they suffered from being too opaque to outsiders. Douglas McClendon noted the lack of documented examples of properly-formatted bug reports and the absence of an overall "progress bar" that would track the current state of the work.

Fortunately, all heads eventually cooled, and the new would-be contributors posted more in-depth descriptions of the type of documentation that they needed. Singh, likewise, responded to the requests and clarified both instructions and formatting requirements, even observing "This is constructive.. we should have had these conversations about 2 weeks back :/". There is also an AuditStatus page tracking which packages have been checked for trademark issues in Round 1.
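The kind of overall "progress bar" that McClendon asked for could be derived from a status list like that one. The sketch below is only a guess at the idea: the article does not describe the format of the AuditStatus page, so the input mapping, the package names, and the function names `audit_progress` and `render_bar` are all hypothetical.

```python
def audit_progress(status_by_package):
    """Summarize audit state from a mapping of package name -> True if audited.

    Returns (audited_count, total_count, percent_done, remaining_package_names).
    """
    total = len(status_by_package)
    audited = sum(1 for done in status_by_package.values() if done)
    percent = 100.0 * audited / total if total else 0.0
    remaining = sorted(name for name, done in status_by_package.items() if not done)
    return audited, total, percent, remaining


def render_bar(percent, width=20):
    """Draw a fixed-width text bar for a completion percentage between 0 and 100."""
    filled = int(round(width * percent / 100.0))
    return "[" + "#" * filled + "-" * (width - filled) + "]"


if __name__ == "__main__":
    # Invented package names and states, for illustration only.
    status = {"pkg-a": True, "pkg-b": False, "pkg-c": True, "pkg-d": False}
    audited, total, percent, remaining = audit_progress(status)
    print(f"{render_bar(percent)} {audited}/{total} audited")
    print("still to audit:", ", ".join(remaining))
```

Listing the unaudited packages alongside the bar addresses the other gap the new contributors raised, which was knowing where to start.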

Scarcity of volunteers is not unique to CentOS, of course, but the nature of the distribution does make it harder to raise manpower from among its end users. Unlike a desktop-centric distribution, a high percentage of CentOS users are independent contractors who deploy the system for clients. They are certainly a knowledgeable bunch, and tend to be active on the lists. But another major slice of CentOS's install base is commercial web-hosting services, who offer it as a rock-solid alternative to RHEL and other enterprise server distributions. Regrettably, however, many of those hosting providers don't seem to participate in the development process — if they did, even just in package auditing and testing, it would make for a sizable contribution.

Community

All open source projects struggle with some of these "preferred way of doing things" issues. To Singh, it seems, some of the new developers just had not done their homework, such as looking at the bug reports for previous CentOS releases for clues as to the preferred formatting. On the other hand, if there is a preferred format, it clearly should have been documented as such.

Similarly, the divide between the documentation on the wiki and the discussion on the list did its part to further the confusion. In Singh's initial email asking for volunteers, he included a link to http://bugs.centos.org, which was evidently one of the URLs missing from the wiki page. From his perspective, he was providing the updated information there, but to the new developers it was not at all clear that his message was linked to the (still un-updated) wiki page.

Writing good documentation is never easy, but too much of the time open source projects think about "documentation" solely in terms of end-user manuals and tutorials. As CentOS is finding out, documenting the development process itself is vital. Considering that just over a year ago the CentOS project faced a leadership crisis, getting the "preferred way of doing things" out of the heads of the long-time contributors and into a publicly-available resource is something the project needs to address, not just in case the existing contributors disappear, but simply to draw in the constant stream of interested users who want to answer the call to step up to the next level.