A large legacy code base is a challenge for any team to embrace and improve. So how well does a distributed team of volunteers address the problem?

A talk at FOSDEM shed light on how the large and diverse team assembled by The Document Foundation (TDF) is approaching the huge LibreOffice code base left in the wake of Oracle's withdrawal from OpenOffice.org. The result is not only an impressive sequence of on-time releases, but also a range of development innovations. In particular, the "bi-bisect" technique the team has developed could be a great approach for others faced with large, complex code bases.

The talk, "LibreOffice: Cleaning and Refactoring a Giant Code Base," was delivered by Michael Meeks, a developer employed by Suse who has been working on LibreOffice (and OpenOffice.org before it) since 2000. Meeks covered both the development challenges of LibreOffice and the new features of the 4.0 release, which InfoWorld covered on release day. But the narrative chronicling the development challenges was instructive, inspiring, and worth digging into.

Lowering barriers

How did the community form in the first place? A core of developers carried over their work from the former OpenOffice.org community, but the key move was to make it easy and fun to join in with development by removing barriers to participation. The mailing lists were tuned to welcome newcomers rather than to favor existing developers; a page of "easy hacks" created for newcomers had small, tasty morsels to chew over; and extensive README files were made easy to find.

The project adopted Git as its version control system. As the best-known and most widely used open source tool for the task, Git makes contributing easy. The process has been simplified further by adding Gerrit, which enables what Meeks described as "permission-free commits."

The code base was hard to build, so the project set up automated Tinderbox continuous-integration build servers, allowing any developer to work on the code without having to create a complex build environment for each operating system. The code has been substantially cleaned up, with comments translated from German to English to make it more accessible around the world (most developers have English as at least a second language). The clean-up also involved a great deal of refactoring of old approaches into more modern ones and the elimination of unused code left over from defunct platforms. This is 20-year-old code, after all.

Most recently, the project has dealt with larger, more ambitious refactoring, such as reimplementing the Microsoft document filters and introducing layout-based dialogs in place of hard-coded options. Meeks covered a number of significant tasks that are in progress; his slides (PDF) offer full details.

Upholding quality

All this change to a complex, fragile, legacy code base could well lead to breakages. Indeed, regressions were a constant issue for LibreOffice. To deal with them, the project has taken several approaches to ramp up code quality without killing progress in the name of stability. It has greatly increased the number of unit tests available, which has allowed more changes to be made more quickly and helped the project keep up with the ever-increasing flow of contributions from newcomers.