Choosing the Right Project

DDJ: Let's say I'm a developer with some experience and I'm interested in contributing to one of the Apache projects. How do I get started?

BB: What's your motive?

DDJ: To get more experience, working with bright folks who are doing stuff in an area that interests me.

BB: That's not the most usual path. More often, it's a developer who wants help to solve a specific technical problem. After some Google searching, he's found some packages that claim to do x. And, if there's one that's free, that's the one that's going to get evaluated first. And he starts pulling it down and playing with it.

DDJ: Yes. And then what does he do?

BB: Let me talk about a sequence of events that's more likely to happen. The first step is generally to determine whether the software is any good.

When I look at a piece of open source software before I download code, I'm looking to see whether a lot of people are complaining about broken installations, or if there are questions that suggest poor programming practices. And are people getting answers quickly?

Every good open source project has a public discussion forum, email or forum-based, and has developers who have a stewardship mentality about it and care about happy customers, even if they're not paying customers. So, before even touching the code, I would evaluate the community, because there is an awful lot of code that has no community behind it, such as somebody open sourcing something they worked on at their last job, or an overnight hack, with no intention of making it usable.

Evaluate the community, look for activity, look for a release every couple of months, people who've used it and said good things or even bad things.

DDJ: So, if you determine the project fits your criteria, then what?

BB: You read some docs, you watch some people talking. Then you download the code, compile it, install it, and give it a dry run. If it's running and doing some cool things for you, you might show it to your boss. You then deploy it to production, and you're running. Then you discover there's a bug.

Now, what do you do with it? You probably dive in to figure out what's really going on in there, and in the course of that, you go rooting through the discussion lists and the developer lists. Most projects differentiate between the users list and the developers list, so that the developers can stay focused on building new stuff, while the users and those who want to help them can support each other with basic Q&A somewhere else.

DDJ: Yes.

Start By Contributing Defect Data

BB: In the course of doing this, you probably looked in the issue tracker to see if anyone else has reported a "foo-bar not-found" sort of thing. In the course of narrowing it down, you've either realized that it was a mistake on your part or that you've found a bug.

The bug [might be] a pre-existing, pre-known defect and maybe you can actually add some data to it so the next person can find it more easily than you did. In which case, you want to get an account on the issue tracker and post a comment on that issue, saying, "Hey, this is happening to me, too," and try to contribute, add to the conversation that's there.

This helps because bugs take a long time to get from first being noticed to being resolved. Developers often need data from users when trying to recreate the conditions under which things happen. The vast majority of the work in fixing a bug is something that even users who don't understand the code can actually help with. They can try to replicate the bug, write a test case, etc.

And that is all extremely valuable. A programmer who is familiar with the code may be able to dive in and fix it. But that's such tedious work and it's high value because it's tedious and no one really wants to do it. So, to start, try looking for the outstanding defects and see whether they need further triage.

DDJ: OK.

BB: Karl Fogel talks about this in his book, Producing Open Source Software, [a book which, without a doubt, is the best guide available for running an OSS project. Ed.] namely, the benefit of marking certain issues as bite-sized tasks: things that developers could take on to understand the layout of the code, how different systems call each other, etc. Because there are often bugs that aren't big architectural defects but off-by-one errors or edge-case kind of things that benefit from a lot of triage.

DDJ: An excellent idea.

BB: Throughout all this, there is the conversation on the users' or developers' list. These messages are the lifeblood of the community. It's the banter across the dinner table that drives the process. Join either the users' or developers' list. Let that simmer in the background. Don't pretend you have to understand every word, just get an ear for the music of the discussions. Eventually, you'll see comments that map to some of the situations that you see. Some of these lists have 100 messages a day. You can't read all of them, but you can get a feel for the gestalt of the project.

Contributing to Documentation

DDJ: That's a great sequence to start with. I notice that on many projects that are trying to solicit participation, they recommend working on documentation, which always seems to be in short supply. How does that work?

BB: It's amazing to me how people think of documentation as easy or an afterthought, but there's a huge difference between documentation written by someone coming up the learning curve and documentation written by someone who really knows it. I'd say well designed and engineered documentation is more important than well designed and engineered source code. Because that's the ladder people climb up to go from casual first-time user to core user and core developer. And that has to be a solid ladder. A lot of projects try to encourage the developers, when they commit a source code change, to concurrently commit documentation changes. That's a high bar though, because many developers are not English-as-a-first-language, or are not proficient writers.

I'd say the other caveat is I think having new users come in and contribute to training materials is more appropriate. I think the format of training (especially screen capture and video, because it's a form of performance art), really forces you to learn the material: "Here's why Drupal is a kick-ass CMS, and here's how to build your first site with it." There's a saying: People remember 10 percent of what they're told and 90 percent of what they teach.

Working Your Way Up Through the Meritocracy

DDJ: Developers who are contributing out of ambition rather than because they have a specific problem to be solved may believe that the meritocracy provides a certain type of reward. Being a contributor is a feather in the cap if it's an esteemed project. So what typically moves somebody in the community's eyes from being just an occasional contributor into one of the leads, or a formal position on a project?

BB: Some projects, like Apache, have more formal recognition of a developer as a committer, someone granted certain privileges on the repository. Even though commits can always be backed out, it is generally considered a mark of honor that other developers trust you enough to give you the keys. Other projects give out commit privileges like candy (apparently Gnome), and the premise behind that is it should be easy for everybody to throw their patches into the pool and we'll filter and sort through them later.

That's partly a tool question; Git is easier [for] managing a lot of users who aren't core committers. But being a committer on Apache is a big deal. The decision is made by one committer proposing to other committers on an individual project, "This person has contributed lots of valuable patches in the past, and has been helpful to new users."

There's always some work on a project that goes beyond self-interest. There's talk about aggregated self-interest, but it's actually enlightened self-interest, in that you've got to write code that can be understood by others, and when somebody has a newbie question, helping them find the answer to that question will pay off tremendously. There's always going to be many more users than developers, so it's incumbent on developers to give a little user support, and help new developers over the hurdles in getting their environment running and understanding the code layout. Someone who shows that level of altruism (it doesn't have to be full time), but there are a lot of people.

At Apache, it's a recognition not just for a few good patches, but for a commitment, a communication style, and an understanding of this thing called the Apache Way, which is not clearly defined but generally is "do unto others as you'd have them do unto you": Have high quality code, be clear in your communication, and have a team-oriented spirit. Those are the criteria on Apache to be awarded commit privileges. And just be human, be on the mailing list, be helpful, help get the bug queue down. No active open source project has no bugs open. There's always something to do there.

DDJ: Jeff Fredrick, who headed up the CruiseControl project for a long time, told me that one of the things that happens is that the people who should become committers generally stand out by the nature of commitment and contributions. There's not a lot of discussion, it's generally pretty clear. Would you agree?

BB: Hmm, I can think of frequent examples of significant private conversations among committers over whether someone should have commit privileges, although that's less controversial because committer privileges can always be revoked.

DDJ: What about not granting privileges?

BB: I think there are some projects that err too much on the side of not granting commit privileges. It can seed various conspiracy theories as to whether it's justified or not. Sun, for example, with Open Office, really never gave a lot of commit privileges outside the Sun developers, because their working style was focused on a small cluster of developers in one physical location, having worked together for 20 years, and they found it hard to trust other developers. So that's a case where they probably erred too much on the side of holding commit privileges too close to the vest.

DDJ: What about using branches and forks?

BB: I do think both Subversion and Git have made it easier for people to maintain branches and forks of code than it used to be, so there tend to be fewer fights over commit privileges. Instead, what you see is people just working. And they'll say, "You've got a good code base, I've got an extra patch, here's my tree, you can pull that patch from my tree, or someone else can build a derivative from that." In some ways, I think this has actually hurt the ability of communities to gel around a single code base. For instance, with the Linux kernels that ship with all the different distros out there, it's pretty much a different kernel per distro: different combinations of patches and settings. I think the Linux Foundation does a good job of driving the Linux Standard Base, and we have much more conformance than we might otherwise have. It's still tough.

Every Apache project has a single code base. It has development branches and current branches and stable branches and all that, but the pool of developers is still focused on building one thing and building it iteratively.