When I asked Marc Espie (espie@) for a summary of the work he did at p2k7 for my writeup, he said "make -j, but it deserves an article of its own." So, here it is. Marc sent the email below to misc@. It summarizes how he got started working on make(1), how that changed from maintenance to major improvement mode, and how he was able to make `make -j' usable.

From the email:



Marc's full email is below.

This was only briefly mentioned on undeadly, because it probably deserves a separate announcement and article.

First, I want to really thank robert@ again for the organization, and for putting up with rude OpenBSD French developers as he has... plus the people who donated enough to make these kinds of events possible.

Also, my laptop died 3 weeks before the hackathon, and I got a new one, thanks to project money. It's not really the most expensive laptop you've ever seen, but it has a dual-core...

Now for some background. I've been maintaining OpenBSD's make for a long time (over 8 years, I think). It started as simple bug-fixes, then speed-ups, then more radical clean-ups.

About one year ago, I had a kind of epiphany (yeah, I'm a fan of Angel): I realized that this code is really atrocious, and instead of fixing bugs, I started replacing big chunks of it. Not to disparage the guy who wrote pmake in the first place, as he had very different constraints and goals, but it is painfully obvious this is a half-finished research project, and not an industry-standard POSIX make.

So, I started cutting stuff that no-one uses, and options that simply don't work, to try to make sense of the beast. And I changed algorithms. Most specifically, I streamlined the suffix handling, and I killed all the remote job handling.

To make sense of make, you've got to realize there are basically two beasts folded into one: make in `compat' mode uses its own engine to figure out which targets to compute first, and its own job runner. The engine is rather simple, since it doesn't have to queue things up, and can just run commands. The parallel engine is a bit more complex, since it tries to explore more of the tree to start up several jobs at once. It also has an interesting idea: it tries to create shell scripts that aggregate commands to minimize the number of processes created. Unfortunately, THIS is a bad idea in modern times, since POSIX mandates that each command line be run by a separate shell. As a result, things that work with standard sequential make no longer run with parallel make.
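To illustrate why aggregating command lines into one script changes behaviour, here is a minimal Python sketch (not make's actual code, which is written in C). The helper names `run_separately`, `run_aggregated` and `demo` are made up for this example. It runs the two command lines `cd sub` and `pwd -P` first in one shell each, as sequential make does, and then as a single script, as the old parallel engine did.

```python
import os
import subprocess
import tempfile


def run_separately(commands, cwd):
    """Run each command line in its own shell, like sequential make."""
    output = ""
    for cmd in commands:
        result = subprocess.run(["sh", "-c", cmd], cwd=cwd,
                                capture_output=True, text=True)
        output += result.stdout
    return output


def run_aggregated(commands, cwd):
    """Run all command lines as one shell script (hypothetical stand-in
    for the script-building approach described above)."""
    script = "\n".join(commands)
    result = subprocess.run(["sh", "-c", script], cwd=cwd,
                            capture_output=True, text=True)
    return result.stdout


def demo():
    """Return (separate_output, aggregated_output) for 'cd sub; pwd -P'."""
    with tempfile.TemporaryDirectory() as top:
        os.mkdir(os.path.join(top, "sub"))
        commands = ["cd sub", "pwd -P"]
        return run_separately(commands, top), run_aggregated(commands, top)
```

With separate shells the `cd` is forgotten before `pwd -P` runs, so the working directory printed is the top directory. In the aggregated script the `cd` leaks into the next line and the printed path ends in `/sub`. A makefile that relies on the first behaviour will misbehave under the second.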

A few months ago, I started designing a way to overcome those issues. Mostly, I wanted to get rid of the shell script creation in the parallel make case. I realized I could have a `tail-call optimization': if I fork() a job to compute a target, and then fork/exec each command separately, I would not need to fork() the last command, thus optimizing the really common case that uses one command per target.
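The `tail-call optimization' can be sketched in a few lines. This is only an illustration in Python under stated assumptions, not the code in make (which is C): the function name `run_target` and the use of `sh -c` for every command are choices made for this example. One process is forked per target; inside it, every command except the last is run through fork + exec, and the last one is exec'd directly in the job process.

```python
import os
import sys


def run_target(commands):
    """Run the commands of one target in a forked job; return its exit code."""
    if not commands:
        return 0
    sys.stdout.flush()               # avoid duplicated buffered output in the child
    job = os.fork()
    if job == 0:                     # we are the job process for this target
        try:
            for cmd in commands[:-1]:
                child = os.fork()
                if child == 0:       # one extra process per non-final command
                    os.execvp("sh", ["sh", "-c", cmd])
                _, status = os.waitpid(child, 0)
                if status != 0:      # stop at the first failing command
                    os._exit(1)
            # Tail call: no fork for the last command, the job becomes it.
            os.execvp("sh", ["sh", "-c", commands[-1]])
        finally:
            os._exit(127)            # only reached if exec itself failed
    _, status = os.waitpid(job, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return 1                         # killed by a signal
```

For a target with a single command, `run_target(["cc -c foo.c"])` costs one fork() and one exec, the same as running the command directly, which is the common case the email refers to.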

Enter p2k7, with a tall goal: try to make make -j usable. This was an ideal setting: I had a week mostly empty of other contingencies, and a few people motivated to give me feedback. So I started merging the engines, and killing old make code (all the stuff that was building shell scripts). Pretty soon, I ran into debug issues: the output was really mangled, and unusable.

make -j uses pipes to separate the outputs from various jobs, and tries to print stuff line by line. I realized it was keeping a lot of fds open, whereas it should close about half of them (this explained why cnst@ had run into the allocation bug that fast... make was gobbling file descriptors like candy), and I also realized I could make things better by using non-blocking fds and applying a greedy approach: try to get as many complete lines/buffers from one fd before getting to the rest.

The result was immensely satisfying: instead of having chunks of intermixed outputs, suddenly, very long linking lines were appearing as one single line (since make's internal buffers are much smaller than a full pipe kernel buffer, this means that, in most cases, all the job output was already there).
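A rough sketch of the greedy, non-blocking approach follows. It is written in Python for brevity and is not taken from make's source; the names `drain` and `collect`, the 4096-byte read size, and the `jobs` mapping from file descriptor to job name are all assumptions of this example. Each ready pipe is drained completely before the next one is looked at, and only whole lines are emitted.

```python
import os
import select


def drain(fd, pending):
    """Read everything currently available on a non-blocking fd.

    Returns (complete_lines, leftover_bytes, eof).
    """
    eof = False
    while True:
        try:
            chunk = os.read(fd, 4096)
        except BlockingIOError:
            break                    # pipe is empty for now
        if not chunk:
            eof = True               # writer closed its end
            break
        pending += chunk
    *lines, leftover = pending.split(b"\n")
    return lines, leftover, eof


def collect(jobs):
    """Multiplex job pipes; return a list of (job_name, line) pairs."""
    for fd in jobs:
        os.set_blocking(fd, False)
    pending = {fd: b"" for fd in jobs}
    out = []
    while pending:
        ready, _, _ = select.select(list(pending), [], [])
        for fd in ready:
            lines, pending[fd], eof = drain(fd, pending[fd])
            out.extend((jobs[fd], line.decode()) for line in lines)
            if eof:
                if pending[fd]:      # flush a final unterminated line
                    out.append((jobs[fd], pending[fd].decode()))
                del pending[fd]
                os.close(fd)         # do not hoard file descriptors
    return out
```

Because a job's output is taken in one go whenever its pipe is ready, lines from different jobs are never spliced into each other, and a job that has already written all of its output shows up as one contiguous run.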

The devil lies in the details, as usual. On Tuesday, my src/ build was stopping in make clean, in the middle of gnu/binutils. And my error messages were worth shit: make was basically saying: `oh, btw, there was an error in one job. Here, figure out what's going on'.

So I revamped the error messages on Wednesday morning, and started polishing other stuff, like duplicating pipes so that stdout and stderr do not get mixed up.
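The idea of keeping stdout and stderr on separate pipes can be shown with a small Python sketch. It only illustrates the principle and says nothing about how make implements it; `run_job` is a name invented for this example.

```python
import subprocess


def run_job(cmd):
    """Run one shell command with separate pipes for stdout and stderr.

    Returns (exit_code, stdout_text, stderr_text).
    """
    proc = subprocess.run(["sh", "-c", cmd],
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE,
                          text=True)
    return proc.returncode, proc.stdout, proc.stderr
```

With two pipes per job, the parent can report a failing job's error output on its own, together with the exit code, instead of a bare `there was an error in one job'.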

I finally figured out what was going on: turns out make clean was running stuff like:



distclean:
	-rm somefile

After that, I ran into quite a few more issues in src. Let's say that it's not yet ready for parallel build. Together with miod@ (who was playing with make remotely), we fixed a few of them... let's just say that what remains are the hard ones... it will probably take a while to fix them. robert@ has also been fairly helpful. It's stunning to see a quad-amd64 compile its whole kernel in 10 seconds...

(I also think I gave miod a lot of things to pull his hair out over... make -j is quite a nice stress-tester for the SMP systems he's playing with, and he's seen his share of panics in pipes with the new code...)

I had some pleasant surprises in xenocara and ports. xenocara is actually mostly ready for make -j. The fixes should happen soon. And more ports than I expected are actually happy with make -j. In some cases, this is quite a speed improvement, and I can truly run at twice the speed. In other cases, this is not worth anything. For instance, KDE3 won't compile faster. And yes, that's one of the reasons why they're abandoning automake for KDE4: recursive makes + enable-final don't benefit from parallel-core compiles.

So now, I'm back in Paris (which was quite an adventure, thanks to the Air France airline strike), and a week later, a big chunk of the make -j work has been committed.

There are still minor fixes floating around, there's still at least one biggie to fix, so no, all your makefiles are not safe yet (and art@ made me realize I also need a way to control the global number of jobs in parallel make if I don't want to turn make into a fork() bomb).

[okay, I know you will want to play with make -j, if you have a dual-core or quad-proc. It's such a fun toy... don't be too disappointed when things still don't build, and be very, very careful with things that *appear* to build correctly, but still crash.]

But there's hope. It is quite likely *a lot* of stuff will compile with make -j in OpenBSD 4.3.

Again, thank you for your donations that made my new laptop and p2k7 possible. It really *does* make a difference (like, 3 or 4 months of extra delay if I didn't have all the extra incentive to make this work).