Performance optimization: The first thing to do

In the last couple of years, I’ve done a lot of performance optimization. I’ve optimized raw C, Python and PHP code. I’ve optimized databases: tweaked settings, memory usage, caches, SQL queries, the query analyzer, hardware and indexes. I’ve optimized templates for disk I/O, compilation and rendering. I’ve optimized various caches and all kinds of other stuff, like VMware configurations.

As I’ve done all this optimization, I’ve noticed a couple of things. One of them is how people in different roles look at optimization:

Programmers always want to optimize the code. The code isn’t optimal, and a lot of speed can be gained from optimizing it. This usually boils down to making algorithms more efficient with either CPU cycles or memory. In contrast to the programmer, the system administrator always wants to tweak the configuration of the operating system, the application itself, or some piece of middleware in between. Meanwhile, managers always want to optimize the hardware. “Developer time is more expensive than hardware”, they quip, so they decide to throw money at faster and more hardware instead of letting the developer optimize the code.

What none of them realise is that they’re all wrong. None of these approaches is any good. When it comes to optimization, all of the people above are stuck in their own little world. Of course managers just love the fact that programmers “don’t understand cost versus benefit”, and a saying such as “Developer time is more expensive than hardware” has a really nice ring to it. Managers have a high-level view of the application’s ecosystem, so they search for the solution in the cheapest component: hardware. Programmers, on the other hand, know the system from a very low-level point of view, and they naturally love the fact that managers don’t understand technology. They are intimately familiar with the code of their application, or the database running behind it, and so they know a lot of its weak spots. Of course they’ll assume the optimization is best performed there. Systems administrators have limited options either way, so they stick to what they can influence: configuration.

An excellent example of this is a recent post on the JoelOnSoftware blog. I’ll recap the main points I’d like to illustrate here:

One of the FogBugz developers complained that compiling was pretty slow (about 30 seconds). […] He asked if it would be OK if someone spent a few weeks looking for ways to parallelize and speed it up, since we all have multiple CPU cores and plenty of memory. […] I thought it might be a good idea to just try throwing money at the problem first, before we spent a lot of (expensive and scarce) developer time. […] so I thought I’d experiment with replacing some of the hard drives around here with solid state, flash hard drives to see if that helped. Suddenly everything was faster. Booting, launching apps… even Outlook is ready to use in about 1 second. This was a really great upgrade. But… compile time. Hmm. That wasn’t much better. I got it down from 30 seconds to … 30 seconds. Our compiler is single threaded, and, I guess, a lot more CPU-bound than IO bound.

This is an excellent example of how a manager would try to solve optimization problems. At the start of the quote we see the typical way a developer would tackle the problem: parallelize and speed up. In other words: low-level optimizations.

Now it turns out Joel was wrong. Solid state disks didn’t help at all, since their problem wasn’t with disk I/O in the first place. But that doesn’t mean the developer was right either! I like to see it as a kind of Schrödinger’s Cat situation: both are wrong, until one is proven right. Why is that? Because they have no idea what the problem is! All they’re doing is guessing at solutions in the hope of stumbling on one that works. We can see this quite clearly: after having dismissed disk I/O as the problem, they assume it must be because “our compiler is single threaded, and, I guess, a lot more CPU-bound than IO bound”. Again, they jump to conclusions without knowing what the problem is. So now they haven’t just spent a lot of money on solid state disks without fixing the problem; they’re about to spend weeks of developer time without knowing whether that will fix it either.

So, here is my point:

The most important thing about optimization is analysis.

You can’t fix a problem by simply trying different solutions to see if they work. In order to fix a problem, you have to understand the problem first.

So, please, if you’re a developer, don’t assume saving a couple of CPU cycles here or there will solve the problem. And if you’re a manager, don’t assume some new hardware will solve the problem. Do some analysis first. Finding out if disk I/O, memory, CPU cycles or single threading is the problem is really not that hard if you spend a little time thinking about it and benchmarking various things. And in the end, you’ll have a much better overview of the situation and the problem, and you’ll be able to come up with specific solutions which will actually work.
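To show that this kind of analysis really isn’t hard, here is a minimal sketch (in Python, and not taken from the FogBugz story — the function names and thresholds are my own illustration) of one simple first measurement: comparing wall-clock time to CPU time. If a task spends nearly all of its elapsed time on the CPU, throwing faster disks at it won’t help; if there’s a large gap, it’s waiting on something — disk, network, locks — and micro-optimizing the algorithm won’t help.

```python
import time


def profile_task(task):
    """Run task once and compare wall-clock time to process CPU time.

    A cpu/wall ratio near 1.0 means the task is CPU-bound; a low ratio
    means it spent most of its time waiting (disk I/O, network, locks).
    The 0.8 cutoff is an arbitrary illustrative threshold.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    task()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    ratio = cpu / wall if wall > 0 else 1.0
    verdict = "CPU-bound" if ratio > 0.8 else "waiting (I/O, sleep, locks)"
    return wall, cpu, verdict


def cpu_heavy():
    # Pure computation: should register as CPU-bound.
    sum(i * i for i in range(2_000_000))


def io_like():
    # A sleep stands in for disk or network waits.
    time.sleep(0.2)


print(profile_task(cpu_heavy))
print(profile_task(io_like))
```

Ten lines of measurement like this would have told Joel’s team, before buying a single SSD, that their compile was CPU-bound — and told the developer whether weeks of parallelization work were even aimed at the right bottleneck.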

And that’s how you save money.