Asking questions about performance online almost always invites scorn and accusation. A large number of programmers apparently feel that the efficiency of code is insignificant nowadays: so long as the functional requirements have been met, the approach is golden. Any attempt to discuss improvements is met with stiff resistance. The most common mantras are “premature optimization” and “have you measured it”.

While such notions are sometimes justified, this general habit of ignoring performance considerations is quite dangerous. Let’s look at a few reasons why performance is still quite relevant, especially early in the development process.

Performance Issues

Some things don’t parallelize

There are a great number of functional behaviours which simply can’t be done in parallel, or rather, gain nothing when done in parallel. At a high level, most client-server requests tend to have one outermost request which works serially to assemble the response for the client. Perhaps many of the sub-requests can be done in parallel, but this serial code still has to execute and can often become the bottleneck in the total response time.

At lower levels there are algorithms which can’t be efficiently handled on multiple cores. Often the overhead of splitting up the work is more than the cost of the algorithm itself. Or sometimes, just like the client requests, the algorithm has a serial nature that just can’t be avoided.

Chips aren’t getting any faster

In recent years the speed of individual cores has not really been increasing. While we are certainly not at the theoretical limits yet, the physical obstacles to increasing speed are significant. Essentially we’ve hit a speed limit in the commodity market, and chips simply range from 2 to 4 GHz. The industry has decided that providing more cores is better than providing faster cores.

Combine this with the inability to parallelize certain behaviours and you can see a problem.
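The size of that problem can be estimated with Amdahl's law, which the text above does not name but which is the usual way to express this limit. The following is only a minimal sketch: the function name and the 10% serial fraction are assumptions chosen for illustration, not measurements.

```python
def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Upper bound on speedup when part of the work cannot be parallelized.

    serial_fraction: share of the work that must run serially (0.0 to 1.0).
    cores: number of cores available for the parallel part.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 10% of the work is inherently serial.
    for cores in (1, 2, 4, 16, 64, 1024):
        print(f"{cores:>5} cores: {amdahl_speedup(0.10, cores):.2f}x")
```

Under this assumption the speedup can never reach 10x, however many cores are added. Since the cores themselves are not getting faster, the only way to raise that ceiling is to make the serial part more efficient.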

Can’t fix it later

Designing to be scalable, either to multiple cores or to multiple computers, is something that has to be planned for fairly early in the process. While you don’t need to scale immediately, you at least need to choose an architecture and algorithms that can be easily adjusted later. Most people, including programmers, tend to think in a serial fashion, thus most code tends not to lend itself to concurrent processing. If you have failed to consider concurrency and scaling early in the process, you may find that path fairly difficult.

Even small decisions made early on can lead to significant performance loss when used systemically through the code. Perhaps a bad internal flow, or poor use of global memory. Once a bad behaviour is ingrained at all points in the code it becomes extremely time-consuming to change. Naturally programmers just follow the existing code; whatever is there at the start will be magnified throughout.

Solutions are non-trivial

This is a good counter to those who blindly argue you should profile your code to see what is more efficient. If any algorithm could be coded in multiple forms within a few hours then perhaps simply trying them out is a good idea. Most algorithms are, however, part of a larger system, and trying to segregate and replace that component can often be difficult. Often the item to be improved simply cannot be isolated and is more of a systemic feature in the code. Given the amount of time that will be required to make a change, or code the first version, it seems entirely reasonable to try and think about the efficiency ahead of time.

The “just profile it” argument also entirely neglects that doing proper performance measurements is very hard. But that is a topic all on its own.

Broken theory of scaling

A system which can scale in theory is vastly different than one that can actually scale. A common failure is related to networking. A cluster of computers has to be connected by real switches and wires which have a fixed limit on the traffic they can effectively handle. Beyond that you need to start grouping and segregating. Designs often fail to account for this, requiring direct connectivity between all computers and/or forcing all traffic through a single machine.
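A rough bit of arithmetic shows why both of those designs break down. This is only a sketch: the function names and the 10 Gbit/s hub capacity are hypothetical, and real networks have many more constraints than link counts and a single bandwidth figure.

```python
def full_mesh_links(machines: int) -> int:
    """Number of direct connections if every machine talks to every other."""
    return machines * (machines - 1) // 2


def hub_share_gbps(hub_capacity_gbps: float, machines: int) -> float:
    """Average bandwidth per machine if all traffic passes through one hub."""
    return hub_capacity_gbps / machines


if __name__ == "__main__":
    # Hypothetical hub with 10 Gbit/s of usable capacity.
    for machines in (10, 100, 1000):
        links = full_mesh_links(machines)
        share = hub_share_gbps(10.0, machines)
        print(f"{machines:>5} machines: {links:>7} direct links, "
              f"{share:.3f} Gbit/s each through a single hub")
```

Direct connectivity grows quadratically with the cluster size, while the per-machine share of a single hub shrinks linearly. Neither survives far beyond a small cluster without grouping and segregating.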

Inefficiency costs money

Even if your system can scale simply by adding more computers, this isn’t necessarily the best solution. Every additional machine has a real cost associated with it. Beyond an initial purchase and installation cost, there is a continual ongoing cost in maintenance and electricity, as well as the final disposal cost. These expenses are significant, and in a highly competitive service market, reductions of even 5 or 10% can make a huge difference in the ability of the company to attract clients.
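To put rough numbers on this, here is a minimal sketch. Every figure in it (the peak load, the per-machine throughput, the yearly cost per machine) is made up for illustration, and it assumes the load spreads perfectly across machines.

```python
import math


def machines_needed(peak_load: float, per_machine_throughput: float) -> int:
    """Machines required to serve peak_load, assuming perfect load spreading."""
    return math.ceil(peak_load / per_machine_throughput)


def yearly_cost(machines: int, cost_per_machine: float) -> float:
    """Total yearly cost for a fleet of identical machines."""
    return machines * cost_per_machine


if __name__ == "__main__":
    peak_load = 100_000        # hypothetical requests per second at peak
    cost_per_machine = 3_000   # hypothetical yearly cost per machine

    before = machines_needed(peak_load, 1_000)  # 1,000 req/s per machine
    after = machines_needed(peak_load, 1_100)   # code made 10% more efficient

    print(f"before: {before} machines, {yearly_cost(before, cost_per_machine):,.0f} per year")
    print(f"after:  {after} machines, {yearly_cost(after, cost_per_machine):,.0f} per year")
```

With these assumed figures a 10% throughput gain removes nine machines from a fleet of one hundred, and the saving recurs every year the service runs.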

Conclusion

While spending too much time fine-tuning the details is often fruitless, backing too far away from performance concerns can be downright dangerous. A lot of major performance issues can be handled up-front with just a tiny bit of planning and forethought. Even if your non-functional requirements are light at the start, don’t be surprised by changes in demand, and in particular peak load problems. Never forget that hardware has physical limitations to scaling, and that every additional machine is an added cost.

This is not an attempt to argue in favour of excessive optimization. It is more of a counter to the alarming trend I see to completely ignore performance issues. Don’t ignore that 5% loss of speed in your module; it’s going to be magnified by the 5% loss in all the other code!
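How quickly small losses magnify depends on how the code is structured. The sketch below assumes the worst case, where the losses sit in layers that call one another so that each layer's overhead multiplies the one below it. The function name and the layer counts are hypothetical.

```python
def compounded_slowdown(loss_per_layer: float, layers: int) -> float:
    """Total slowdown factor when each layer adds the same relative overhead.

    Assumes the layers are nested, so the overheads multiply.
    """
    return (1.0 + loss_per_layer) ** layers


if __name__ == "__main__":
    for layers in (1, 5, 10, 20):
        factor = compounded_slowdown(0.05, layers)
        print(f"{layers:>2} layers at 5% each: {factor:.2f}x slower")
```

Under that assumption, ten layers that each give up 5% leave the whole path more than 60% slower. If the modules instead run independently side by side, the total loss stays near 5%, which is still real time and real money at scale.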