Intel has bad news for software developers. It's been hinted at already, but now the company has stated it explicitly: it's not enough for software developers to target two, four, or even eight cores. No, the future holds tens, hundreds, or thousands of cores, and developers are going to have to bite the bullet and write programs that will scale to such systems.

This recommendation came in a recent blog post, aptly titled "Unwelcome Advice." Writing programs that scale across hundreds of processors is difficult, and totally alien to the vast majority of developers out there. The blog post describes two ways that developers are trying to accommodate increasingly parallel processors. Some are moving piecemeal, targeting first two or four cores, then perhaps extending to eight, maybe even 16. Others are jumping headlong into writing software that can run on innumerable processors.

This second approach, Intel says, is the one to go for. Although it will probably cost more upfront, since that kind of design needs to permeate the entire application and be built in from day one, it's going to yield dividends in the long run, because before too long the processors we buy and use will have almost too many cores to count. Trying to tackle the parallel performance problem one core at a time might work for the moment, but it won't keep working as core counts climb.

This is indeed unwelcome advice. Writing programs for arbitrarily large numbers of cores is a very different task from writing programs for just a few. Targeting a few cores can often be done without wholesale changes to a codebase; incremental changes to existing functions are often enough.

For example, a single-threaded game engine might be modified to create one thread dedicated to sound playback, one to graphics computations, and one to enemy AI. Each of these threads could use almost the same code as the single-threaded game, making the cost of the changes quite low. The performance improvement will also tend to be fairly modest, and most important of all, such a scheme will never use more than three cores. This can be taken a bit further with the technique of "software pipelining" (breaking a computation into multiple steps that are performed sequentially, one step on each core), but it's still limited to a fixed number of cores. If a computation is broken down into four independent steps, it can use up to four cores, but it can never use any more than that.
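The fixed-thread approach described above can be sketched as follows. This is a toy illustration, not real engine code: the subsystem names (`mix_audio`, `render`, `update_ai`) and their trivial bodies are stand-ins for whatever single-threaded routines an engine already has. The point is structural: however many cores the machine offers, exactly three threads ever run.

```python
import threading

# Hypothetical stand-ins for the engine's existing single-threaded
# subsystems; in a real engine these would be the original routines.
def mix_audio(frames):
    return [f * 2 for f in frames]          # sound playback stand-in

def render(scene):
    return [obj.upper() for obj in scene]   # graphics stand-in

def update_ai(enemies):
    return [e + 1 for e in enemies]         # enemy-AI stand-in

results = {}

def run_subsystem(name, func, data):
    # Each subsystem keeps its original code; only the call site changes.
    results[name] = func(data)

# One dedicated thread per subsystem: the parallelism is capped at
# three, no matter how many cores the machine actually has.
threads = [
    threading.Thread(target=run_subsystem, args=("sound", mix_audio, [1, 2, 3])),
    threading.Thread(target=run_subsystem, args=("graphics", render, ["tree", "rock"])),
    threading.Thread(target=run_subsystem, args=("ai", update_ai, [10, 20])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```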

Intel is saying that instead, the game developer should split up each of those tasks so that they can run on an essentially unlimited number of cores. So instead of taking the graphics engine, the AI, and the sound engine and simply running them as-is, each needs to be rewritten in such a way that it spreads its workload evenly across as many cores as are available. This will require considerable redevelopment of the engine.
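The difference is between parallelizing by task and parallelizing within a task. A minimal sketch of the latter, using the hypothetical per-enemy AI update as the unit of work: the work is divided per item rather than per subsystem, so it spreads across however many cores are present, with no code change when the core count grows.

```python
import os
from concurrent.futures import ProcessPoolExecutor

def update_enemy(enemy):
    # Hypothetical stand-in for one enemy's AI update from the
    # original single-threaded engine.
    return enemy + 1

def update_all_enemies(enemies):
    # Split the work per enemy, not per subsystem: the same code
    # fills 2 cores today and 32 cores tomorrow.
    workers = os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update_enemy, enemies))

if __name__ == "__main__":
    update_all_enemies([10, 20, 30])
```

In CPython, processes rather than threads are used here so the work can actually occupy multiple cores; a C++ engine would more likely hand the same per-item tasks to a thread pool.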

For some applications, this is not such a big deal; the algorithms that split up the work in this way are well known. Ray tracing, for example, can easily be subdivided across as many cores as a processor has. This is one of the reasons that ray tracing is set to become the Next Big Thing in 3D gaming; as well as offering improved graphical capabilities, it's easy to accelerate simply by adding more cores. But many other tasks are not so fortunate; their problems do not easily scale up to meet the demands of the many-core era. And even when parallel algorithms do exist, the tools for developing and testing them are much weaker than their single-threaded counterparts; today's programming languages are strongly oriented towards single-threaded development. Even in the HPC world of scientific computation, which has long been developing for clusters with hundreds of thousands of processors, scientists find it difficult to get as much performance from their clusters as they should.
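Ray tracing scales so readily because every pixel is computed independently of every other pixel, so the image can be carved up among any number of workers. A toy sketch, with a made-up `trace_row` standing in for the actual per-pixel shading math: the result is identical regardless of worker count, only the speed changes.

```python
from concurrent.futures import ProcessPoolExecutor

WIDTH, HEIGHT = 8, 6

def trace_row(y):
    # Toy stand-in for tracing one row of pixels; each pixel depends
    # only on its own coordinates, never on its neighbors.
    return [(x * y) % 256 for x in range(WIDTH)]

def render_image(workers):
    # Rows are independent, so the image subdivides across any number
    # of cores; more cores simply means fewer rows per core.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_row, range(HEIGHT)))

if __name__ == "__main__":
    render_image(4)
```

Tasks with tight data dependencies between work items, where one step needs another's result before it can start, offer no such easy carve-up, which is what makes them ill-suited to the many-core era.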

Intel and AMD alike have been saying for several years that the days of "free" performance scaling from faster processors are behind us, and that future performance improvements will require software changes to take advantage of the abundance of cores. This clear message, that software will need to scale to huge numbers of cores, isn't unexpected. But it is nonetheless unwelcome. Painful times are undoubtedly ahead.