Building blocks

"Atom" is the brand name for Intel's newly launched ultramobile processor line, but it could just as well be the name for Intel's next-generation 45nm microarchitecture. This new core microarchitecture, codenamed Nehalem, forms the basic building block from which Intel will assemble the brains for everything from high-end servers to svelte notebooks. Because Nehalem represents a lot more than just a new processor, it's a significant shift for Intel at almost every level.

In this article, I'll give a general overview of Nehalem, focusing on the major changes and big new features that the architecture will eventually bring to Intel's entire x86 processor line. A more in-depth examination of Nehalem from me will show up later in the spring; for now, read on for the highlights. Here's what you need to know about Nehalem.

It's the bandwidth, stupid

Moore's Law has given processor designers an embarrassment of transistor riches, and nowhere is that more apparent than in Intel's 45nm Nehalem processor. Debuting in 4- and 8-core variants later this year, Nehalem packs a ton of hardware into a single processor socket. (Early numbers put the transistor count of a quad-core Nehalem at 781 million; no numbers for the 8-core model have appeared yet.) But trying to feed all of that hardware with the Intel platform's existing frontside bus architecture would be folly. So, just as importantly, Nehalem also sounds the long-overdue death knell for Intel's positively geriatric frontside bus architecture.

Intel's new QuickPath Interconnect (QPI) radically changes the company's system bandwidth situation, and that change is perhaps the largest single factor that shaped Nehalem's design. Between QuickPath and Nehalem's integrated memory controller, a Nehalem processor will have access to an unprecedented amount of aggregate bandwidth, especially in two- and four-socket implementations.

What this means is that Intel no longer has to equip its processors with freakishly large unified caches designed to mitigate the effects of the bandwidth starvation with which Intel platforms currently struggle. The chipmaker is now free to use all of the transistors that Moore's Law affords more flexibly and intelligently, and this freedom has profound effects on every aspect of Nehalem.

Let's take a look at what this bandwidth improvement means for Nehalem-based products across all segments, from servers to notebooks.

Remixing the microprocessor

In some ways, Nehalem is Intel's most significant processor since the Pentium 4, insofar as it signifies a major shift for the company's x86 strategy. The ill-fated Pentium 4 was a relatively radical design conceived with clockspeed in mind. Nehalem, in contrast, is a more progressive evolution of Intel's existing, mobile-oriented Core 2 products; all of its changes are made with a view to exploiting the large amounts of parallelism that Moore's Law affords at the 45nm process node and to taking advantage of QPI's bandwidth.

Because of this emphasis on parallelism and bandwidth, "Nehalem," broadly conceived, is less of a "processor" in the classical sense than it is a set of building blocks that can be assembled in different configurations for different market segments.



A four-core Nehalem processor, with three DDR3 channels and four QPI links

Nehalem-derived processors—if it's still appropriate to call them "processors" and not "systems-on-a-chip" (SoCs)—will mix the following elements in different proportions, depending on the platform and product:

Number of cores

Number of memory channels on the integrated memory controller

Type of memory supported (registered and unregistered DDR3 or FB-DIMM)

Number of links in the QuickPath interface (for scaling QuickPath bandwidth)

L3 cache size

Power management features

Integrated graphics

That's quite a bit of customization available, and this approach is what will let Intel slip Nehalem into all kinds of market segments. Indeed, Nehalem's processor core actually reminds me of the Linux kernel in that it's a small unit that can be augmented in different ways with add-ons so that it fits everything from a set-top box to a supercomputer cluster.
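One way to picture this mix-and-match approach is as a configuration record with one field per building block. The Python sketch below is only an illustration: the class name, field names, and the notebook part are hypothetical and don't come from Intel. The four-core entry uses the figures Intel has given for the server part (three DDR3 channels, four QPI links, 8MB of L3).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class NehalemConfig:
    """One hypothetical mix of the building blocks listed above."""
    cores: int
    memory_channels: int
    memory_type: str                      # e.g. "DDR3" or "FB-DIMM"
    qpi_links: int
    l3_cache_mb: int
    power_features: Tuple[str, ...] = ()  # placeholder names only
    integrated_graphics: bool = False


# The four-core part Intel has detailed so far.
four_core_server = NehalemConfig(
    cores=4, memory_channels=3, memory_type="DDR3",
    qpi_links=4, l3_cache_mb=8,
)

# Purely illustrative: Intel has not described a part like this.
hypothetical_notebook = NehalemConfig(
    cores=2, memory_channels=2, memory_type="DDR3",
    qpi_links=1, l3_cache_mb=4,
    power_features=("example-low-power-state",),
    integrated_graphics=True,
)

for part in (four_core_server, hypothetical_notebook):
    print(part)
```

The point of the sketch is that the core itself never appears as a field: what varies from product to product is everything wrapped around it.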

So far, Intel has said that Nehalem will scale from two to eight cores, but the company has talked about only the four-core, server-oriented part. All Nehalem configurations have a number of Nehalem cores—each with a 32KB, four-way set associative instruction cache, a 32KB, eight-way set associative data cache, and a private, low-latency 256KB L2 cache—all attached to an inclusive L3 cache that will be sized to fit the number of cores and target market.
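The cache figures above lend themselves to a little arithmetic. The sketch below works out how many sets each L1 cache has and how much private cache each core carries. It assumes a 64-byte cache line, which this article doesn't state, and the function names are mine. Because an inclusive L3 keeps a copy of whatever sits in the caches above it, the sum of the private caches is a rough floor on how small the L3 can usefully be for a given core count.

```python
KB = 1024


def num_sets(size_bytes, ways, line_bytes=64):
    """Sets in a set-associative cache: size / (ways * line size).

    line_bytes=64 is an assumption, not a figure from the article.
    """
    return size_bytes // (ways * line_bytes)


l1i_sets = num_sets(32 * KB, ways=4)  # four-way instruction cache
l1d_sets = num_sets(32 * KB, ways=8)  # eight-way data cache

# Private cache per core: L1 instruction + L1 data + L2, in KB.
private_kb_per_core = 32 + 32 + 256


def min_inclusive_l3_kb(cores):
    """Capacity an inclusive L3 spends just mirroring the private caches."""
    return cores * private_kb_per_core


print("L1I sets:", l1i_sets)
print("L1D sets:", l1d_sets)
for cores in (2, 4, 8):
    print(cores, "cores ->", min_inclusive_l3_kb(cores), "KB mirrored in L3")
```

Under these assumptions, a four-core part can have up to 1,280KB of L3 tied up mirroring the private caches, and an eight-core part twice that, which is one reason the L3 has to grow with the core count.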

The four-core part that Intel has detailed weighs in at 781 million transistors, many of which no doubt go to the very generous 8MB L3 cache. This part also includes an on-die, three-channel DDR3 memory controller and a QuickPath interface that supports four QuickPath links. As I noted above, the number of memory channels and QuickPath links in other Nehalem-based products can be expected to vary with the part and target market.
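To get a feel for what three memory channels and four QuickPath links could add up to, here's a back-of-the-envelope calculation. The speeds plugged in are assumptions chosen for illustration, not figures from this article: DDR3-1066 memory on 8-byte-wide channels, QPI links running at 6.4 GT/s with 2 bytes of payload per transfer in each direction, and a 1333 MT/s, 8-byte-wide frontside bus for comparison. These are theoretical peaks, so treat the results as upper bounds rather than predictions.

```python
def parallel_bus_gbs(mega_transfers_per_s, bus_bytes=8, channels=1):
    """Peak GB/s for a parallel bus: transfers/s * width * channels."""
    return channels * mega_transfers_per_s * bus_bytes / 1000


def qpi_gbs(links, giga_transfers_per_s, payload_bytes=2, directions=2):
    """Peak GB/s across QPI links, counting both directions of each link."""
    return links * giga_transfers_per_s * payload_bytes * directions


# Assumed speeds (not from the article): DDR3-1066, 6.4 GT/s QPI, 1333 MT/s FSB.
memory_gbs = parallel_bus_gbs(1066, channels=3)
quickpath_gbs = qpi_gbs(links=4, giga_transfers_per_s=6.4)
fsb_gbs = parallel_bus_gbs(1333)

print(f"Three DDR3 channels: {memory_gbs:.1f} GB/s")
print(f"Four QPI links:      {quickpath_gbs:.1f} GB/s")
print(f"Single FSB:          {fsb_gbs:.1f} GB/s")
```

Even with these rough inputs, the gap between a single shared frontside bus and the combined memory and QuickPath bandwidth is large, which is the shift that the rest of Nehalem's design is built around.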