At a dinner this week with members of the press, NVIDIA CEO Jen-Hsun Huang laid out his view of NVIDIA's past, present, and future in light of recent developments in the processor market. Jen-Hsun's remarks are worth looking at in some detail, as much for what they say about Intel as for what they say about NVIDIA. We'll recap Jen-Hsun's take on the processor and GPU markets, followed by a look at the implications of the trends he references for the future of Intel, the x86 instruction set architecture (ISA), ARM, and the CPU market as a whole. Ultimately, we could even see Intel get back into the ARM market, a market where it had considerable success with its XScale line before betting the farm on x86.

NVIDIA's evolution

Huang began by dividing NVIDIA's history into three stages, the first of which—NVIDIA 1.0—was a period in which the company was a maker of fixed-function graphics products. NVIDIA 2.0 arrived with the advent of more fully programmable GPUs, and NVIDIA 3.0 will see the chipmaker transition into a "parallel computing company" that sells parallel number crunching hardware in every market from mobiles to servers.

Mapping the first two stages of NVIDIA's evolution to the GPU's transition from a fixed-function ASIC to a full-blown math coprocessor makes sense, but the attempt to fit both Tegra and Tesla into the same "parallel computing" bucket felt a bit premature absent any news of Tegra finding its way into a cloud server product.

Huang also heavily emphasized ARM as the future of not just NVIDIA, but the entire CPU business. In the course of his discussion of ARM vs. x86, he elaborated a bit on the rationale behind Project Denver. Huang was quite clear that NVIDIA could have chosen to produce an x86 processor—he described the licensing and technical problems associated with making an x86 CPU as "solvable" for NVIDIA. But he gave two reasons why the company opted not to go down that road.

First was the fact that there was already another company attempting to compete with Intel in the x86 market (AMD), mostly without success. Why, he asked, would NVIDIA want to jump in and make the type of product that AMD, whose processor designs he praised, had struggled for years to sell profitably? So in Huang's telling, there was simply no good business case for taking on Intel in x86, with AMD's troubles serving as concrete proof of the folly of trying to compete with Intel by making yet another x86 processor.

The second reason that Huang gave for going with ARM was that he wanted the company to do something that had never been done before—to make a product that didn't yet exist. In his talk of breaking new ground, he seemed to be referring to both Kal-El (the recently unveiled quad-core Tegra 3 chip) and Project Denver as examples of NVIDIA making something brand new. Both projects see NVIDIA pushing the ARM architecture to new heights of performance, and will let the chipmaker claim a set of "firsts" (i.e., first quad-core A9 part on the market, and first desktop-caliber ARM/GPU combo part).

The optimistic case

In the course of the evening, I tried to press Huang a bit on his second point. Specifically, I pointed out that attempts to build high-performance, power-efficient, out-of-order processors are not, in fact, even remotely new if we ignore the specific ISA. Intel has been working on exactly this problem for over a decade. Put differently, I suggested that if we move to a world where the ISA really doesn't matter to the user-facing parts of the software stack (i.e., Windows and Flash both run on ARM, and Android and WP7 use just-in-time compilation for application binaries), then ARM products like NVIDIA's get to compete with Intel's chips on old-fashioned performance and power savings.
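To make the ISA-independence point concrete: a managed runtime ships applications as architecture-neutral bytecode, and only the small runtime itself has to be ported to each CPU. The toy stack-machine interpreter below is a sketch of that idea (the opcodes and program are invented for illustration, not any real runtime's format); a production system like Dalvik or the CLR would JIT-compile this bytecode to native ARM or x86 instructions rather than interpret it, but the distribution format is equally ISA-neutral either way.

```python
# Toy stack-machine interpreter: the application "binary" is ISA-neutral
# bytecode; only this small runtime would need porting per architecture.
# Opcodes and encoding are hypothetical, for illustration only.
PUSH, ADD, MUL = range(3)

def run(program):
    """Execute a flat list of opcodes (PUSH is followed by its operand)."""
    stack = []
    i = 0
    while i < len(program):
        op = program[i]
        if op == PUSH:
            i += 1
            stack.append(program[i])  # immediate operand follows PUSH
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        i += 1
    return stack.pop()

# (2 + 3) * 4 -- the same bytecode runs unchanged on an ARM or x86 host.
print(run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL]))  # prints 20
```

The user-visible software never touches the host ISA, which is exactly the scenario in which ARM and x86 chips end up competing purely on performance and power.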

Intel's track record in both of these areas is very good, in part because of the chipmaker's process leadership and in part because performance and power are increasingly sensitive to SoC- and system-level integration issues. As an example of the latter, I brought up AMD's problems with L2 cache latency, which crippled the performance of its server parts. My point was that finicky issues like cache hierarchy design, choice of on-chip interconnect technology, floorplan layout, memory bus bandwidth, and multiprocessor interconnect topology all have a huge impact on real-world application performance, over and above the peak theoretical single-threaded performance of an individual core.
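The standard average-memory-access-time (AMAT) model shows why a detail like L2 latency can drag down an otherwise strong core. The sketch below compares two hypothetical chips that are identical except for L2 hit latency; all the cycle counts and miss rates are illustrative assumptions, not measurements of any real part.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time (cycles): hit cost plus amortized miss cost."""
    return hit_time + miss_rate * miss_penalty

# Illustrative parameters only -- not figures for any actual processor.
L1_HIT, L1_MISS_RATE = 3, 0.10      # 3-cycle L1, 10% of accesses miss to L2
L2_MISS_RATE, DRAM = 0.30, 200      # 30% of L2 accesses miss to ~200-cycle DRAM

# Chip A: 12-cycle L2 hits.  Chip B: same core, but 20-cycle L2 hits.
amat_a = amat(L1_HIT, L1_MISS_RATE, amat(12, L2_MISS_RATE, DRAM))
amat_b = amat(L1_HIT, L1_MISS_RATE, amat(20, L2_MISS_RATE, DRAM))

print(amat_a, amat_b)               # prints 10.2 11.0
print(f"{amat_b / amat_a - 1:.1%}") # ~8% more cycles per access from L2 alone
```

Under these assumptions, an 8-cycle difference in L2 hit latency costs roughly 8 percent per memory access on a chip whose core is otherwise unchanged, which is the kind of gap that no amount of peak single-threaded throughput on paper will hide in a memory-bound server workload.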

Huang's response was to point out that NVIDIA's PC chipsets were routinely top performers, so the company has plenty of experience with the kinds of system integration issues that I raised. He was therefore quite bullish on NVIDIA's ability to use its design prowess to compete with Intel on both performance and power efficiency, using standard ARM cores and a foundry process. He also suggested, in response to another question, that the foundries would close the process gap with Intel eventually.

Though I didn't voice it, my instant reaction to Huang's bluster about outdoing Intel on performance and power was that he was talking complete nonsense. After thinking about it a bit more, I'm even more certain that it's nonsense. I also think it probably doesn't matter, and that Intel could be about to find itself in the same position it placed the now-defunct RISC chip vendors in back in the '90s. Let me explain both points.

ARM whips Intel at Intel's game? Not gonna happen.

First, there's simply no way that any ARM CPU vendor, NVIDIA included, will even approach Intel's desktop and server x86 parts in terms of raw performance any time in the next five years, and probably not in this decade. Intel will retain its process leadership, and Xeon will retain the CPU performance crown. Per-thread performance is a very, very hard problem to solve, and Intel is the hands-down leader here. The ARM enthusiasm on this front among pundits and analysts is way overblown—you don't just sprinkle magic out-of-order pixie dust on a mobile phone CPU core and turn it into a Core i3, i5, or Xeon competitor. People who expect to see a classic processor performance shoot-out in which some multicore ARM chip spanks a Xeon are going to be disappointed for the foreseeable future.

It's also the case that as ARM moves up the performance ladder, it will necessarily start to drop in terms of power efficiency. Again, there is no magic pixie dust here, and the impact of the ISA alone on power consumption in processors that draw many tens of watts is negligible. A multicore ARM chip and a multicore Xeon chip that give similar performance on compute-intensive workloads will have similar power profiles; to believe otherwise is to believe in magical little ARM performance elves.