Late last week, Jen-Hsun Huang sent a letter to Nvidia employees, congratulating them on successfully launching the highly acclaimed GeForce GTX 680. After discussing how Nvidia changed its entire approach to GPU design to create the new GK104, Jen-Hsun writes: “Today is just the beginning of Kepler. Because of its super energy-efficient architecture, we will extend GPUs into datacenters, to super thin notebooks, to superphones.” (Emphasis added. Nvidia calls Tegra-powered products “super,” as in super phones, super tablets, etc., presumably because it believes you’ll be more inclined to buy one if you associate it with a red-booted man in blue spandex.)

This has touched off quite a bit of speculation concerning Nvidia’s Tegra 4, codenamed Wayne, including assertions that Nvidia’s next-gen SoC will use a Kepler-derived graphics core. That’s probably true, but the implications are considerably wider than a simple boost to the chip’s graphics performance. Tegra 4, also known as T40, could very well be a fundamental game-changer for Nvidia and the most important Tegra product to date.

Improved game performance

The GPU that powers Tegra 2 and Tegra 3 has a fixed number of pixel and vertex shaders and is much more closely related to GeForce 7-era products than the Unified Shader Architecture Nvidia debuted with the G80 (GeForce 8). When Nvidia describes T2 & 3 as “fully programmable,” it’s true — but it’s not at all the same as being DirectCompute/CUDA/OpenCL-compatible. Current Tegra products are capable of running complex shader programs, but not the general-purpose code that makes things like PhysX or GPGPU calculations possible.

GPUs with a Unified Shader Architecture (all Nvidia products from G80 onwards) have two advantages over cousins built around dedicated pixel and vertex shaders. First, they’re more efficient. When the split between pixel and vertex units is fixed in hardware, performance can vary considerably from game to game depending on whether a title emphasizes pixel shading or model geometry; this is quite visible when comparing performance between Tegra 2/3 and the SGX 544. A Kepler-based GPU would be much more flexible, able to allocate its execution resources to whichever workload needs them. This can indirectly lead to decreased power usage: a wide array of more efficient stream processors doesn’t necessarily need to run at nearly as high a clock speed as a chip with dedicated shader units.
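
To make the efficiency argument concrete, here is a minimal toy model in Python. It is only a sketch: the unit counts and workload sizes are illustrative assumptions rather than measurements of any real chip, and the function names are hypothetical. It compares a GPU whose pixel and vertex units are fixed in hardware with one that can hand any unit either kind of work.

```python
def frame_time_dedicated(pixel_work, vertex_work, pixel_units, vertex_units):
    """Dedicated units: the frame is done only when the slower stage finishes."""
    return max(pixel_work / pixel_units, vertex_work / vertex_units)


def frame_time_unified(pixel_work, vertex_work, total_units):
    """Unified pool: any unit can take any work, so the load spreads evenly."""
    return (pixel_work + vertex_work) / total_units


# Hypothetical workloads in arbitrary work units: (pixel_work, vertex_work).
workloads = {
    "pixel-heavy game": (900, 100),
    "geometry-heavy game": (300, 700),
}

for name, (pixel_work, vertex_work) in workloads.items():
    # Assumed split of 8 pixel + 4 vertex units versus 12 unified units.
    dedicated = frame_time_dedicated(pixel_work, vertex_work, 8, 4)
    unified = frame_time_unified(pixel_work, vertex_work, 12)
    print(f"{name}: dedicated={dedicated:.1f}, unified={unified:.1f}")
```

In this model the dedicated design is only as fast as its most overloaded stage, so its frame time swings with the mix of work, while the unified design depends only on the total. Real GPUs are far more complicated, but the same imbalance is what makes a fixed shader split sensitive to the game being run.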

Second, and arguably more important, is their ability to handle functions that would normally be processed on the CPU. This is where we expect T40 to come into its own.

A second chance for Hardware PhysX

As a software SDK for physics calculation, Nvidia’s PhysX solution has been quite successful; it’s used in nearly 400 games across consoles and PCs. Nvidia’s attempts to encourage game developers to support so-called hardware PhysX (the term refers to using Nvidia GPUs for significantly enhanced physics effects, cloth simulation, and particle interactions), however, have largely come to naught. Of the 374 shipped or in-development games that PhysXinfo.com lists as confirmed to use PhysX, just 19, or about 5%, use hardware PhysX. (PhysXinfo’s list of hardware PhysX games shows several cancelled games as still being “in development.”)

Hardware PhysX could fare much better on mobile platforms if Nvidia can show that using the GPU to offload physics calculations leads to better performance, improves power efficiency, and allows for more advanced physics modeling. Many of the most popular mobile games, from Angry Birds to Cut the Rope, are fundamentally physics games, but they often rely on relatively crude models.
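
Physics work of this kind suits a GPU because much of it is the same small calculation repeated over many independent objects. The following Python sketch shows that shape under stated assumptions: it is not PhysX code, the function names and the simple bounce rule are hypothetical, and it runs on the CPU. The point is only that each particle's update reads nothing from any other particle, which is the property that lets a GPU assign one thread per particle.

```python
GRAVITY = -9.8  # m/s^2 along the y axis (assumed constant)


def step_particle(pos, vel, dt):
    """Advance one particle by dt seconds; uses no other particle's state."""
    x, y = pos
    vx, vy = vel
    vy += GRAVITY * dt      # update velocity first (semi-implicit Euler)
    x += vx * dt
    y += vy * dt
    if y < 0.0:             # crude ground bounce that halves vertical speed
        y = -y
        vy = -vy * 0.5
    return (x, y), (vx, vy)


def step_all(positions, velocities, dt):
    """Update every particle; each iteration is independent of the others."""
    results = [step_particle(p, v, dt) for p, v in zip(positions, velocities)]
    return [r[0] for r in results], [r[1] for r in results]


positions = [(0.0, 1.0), (2.0, 5.0), (4.0, 0.5)]
velocities = [(1.0, 0.0), (0.0, 0.0), (-1.0, 3.0)]
positions, velocities = step_all(positions, velocities, dt=0.5)
print(positions)
```

Interactions between objects, such as collisions or cloth constraints, add coupling that this sketch leaves out, and handling that coupling efficiently is a large part of what a physics SDK provides.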

The challenges here will be on the development side. The best thing Nvidia could do to spur hardware PhysX adoption would be to pay for Tegra-specific adaptations of today’s most popular physics-based games, as well as to invest in its own titles or in upcoming games. There will always be developers who eschew hardware PhysX in favor of a simplified software-based solution that can run on every mobile device, but the cost and complexity of integrating hardware PhysX into a mobile game are a fraction of what it takes to apply the same technology to a PC title.
