We know all about Nvidia's GK104 chip, which has most recently been flying through our labs in a dual configuration in the sexy GeForce GTX 690. While that card is the king of gaming (for now), the big daddy of Nvidia's Kepler-based GPUs isn't even here yet.

This week at the Nvidia GPU Technology Conference in San Jose, the graphics company took the wraps off of the Kepler-based GK110 GPU that will power the Tesla K20 – a professional-level graphics card for serious business.

Nvidia Tesla K20

The big reveal at this conference from a hardware standpoint is definitely the GK110, which packs an astonishing 7.1 billion transistors on a 28nm process. It also promises all the compute features that some felt were missing from the GK104. Nvidia CEO Jen-Hsun Huang said at a post-keynote Q&A that the GK110 is "the most complex IC commercially available on planet."

7.1 billion transistors in the GK110

In comparison, next in complexity and transistor count is a chip from Xilinx called the Virtex-7 2000T FPGA, which integrates 2 million logic cells and 6.8 billion transistors. To help put that in better perspective, Intel's 10-core Xeon Westmere-EX has 2.6 billion transistors.

The GK110 features 15 SMX units with 192 CUDA cores per unit, for a grand total of 2,880 CUDA cores. Nvidia hasn't yet revealed full specifications for the Tesla K20 products, but indicated that not all boards will have all 15 SMX units enabled. Regardless, buyers can safely expect at least 2,496 CUDA cores from most Tesla K20 implementations.
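The core-count arithmetic is easy to check: each SMX contributes 192 CUDA cores, so a full 15-SMX chip yields 2,880 cores, while a part with two SMX units disabled yields 2,496. A quick sketch (the 13-SMX configuration is inferred from the numbers above, not an official spec):

```python
# GK110 CUDA core count scales linearly with active SMX units.
CORES_PER_SMX = 192

def gk110_cuda_cores(smx_units: int) -> int:
    """Total CUDA cores for a GK110 part with the given number of active SMX units."""
    return smx_units * CORES_PER_SMX

print(gk110_cuda_cores(15))  # full chip: 2880
print(gk110_cuda_cores(13))  # assumed Tesla K20 salvage config: 2496
```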

The memory bus has been upgraded to 384-bit with six 64-bit controllers in parallel. As for memory capacity itself, Nvidia did not specify. When pushed for an answer, Huang said simply, "Not enough."

To clarify, he added, "As much fast memory as possible behind 384 bits," but no matter what, it will "likely not be enough, because the problems [the K20 is] trying to solve are so huge."

Unfortunately, the GK110 isn't quite finished yet, so we won't be seeing this one until Q4 2012. When it does become available, the GK110 GPU is expected to be incorporated into the new Titan supercomputer at the Oak Ridge National Laboratory in Tennessee and the Blue Waters system at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign.

For those who want a Kepler-based Tesla product today, Nvidia also announced the GK104-based Tesla K10, which is available immediately. This accelerator board features two GK104 Kepler GPUs that deliver an aggregate 4.58 teraflops of peak single-precision floating-point performance and 320 GB per second of memory bandwidth.
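The 4.58-teraflop figure is consistent with two full GK104 GPUs (1,536 CUDA cores each, as in the GeForce GTX 680) at roughly 745 MHz, since peak single-precision throughput is cores × clock × 2 FLOPs per fused multiply-add. A back-of-the-envelope check (the 745 MHz core clock is our assumption; Nvidia did not state it in this announcement):

```python
# Peak single-precision FLOPS = cores * clock * 2 (one fused multiply-add per core per cycle).
CORES_PER_GK104 = 1536   # CUDA cores in a full GK104
CLOCK_HZ = 745e6         # assumed core clock; not stated in the announcement

per_gpu_tflops = CORES_PER_GK104 * CLOCK_HZ * 2 / 1e12
board_tflops = 2 * per_gpu_tflops  # the Tesla K10 carries two GK104 GPUs

print(round(board_tflops, 2))  # ~4.58
```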

The Tesla K10 has already found use in the oil and gas industries, as well as signal and image processing.

Nvidia Tesla K10

"Fermi was a major step forward in computing," said Bill Dally, chief scientist and senior vice president of research at Nvidia. "It established GPU-accelerated computing in the top tier of high performance computing and attracted hundreds of thousands of developers to the GPU computing platform. Kepler will be equally disruptive, establishing GPUs broadly into technical computing, due to their ease of use, broad applicability and efficiency."

As Nvidia CEO Jen-Hsun Huang detailed in his keynote, the Kepler-based Tesla cards feature three new innovations that give them an edge over Fermi. They are:

SMX Streaming Multiprocessor -- The basic building block of every GPU, the SMX streaming multiprocessor was redesigned from the ground up for high performance and energy efficiency. It delivers up to three times more performance per watt than the Fermi streaming multiprocessor, making it possible to build a supercomputer that delivers one petaflop of computing performance in just 10 server racks. SMX achieves its energy efficiency by quadrupling the number of CUDA architecture cores while reducing the clock speed of each core, power-gating parts of the GPU when idle, and maximizing the GPU area devoted to parallel-processing cores instead of control logic.

Dynamic Parallelism -- This capability enables GPU threads to dynamically spawn new threads, allowing the GPU to adapt dynamically to the data. It greatly simplifies parallel programming, enabling GPU acceleration of a broader set of popular algorithms, such as adaptive mesh refinement, fast multipole methods and multigrid methods.

Hyper-Q -- This enables multiple CPU cores to simultaneously use the CUDA architecture cores on a single Kepler GPU. This dramatically increases GPU utilization, slashing CPU idle times and advancing programmability. Hyper-Q is ideal for cluster applications that use MPI.
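To make the Dynamic Parallelism idea concrete: each unit of work inspects its data and decides whether to spawn finer-grained work, as in the adaptive mesh refinement example Nvidia cites. The sketch below is a host-side Python analogue of that pattern, not CUDA code; on a GK110 the child launches would be issued by GPU threads themselves rather than by recursive CPU calls.

```python
# Host-side analogue of dynamic parallelism: each task examines its data and
# decides whether to spawn child tasks (finer subdivisions), as in adaptive
# mesh refinement. On GK110, these child launches happen on the GPU itself.

def refine(interval, f, tol, depth=0, max_depth=8):
    """Recursively subdivide an interval until f varies by less than tol across it."""
    lo, hi = interval
    mid = (lo + hi) / 2
    # Data-dependent decision: spawn two child tasks only where the function
    # still changes rapidly -- the amount of work is not known up front.
    if depth < max_depth and abs(f(hi) - f(lo)) > tol:
        return refine((lo, mid), f, tol, depth + 1) + \
               refine((mid, hi), f, tol, depth + 1)
    return [interval]

# A function that is steep near 0 ends up with fine cells there and coarse
# cells elsewhere -- the mesh adapts to the data.
cells = refine((0.0, 1.0), lambda x: x ** 0.25, tol=0.05)
print(len(cells))
```

The payoff on the GPU is the same as in this toy: the launch pattern follows the data, instead of the CPU having to predict it and round-trip between host and device for every refinement step.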



Read more at our liveblog of the Nvidia GTC keynote, and find out what applications Nvidia has planned for gaming in the cloud with GeForce Grid.