Kicking off today is the annual International Conference for High Performance Computing, Networking, Storage, and Analysis, better known as SC. For NVIDIA, next to their annual GPU Technology Conference, SC is their second biggest GPU compute conference, and is typically the venue for NVIDIA’s summer/fall announcements. And with a number of announcements in tow, NVIDIA has split up their major announcements over two weeks. Last week we saw CUDA 6, which introduced unified memory support for compute workloads for NVIDIA’s products, and today we’ll be seeing a couple of other things, starting with Tesla K40.

With both the GeForce and Quadro lineups getting the full GK110 treatment in the last couple of months with GeForce GTX 780 Ti and Quadro K6000 respectively, it was only a matter of time until NVIDIA gave the Tesla lineup the same treatment. Tesla K20(X) was of course the first product to launch with NVIDIA’s flagship GK110 GPU, and now with K40 the Tesla lineup will become the final product line to be upgraded to full GK110 specifications.

NVIDIA Tesla Family Specification Comparison

| | Tesla K40 | Tesla K20X | Tesla K20 | Tesla M2090 |
|---|---|---|---|---|
| Stream Processors | 2880 | 2688 | 2496 | 512 |
| Core Clock | 745MHz | 732MHz | 706MHz | 650MHz |
| Boost Clock(s) | 810MHz, 875MHz | N/A | N/A | N/A |
| Shader Clock | N/A | N/A | N/A | 1300MHz |
| Memory Clock | 6GHz GDDR5 | 5.2GHz GDDR5 | 5.2GHz GDDR5 | 3.7GHz GDDR5 |
| Memory Bus Width | 384-bit | 384-bit | 320-bit | 384-bit |
| VRAM | 12GB | 6GB | 5GB | 6GB |
| Single Precision | 4.29 TFLOPS | 3.95 TFLOPS | 3.52 TFLOPS | 1.33 TFLOPS |
| Double Precision | 1.43 TFLOPS (1/3) | 1.31 TFLOPS (1/3) | 1.17 TFLOPS (1/3) | 655 GFLOPS (1/2) |
| Transistor Count | 7.1B | 7.1B | 7.1B | 3B |
| TDP | 235W | 235W | 225W | 250W |
| Cooling | Active/Passive | Passive | Active/Passive | N/A |
| Manufacturing Process | TSMC 28nm | TSMC 28nm | TSMC 28nm | TSMC 40nm |
| Architecture | Kepler | Kepler | Kepler | Fermi |
| Launch Price | $5499? | ~$3799 | ~$3299 | N/A |

Like the other fully enabled GK110 cards we’ve seen, Tesla K40 is a moderate spec bump that sees NVIDIA enabling the 15th and final SMX already present on GK110, while also giving the GPU and memory clockspeeds a bump. Compared to the K20X, the K40 gets a very slight GPU clockspeed increase from 732MHz to 745MHz (2%), which coupled with the additional SMX gives it around 9% more compute throughput on paper. This will bring it to a total of 4.29TFLOPS single precision, or 1.43TFLOPS double precision. Meanwhile for memory performance the memory clockspeed has seen a more significant bump, going from 5.2GHz to a full 6GHz (15% more), or in terms of raw bandwidth from 250GB/sec to 288GB/sec.
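The paper figures above can be reproduced with some quick arithmetic. The following is a minimal sketch, assuming each stream processor retires one fused multiply-add (counted as 2 FLOPs) per clock and that double precision runs at the 1/3 rate listed in the spec table; the function names are ours and purely illustrative.

```python
def peak_sp_tflops(cores, clock_mhz):
    """Peak single precision throughput in TFLOPS.

    Assumption: one fused multiply-add (2 FLOPs) per core per clock.
    """
    return cores * 2 * clock_mhz / 1e6


def peak_dp_tflops(cores, clock_mhz, dp_ratio=1 / 3):
    """Peak double precision throughput, using the 1/3 rate from the table."""
    return peak_sp_tflops(cores, clock_mhz) * dp_ratio


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Raw memory bandwidth in GB/sec: bytes per transfer times data rate."""
    return bus_width_bits / 8 * data_rate_gbps


k40_sp = peak_sp_tflops(2880, 745)
k20x_sp = peak_sp_tflops(2688, 732)

print(f"K40 single precision:  {k40_sp:.2f} TFLOPS")
print(f"K40 double precision:  {peak_dp_tflops(2880, 745):.2f} TFLOPS")
print(f"K40 gain over K20X:    {(k40_sp / k20x_sp - 1) * 100:.0f}%")
print(f"K40 memory bandwidth:  {memory_bandwidth_gbs(384, 6.0):.0f} GB/sec")
print(f"K20X memory bandwidth: {memory_bandwidth_gbs(384, 5.2):.0f} GB/sec")
```

These are theoretical peaks only; the formula lands within rounding of the table's figures and says nothing about what a real workload will sustain.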

Perhaps more significantly however, with K40 the Tesla family finally gets access to 4Gbit GDDR5 modules, which have only recently reached mass production. With Tesla K20X previously topping out at 6GB (24 x 2Gbit), NVIDIA is taking K40 to 12GB (24 x 4Gbit), making it the first Tesla card with that much memory. As with Quadro K6000, which is also launching with 12GB of VRAM, the move targets NVIDIA’s customers in the compute market who are bottlenecked at either the algorithmic or data set levels by memory capacity. So the additional capacity should offer a welcome improvement for those users, and unlock at least a few more workloads that couldn’t properly fit inside of 6GB.
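The capacity math is simple enough to check by hand, but as a quick sketch (the helper name is hypothetical), doubling the module density while keeping the same 24 modules doubles the total:

```python
def vram_gb(module_count, density_gbit):
    """Total VRAM in GB from module count and per-module density in Gbit."""
    return module_count * density_gbit / 8  # 8 bits per byte


print(f"K20X: {vram_gb(24, 2):.0f}GB")  # 24 x 2Gbit modules
print(f"K40:  {vram_gb(24, 4):.0f}GB")  # 24 x 4Gbit modules
```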

As for power and cooling, the requirements there will not be changing. K40 needs to be drop-in compatible with K20X, so the TDP remains at 235W; NVIDIA is reaping the benefits of binning and the new B1 stepping of GK110, but does not have the headroom for a significant GPU clockspeed increase. Taking a step beyond K20X however, K40 will be offered in both passive and active cooling configurations – K20X was only offered in passive – so unlike K20X, K40 can be dropped in a wider array of systems than just rackmount servers and other devices with dedicated expansion slot cooling.

But with that said, despite the fact that K40 is just an iteration on K20 and a member of the Kepler Tesla family (as opposed to being a new product line of its own), K40 does come with one new trick that the K20 cards did not: GPU Boost. To be clear, this isn’t the same GPU Boost we saw in NVIDIA’s Kepler GeForce cards – for one thing it’s not automatic – but it is similar. Since these cards are TDP limited, and since all of the cards in a cluster need to operate at the same clockspeed to maintain synchronization, NVIDIA cannot ship K40 at a higher clockspeed than it’s going to be able to sustain. However that doesn’t mean the GK110 GPU underlying K40 can’t clock higher (we’ve seen it in GTX 780 Ti), so NVIDIA has split the difference and will be offering selectable clockspeeds under the GPU Boost moniker.

Besides the 745MHz default clockspeed, K40 cards will also be able to be set at 810MHz and 875MHz, significant clockspeed bumps that would have equally significant performance impacts. These higher clockspeeds are operator selected, and are primarily intended to be used in systems where the workload wasn’t maxing out K40’s 235W TDP in the first place, giving operators the ability to squeeze out a bit more performance by bringing K40 closer to its TDP limits. These higher clockspeeds don’t change the TDP limit itself, and in all likelihood come with very significant power consumption increases (due to the squared impact of voltage), so it will be up to operators to profile their workloads and select a suitable clockspeed, lest they cause their cards to throttle and potentially lose sync. Ultimately in cases where these higher clockspeeds can be used, the 17% clockspeed increase from using 875MHz would compound with K40’s earlier 9% performance increase over K20X and put K40 at upwards of 28% faster than K20X.
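To see how the boost clocks compound with the extra SMX, here is a rough sketch that scales on-paper throughput by core count times clockspeed. It assumes the workload is entirely compute bound and that the card holds the selected clock without throttling, neither of which is guaranteed in practice; the names are illustrative.

```python
K20X = {"cores": 2688, "clock_mhz": 732}
K40_CORES = 2880
K40_CLOCKS_MHZ = [745, 810, 875]  # default clock plus the two boost options


def paper_speedup(cores, clock_mhz, base):
    """On-paper throughput ratio versus a baseline card (cores x clock)."""
    return (cores * clock_mhz) / (base["cores"] * base["clock_mhz"])


for clock in K40_CLOCKS_MHZ:
    over_default = (clock / K40_CLOCKS_MHZ[0] - 1) * 100
    over_k20x = (paper_speedup(K40_CORES, clock, K20X) - 1) * 100
    print(f"{clock}MHz: +{over_default:.0f}% vs. K40 default, "
          f"+{over_k20x:.0f}% vs. K20X")
```

The gains multiply rather than add: roughly 1.09 x 1.17 at the 875MHz setting, which is where the 28% figure comes from.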

On a further note, there is one last feature upgrade that is new for K40. For K20(X) NVIDIA limited those cards to PCI Express 2.0 speeds, despite the fact that the underlying hardware was designed to support PCI Express 3.0. For K40 however NVIDIA is finally enabling full PCI Express 3.0 speeds, which coincides with the launch of Intel’s Ivy Bridge-E hardware and the fixed compatibility between the two platforms. For the relevant systems this nearly doubles the available bandwidth between individual Tesla cards and between cards and the host CPUs – going from 8GB/sec to 15.75GB/sec – something that relative to the high-speed local memory was at times a massive bottleneck for these cards.
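The 8GB/sec and 15.75GB/sec figures fall out of the link parameters. The sketch below assumes a x16 slot and counts one direction only; the transfer rates and line encodings (8b/10b for PCIe 2.0, 128b/130b for PCIe 3.0) come from the PCI Express specifications rather than from NVIDIA's announcement, and the function name is ours.

```python
def pcie_bandwidth_gbs(transfer_rate_gt, lanes, payload_bits, encoded_bits):
    """Per-direction PCIe bandwidth in GB/sec after line-encoding overhead."""
    usable_gbit_per_lane = transfer_rate_gt * payload_bits / encoded_bits
    return usable_gbit_per_lane * lanes / 8  # 8 bits per byte


# Assumption: both cards sit in a full x16 slot.
gen2 = pcie_bandwidth_gbs(5.0, 16, 8, 10)     # PCIe 2.0: 5GT/s, 8b/10b
gen3 = pcie_bandwidth_gbs(8.0, 16, 128, 130)  # PCIe 3.0: 8GT/s, 128b/130b

print(f"PCIe 2.0 x16: {gen2:.2f} GB/sec")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/sec")
print(f"Ratio: {gen3 / gen2:.2f}x")
```

Note that the raw signaling rate only rises from 5GT/s to 8GT/s; the rest of the gain comes from the much leaner encoding in PCIe 3.0, which is why the result is just shy of a full doubling.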

Wrapping things up, K40 will be a hard launch from NVIDIA and their partners, with individual cards and OEM systems equipped with them expected to be available today. We’re already seeing some individual cards on sale a few hours before the official launch, placing them at $5,500, though it should be noted that these are retail prices and NVIDIA does not have public MSRPs for Tesla cards.