Nvidia’s Pascal GPUs are already among the company's most anticipated products of 2016, and today's news may very well build that anticipation even more. We've learned that the company already has four different Pascal graphics cards going through testing and validation.

It seems the company is wasting no time getting its GP100 and GP104 Pascal GPUs out the door. These are the same GPUs that we've spotted on more than one occasion in the past. GP100, the flagship Pascal GPU and the inevitable successor to the chip inside Nvidia’s GTX Titan X, has been spotted in transit from TSMC’s fabrication plants to Nvidia’s testing facilities in India. Pascal is the code name for Nvidia’s upcoming GPU architecture, scheduled for market release in the second half of 2016. The GP100 GPU is the largest and most powerful of Nvidia’s Pascal graphics chips.

Four Nvidia Pascal Graphics Cards Spotted In The Wild

All four Nvidia graphics boards in question are described as "COMPUTER GRAPHICS CARDS". However, all four carry very similar per-unit values, so we could be looking at bare circuit boards rather than complete graphics cards. There's really no way of telling for sure.

All four boards start with the same 699 serial number and the earliest record of a board carrying that serial number appears in December. So we know that we're looking at Nvidia graphics boards that are new and did not exist at any point before December. This could potentially explain Pascal's absence from CES and why Nvidia chose to showcase the Pascal Drive PX2 module with Maxwell GPUs instead.

There are four different boards here, with the following serial numbers:

699-2H403-0201-500

699-1G411-0000-000

699-1H400-0000-100

699-12914-0071-100

The 1H400, 1G411 and 2H403 units are all variants of the same basic board, while the 12914 board is distinctly different. So the first three boards look very much like three evolutionary iterations of one design. This is especially likely because there's no overlap between the three: one board shows up and is then followed by another, with no recurrence of the previous board.

There's no way of knowing for certain yet whether these are GP100 or GP104 boards. Interestingly, GP100, or “Big Pascal” as we like to call it, was spotted a few months back. Back then there was evidence only of the GPUs themselves, not of any actual boards. So it looks like Pascal has come a long way since then.



What we know so far about Nvidia's flagship Pascal GP100 GPU:

Pascal graphics architecture.

Estimated 2x performance-per-watt improvement over Maxwell.

To launch in 2016, purportedly the second half of the year.

DirectX 12 feature level 12_1 or higher.

Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.

Built on the 16nm FinFET manufacturing process from TSMC.

Allegedly has a total of 17 billion transistors, more than twice that of GM200.

Will feature four 4-Hi HBM2 stacks for a total of 16GB of VRAM, and 8-Hi stacks for up to 32GB on the professional compute SKUs.

Features a 4096-bit memory bus interface, the same as AMD's Fiji GPU powering the Fury series.

Features NVLink (for CPU links, only compatible with next generation IBM POWER server processors).

Supports half precision FP16 compute at twice the rate of full precision FP32.
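The memory figures in the list above hang together arithmetically, and a short sketch makes that visible. It assumes 1 GB (8 Gb) per HBM2 die, a 1024-bit interface per stack, and per-pin data rates of 1.4 and 2.0 Gbps. None of those values come from the leak itself, so treat them as illustrative assumptions rather than confirmed GP100 specifications.

```python
# Back-of-the-envelope HBM2 arithmetic for the rumored GP100 memory setup.
# Assumptions (not from the article): 1 GB per DRAM die, 1024 bits of
# interface per stack, and per-pin data rates of 1.4 or 2.0 Gbps.

GB_PER_DIE = 1          # assumed 8 Gb HBM2 die
BITS_PER_STACK = 1024   # assumed interface width of one HBM2 stack


def capacity_gb(stacks: int, dies_per_stack: int) -> int:
    """Total VRAM in GB for a given stack count and stack height."""
    return stacks * dies_per_stack * GB_PER_DIE


def bus_width_bits(stacks: int) -> int:
    """Aggregate memory bus width across all stacks."""
    return stacks * BITS_PER_STACK


def bandwidth_gb_s(stacks: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: bus width x per-pin rate / 8 bits per byte."""
    return bus_width_bits(stacks) * gbps_per_pin / 8


if __name__ == "__main__":
    print("4 x 4-Hi capacity (GB):", capacity_gb(4, 4))
    print("4 x 8-Hi capacity (GB):", capacity_gb(4, 8))
    print("Bus width (bits):", bus_width_bits(4))
    for rate in (1.4, 2.0):
        print(f"Bandwidth at {rate} Gbps/pin: {bandwidth_gb_s(4, rate):.1f} GB/s")
```

Under these assumptions, four stacks give the 4096-bit bus and the 16GB and 32GB capacities listed above, and the two assumed pin speeds land at roughly 717 GB/s and 1,024 GB/s.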

GPU Architecture | NVIDIA Fermi | NVIDIA Kepler | NVIDIA Maxwell | NVIDIA Pascal
GPU Process | 40nm | 28nm | 28nm | 16nm (TSMC FinFET)
Flagship Chip | GF110 | GK210 | GM200 | GP100
GPU Design | SM (Streaming Multiprocessor) | SMX (Streaming Multiprocessor) | SMM (Streaming Multiprocessor Maxwell) | SMP (Streaming Multiprocessor Pascal)
Maximum Transistors | 3.00 Billion | 7.08 Billion | 8.00 Billion | 15.3 Billion
Maximum Die Size | 520mm² | 561mm² | 601mm² | 610mm²
Stream Processors Per Compute Unit | 32 SPs | 192 SPs | 128 SPs | 64 SPs
Maximum CUDA Cores | 512 CCs (16 CUs) | 2880 CCs (15 CUs) | 3072 CCs (24 CUs) | 3840 CCs (60 CUs)
FP32 Compute | 1.33 TFLOPs (Tesla) | 5.10 TFLOPs (Tesla) | 6.10 TFLOPs (Tesla) | ~12 TFLOPs (Tesla)
FP64 Compute | 0.66 TFLOPs (Tesla) | 1.43 TFLOPs (Tesla) | 0.20 TFLOPs (Tesla) | ~6 TFLOPs (Tesla)
Maximum VRAM | 1.5 GB GDDR5 | 6 GB GDDR5 | 12 GB GDDR5 | 16 / 32 GB HBM2
Maximum Bandwidth | 192 GB/s | 336 GB/s | 336 GB/s | 720 GB/s - 1 TB/s
Maximum TDP | 244W | 250W | 250W | 300W
Launch Year | 2010 (GTX 580) | 2014 (GTX Titan Black) | 2015 (GTX Titan X) | 2016
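To put the compute rows of the table in context, here is a rough sketch of the usual peak-throughput arithmetic. It assumes each CUDA core retires one fused multiply-add (two floating point operations) per clock, and it back-solves the clock speed that the rumored figures would imply. The resulting clocks are derived estimates under that assumption, not leaked specifications.

```python
# Peak-throughput arithmetic behind the FP32 row of the table.
# Assumption: one fused multiply-add (2 FLOPs) per CUDA core per clock.

FLOPS_PER_CORE_PER_CLOCK = 2


def peak_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput in TFLOPs for a core count and clock."""
    return cuda_cores * FLOPS_PER_CORE_PER_CLOCK * clock_ghz / 1000.0


def implied_clock_ghz(cuda_cores: int, tflops: float) -> float:
    """Clock in GHz needed to reach a given peak FP32 figure."""
    return tflops * 1000.0 / (cuda_cores * FLOPS_PER_CORE_PER_CLOCK)


def fp16_tflops(fp32_tflops: float, rate_multiplier: float = 2.0) -> float:
    """Half precision peak, using the rumored 2x rate over FP32."""
    return fp32_tflops * rate_multiplier


if __name__ == "__main__":
    # Core counts and TFLOPs values are the ones quoted in the table.
    for name, cores, tflops in (("GM200", 3072, 6.10), ("GP100", 3840, 12.0)):
        clock = implied_clock_ghz(cores, tflops)
        print(f"{name}: {cores} cores at {tflops} TFLOPs implies ~{clock:.2f} GHz")
    print(f"GP100 FP16 at 2x rate: ~{fp16_tflops(12.0):.0f} TFLOPs")
```

If the assumption holds, the rumored ~12 TFLOPs from 3840 cores would require a clock of around 1.56 GHz, well above the roughly 1 GHz implied for GM200, which is the kind of jump a move to 16nm FinFET would have to deliver.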

We learned last year that Nvidia’s flagship Pascal GPU, code-named GP100, may have taped out on TSMC’s 16nm FinFET manufacturing process in June. Interestingly, shortly afterwards AMD announced that it had taped out two FinFET chips. It’s absolutely not a coincidence that both companies completed their FinFET designs at around the same time. Both are pushing a very aggressive time-to-market schedule to debut their next-generation FinFET-based GPUs this year.

Nvidia Pascal - 2X Perf/Watt, Stacked Memory, NV-Link And Mixed Precision Compute

TSMC’s new 16nm FinFET process promises to be significantly more power efficient than planar 28nm. It also promises a considerable improvement in transistor density, which would enable Nvidia to build faster, significantly more complex and more power efficient GPUs.

TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
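As a rough sanity check on the density claim, we can compare the transistor counts and die sizes quoted for GM200 (28nm) and GP100 (16nm) in the table above. This is only a sketch: it treats the rumored GP100 numbers as given and ignores differences in chip layout, so the ratio is indicative at best.

```python
# Transistor density comparison using the figures quoted in the table.
# The GP100 values are rumored, so the result is only indicative.


def density_mtr_per_mm2(transistors_billion: float, die_mm2: float) -> float:
    """Transistor density in millions of transistors per square millimetre."""
    return transistors_billion * 1000.0 / die_mm2


gm200_density = density_mtr_per_mm2(8.00, 601)   # 28nm Maxwell flagship
gp100_density = density_mtr_per_mm2(15.3, 610)   # 16nm Pascal flagship (rumored)
density_ratio = gp100_density / gm200_density

if __name__ == "__main__":
    print(f"GM200: {gm200_density:.1f} MTr/mm2")
    print(f"GP100: {gp100_density:.1f} MTr/mm2")
    print(f"Ratio: {density_ratio:.2f}x")
```

The ratio comes out a little under 1.9x, which is in line with the "around 2 times the density" that TSMC quotes for 16FF+ over 28HPM.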

Apart from HBM2 and 16nm, there is one big compute-centric feature that Nvidia will debut with Pascal: NVLink. Pascal will be the first GPU architecture from the company to support this new proprietary server interconnect.

NVIDIA Volta GPUs and IBM Power9 CPUs Enabled Supercomputers in 2017

The technology targets GPU-accelerated servers, where cross-chip communication is extremely bandwidth limited and a major system bottleneck. Nvidia states that NVLink will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs. NVLink will debut with Nvidia’s Pascal in 2016 before it makes its way to Volta in 2018.

NVLink is an energy-efficient, high-bandwidth communications channel that uses up to three times less energy to move data on the node at speeds 5-12 times conventional PCIe Gen3 x16. First available in the NVIDIA Pascal GPU architecture, NVLink enables fast communication between the CPU and the GPU, or between multiple GPUs.

[Figure 3: NVLink is a key building block in the compute node of Summit and Sierra supercomputers.]

[Slide: Volta GPU featuring NVLink and stacked memory. NVLink GPU high-speed interconnect: 80-200 GB/s. 3D stacked memory: 4x higher bandwidth (~1 TB/s), 3x larger capacity, 4x more energy efficient per bit.]

NVLink is a key technology in Summit’s and Sierra’s server node architecture, enabling IBM POWER CPUs and NVIDIA GPUs to access each other’s memory fast and seamlessly. From a programmer’s perspective, NVLink erases the visible distinctions of data separately attached to the CPU and the GPU by “merging” the memory systems of the CPU and the GPU with a high-speed interconnect. Because both CPU and GPU have their own memory controllers, the underlying memory systems can be optimized differently (the GPU’s for bandwidth, the CPU’s for latency) while still presenting as a unified memory system to both processors.

NVLink offers two distinct benefits for HPC customers. First, it delivers improved application performance, simply by virtue of greatly increased bandwidth between elements of the node. Second, NVLink with Unified Memory technology allows developers to write code much more seamlessly and still achieve high performance.

via NVIDIA News
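The "5-12 times PCIe Gen3 x16" claim can be turned into rough bandwidth numbers. The sketch below assumes the standard PCIe 3.0 parameters (8 GT/s per lane with 128b/130b encoding) and simply scales the x16 figure by the quoted multipliers. The 16 GB transfer at the end is a hypothetical example of ours, sized to match the rumored VRAM capacity.

```python
# Rough NVLink vs PCIe 3.0 x16 bandwidth arithmetic.
# Assumptions: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding;
# the 5x and 12x multipliers are the ones quoted by Nvidia.

PCIE3_GT_PER_S = 8.0         # per-lane signalling rate
PCIE3_ENCODING = 128 / 130   # usable fraction after line encoding
PCIE3_LANES = 16


def pcie3_x16_gb_s() -> float:
    """Peak one-direction PCIe 3.0 x16 bandwidth in GB/s."""
    return PCIE3_GT_PER_S * PCIE3_ENCODING * PCIE3_LANES / 8


def nvlink_range_gb_s(low_mult: float = 5.0, high_mult: float = 12.0):
    """Bandwidth range implied by scaling PCIe 3.0 x16 by the quoted multipliers."""
    base = pcie3_x16_gb_s()
    return base * low_mult, base * high_mult


def transfer_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Ideal time to move size_gb at a given peak bandwidth."""
    return size_gb / bandwidth_gb_s


if __name__ == "__main__":
    base = pcie3_x16_gb_s()
    low, high = nvlink_range_gb_s()
    print(f"PCIe 3.0 x16: {base:.1f} GB/s")
    print(f"5x to 12x:    {low:.0f} to {high:.0f} GB/s")
    # Hypothetical example: filling 16 GB of VRAM over each link.
    for label, bw in (("PCIe 3.0 x16", base), ("5x", low), ("12x", high)):
        print(f"16 GB over {label}: {transfer_seconds(16, bw):.2f} s")
```

Scaling the roughly 15.8 GB/s of a PCIe 3.0 x16 link by 5 and 12 gives a range of about 79 to 189 GB/s, which lines up reasonably well with the 80-200 GB/s figure on Nvidia's slide.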

Unlike with Maxwell, Nvidia has placed a major focus on compute and GPGPU acceleration with Pascal. The slew of new features and technologies that Nvidia will debut with Pascal emphasizes this focus, including next-generation stacked High Bandwidth Memory, the high-speed NVLink GPU interconnect and mixed precision compute at double the rate of full precision compute to push perf/watt. We can’t wait to see Pascal in action later this year, but until then stay tuned for the latest.

