Nvidia's big Pascal GPU, code named GP100, will feature a massive 4096-bit bus and four HBM2 stacks, each up to 8-Hi. The upcoming Nvidia flagship Pascal chip is set to debut on TSMC's 16nm FinFET process next year. We have confirmed with our sources that the GPU will be made in two variations with different stacked HBM2 configurations; both, however, will feature a massive 4096-bit memory interface, just like AMD's flagship Fiji GPU launched last month.



The first variation will pack four HBM2 stacks, each 4-Hi and clocked at 1GHz. This will go into the traditional consumer GeForce line of GP100-based products. The second variation is also equipped with four HBM2 stacks clocked at 1GHz, but each will be 8-Hi.
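To put the 4096-bit interface and 1GHz clock in perspective, here is a rough peak bandwidth estimate. It is only a sketch: it assumes data is transferred on both clock edges (double data rate) and that the bus is split evenly across the four stacks, neither of which is stated above.

```python
# Back-of-the-envelope peak bandwidth for the memory setup described above.
# Assumption (not from the article): double data rate, so a 1 GHz clock
# moves 2 Gb/s over each pin.

BUS_WIDTH_BITS = 4096      # total memory interface width, from the article
STACKS = 4                 # HBM2 stacks per package, from the article
CLOCK_GHZ = 1.0            # memory clock, from the article
TRANSFERS_PER_CLOCK = 2    # assumed double data rate


def peak_bandwidth_gbytes(bus_width_bits, clock_ghz, transfers_per_clock):
    """Peak bandwidth in GB/s: pins * per-pin rate (Gb/s) / 8 bits per byte."""
    per_pin_gbps = clock_ghz * transfers_per_clock
    return bus_width_bits * per_pin_gbps / 8


total = peak_bandwidth_gbytes(BUS_WIDTH_BITS, CLOCK_GHZ, TRANSFERS_PER_CLOCK)
per_stack = total / STACKS  # assumes the bus is divided evenly between stacks
print(f"per stack: {per_stack:.0f} GB/s, total: {total:.0f} GB/s")
```

Under those assumptions each stack would deliver about 256 GB/s, for roughly 1 TB/s across the package. Both variations would land on the same figure, since stack height changes capacity, not bus width.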

In HBM stacking, #-Hi denotes the number of stacked DRAM dies. This notation does not count the additional base die, which incorporates logic and the memory PHY, so it only specifies how many DRAM dies are in the stack rather than the total number of chips. GP100 packages with 8-Hi HBM stacks will be limited to professional products, including Quadro and Tesla GPUs, where huge memory capacities are essential.
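The difference between the two variations can be sketched with a little arithmetic. The per-die capacity below (1 GB, i.e. 8 Gb per DRAM die) is an assumption for illustration and is not a figure from our sources; `package_memory` is just a helper name made up for this example.

```python
# Die counts and capacity for 4-Hi vs 8-Hi packages with four stacks.
# Assumption (not from the article): each DRAM die holds 1 GB (8 Gb).

GB_PER_DRAM_DIE = 1   # assumed capacity per DRAM die
STACKS = 4            # stacks per GP100 package, from the article


def package_memory(hi, stacks=STACKS, gb_per_die=GB_PER_DRAM_DIE):
    """Summarize a package whose stacks are each `hi` DRAM dies tall."""
    dram_dies = hi * stacks
    total_dies = (hi + 1) * stacks  # "#-Hi" excludes the base die in each stack
    return {
        "dram_dies": dram_dies,
        "total_dies": total_dies,
        "capacity_gb": dram_dies * gb_per_die,
    }


for hi in (4, 8):
    info = package_memory(hi)
    print(f"{hi}-Hi: {info['dram_dies']} DRAM dies, "
          f"{info['total_dies']} dies including base dies, "
          f"{info['capacity_gb']} GB")
```

With that assumed die size, the 8-Hi package doubles the capacity of the 4-Hi one (32 GB against 16 GB), which is the whole point of reserving it for professional cards.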

Nvidia's Big Pascal Slated For Release Next Year

We heard just recently that Nvidia's flagship Pascal GPU, code named GP100, taped out on TSMC's 16nm FinFET manufacturing process last month. Interestingly, we also learned just three days ago that AMD taped out two FinFET chips last quarter. It's absolutely not a coincidence that both companies completed their designs at around the same time. Both are aggressively pushing to debut their next generation FinFET-based GPUs next year.

TSMC's new 16nm FinFET process promises to be significantly more power efficient than planar 28nm. It also promises to bring a considerable improvement in transistor density, which would enable Nvidia to build faster, significantly more complex and more power efficient GPUs.

TSMC’s 16FF+ (FinFET Plus) technology can provide above 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Comparing with 20SoC technology, 16FF+ provides extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.
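To make TSMC's figures concrete, here is a small sketch that applies them to a made-up 28HPM chip. The baseline clock, power and transistor count are hypothetical and chosen only for round numbers. Note the "or" in TSMC's wording: we read the speed and power figures as alternatives (higher speed at the same power, or lower power at the same speed), not gains you get at once.

```python
# Apply TSMC's quoted 16FF+ vs 28HPM factors to a hypothetical baseline chip.
# The baseline values are invented for illustration, not real product data.

BASE_CLOCK_MHZ = 1000.0      # hypothetical 28HPM clock
BASE_POWER_W = 250.0         # hypothetical 28HPM power
BASE_TRANSISTORS_BN = 8.0    # hypothetical transistor count, in billions

SPEED_GAIN = 0.65       # "above 65 percent higher speed"
DENSITY_FACTOR = 2.0    # "around 2 times the density"
POWER_SAVING = 0.70     # "70 percent less power"


def scale_to_16ffp(clock_mhz, power_w, transistors_bn):
    """Return three alternative outcomes of moving the baseline to 16FF+."""
    return {
        # Option A: spend the gain on speed.
        "faster_clock_mhz": clock_mhz * (1 + SPEED_GAIN),
        # Option B: spend the gain on power instead.
        "lower_power_w": power_w * (1 - POWER_SAVING),
        # Density: transistors that fit in the same die area.
        "transistors_same_area_bn": transistors_bn * DENSITY_FACTOR,
    }


result = scale_to_16ffp(BASE_CLOCK_MHZ, BASE_POWER_W, BASE_TRANSISTORS_BN)
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

Real GPUs will land somewhere between the two extremes, trading part of the gain for clocks and part for power, while using the extra density for more shaders.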

Notably, unlike Nvidia, which has confirmed that its Pascal GPUs will be manufactured on TSMC's 16nm FinFET process, AMD has yet to announce whether the Arctic Islands family of GPUs will be made on TSMC's 16nm or Samsung's 14nm process. Both nodes are very similar, so which process AMD ends up using will be primarily dictated by yields and time-to-market.

Also unlike Nvidia, AMD has a much more powerful incentive to launch its next generation of FinFET GPUs first. This is because the company has priority access to HBM2 capacity, which is going to be limited initially, as a result of co-inventing the technology with Hynix. By launching its graphics products first, AMD can establish two competitive advantages over its rival. The obvious one is being first to market. More importantly, it enables AMD to capture much of that initial HBM2 capacity ahead of Nvidia and extend its time-to-market lead substantially. This could create an interesting market dynamic, but whether it can succeed remains to be seen.

Obviously Nvidia realizes that this play is in the cards, no pun intended, and will undoubtedly use its time wisely to hone its chips. GP100 has already taped out, so there's not much that can be done to the chip's floorplan; however, Nvidia can still use that extra time on post-silicon work. Nvidia can also spend more time working on its smaller Pascal chips, which haven't taped out yet.

Apart from HBM2 and 16nm there is one big compute-centric feature that Nvidia will debut with Pascal. And it's NVLink. Pascal will be the first GPU from the company to support this new proprietary server interconnect.



The technology is aimed at GPU-accelerated servers, where cross-chip communication is extremely bandwidth limited and a major system bottleneck. Nvidia states that NVLink will be 5 to 12 times faster than traditional PCIe 3.0, making it a major step forward in platform atomics. Earlier this year Nvidia announced that IBM will be integrating this new interconnect into its upcoming POWER server CPUs.

NVIDIA® NVLink™ is a high-bandwidth, energy-efficient interconnect that enables ultra-fast communication between the CPU and GPU, and between GPUs. The technology allows data sharing at rates 5 to 12 times faster than the traditional PCIe Gen3 interconnect, resulting in dramatic speed-ups in application performance and creating a new breed of high-density, flexible servers for accelerated computing.
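For a sense of scale, the sketch below turns the "5 to 12 times" claim into rough bandwidth numbers. It assumes the comparison is against a PCIe 3.0 x16 link, which Nvidia's statement does not specify; the PCIe 3.0 figures used (8 GT/s per lane, 128b/130b encoding) come from the PCIe specification.

```python
# Rough NVLink bandwidth range implied by "5 to 12 times faster than PCIe Gen3".
# Assumption (not from Nvidia's statement): the baseline is a x16 PCIe 3.0 link.

PCIE3_GT_PER_LANE = 8.0        # PCIe 3.0 raw rate per lane, in GT/s
PCIE3_ENCODING = 128 / 130     # 128b/130b line encoding efficiency
LANES = 16                     # assumed x16 link


def pcie3_gbytes_per_s(lanes=LANES):
    """Usable PCIe 3.0 bandwidth per direction, in GB/s."""
    return PCIE3_GT_PER_LANE * PCIE3_ENCODING * lanes / 8


base = pcie3_gbytes_per_s()
low, high = 5 * base, 12 * base
print(f"PCIe 3.0 x16: {base:.1f} GB/s per direction")
print(f"5x to 12x of that: {low:.0f} to {high:.0f} GB/s")
```

On that assumption the claim works out to somewhere around 79 to 189 GB/s, against just under 16 GB/s for the PCIe link.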

There's clearly a distinct difference between the goal Nvidia is trying to reach with Pascal and the one it had with Maxwell. With Maxwell it was all about power efficiency no matter the cost, and this meant huge sacrifices in compute performance. With Pascal, however, Nvidia is shifting its focus back to compute, and it is doing so in a myriad of ways. High Bandwidth Memory is going to play a huge role in providing the bandwidth and memory capacity needed for the HPC and enterprise markets. NVLink is also going to play a crucial part in allowing clients to realize their performance targets with Pascal accelerator clusters. Finally, mixed precision means even more power efficiency for mobile workloads. So it's clear Pascal is much more than Maxwell, and Nvidia is pursuing a more comprehensive, multi-faceted approach this time around.