By: Michael Feldman

The number two-ranked Tianhe-2 supercomputer, installed at the National Super Computer Center in Guangzhou, is being upgraded to 94.97 petaflops, nearly doubling its current peak performance of 54.9 petaflops.

The news comes out of the International HPC Forum (IHPCF), via a series of tweets from Satoshi Matsuoka posted on Tuesday. During the morning session, it was revealed that the upgraded system, dubbed Tianhe-2A, will sport the new Chinese-made Matrix-2000 GPDSP accelerators. They will replace the existing Intel Knights Corner Xeon Phi coprocessors that were installed in the Tianhe-2 back in 2013.

The original plan was to upgrade the system with the newer Knights Landing devices. But after the US government instituted an embargo on these chips to certain Chinese supercomputing sites, including the Guangzhou center, the National University of Defense Technology (NUDT) had to come up with plan B. In this case, that meant developing its own coprocessor. That turned out to be the Matrix-2000, a DSP-type chip, tweaked for more general-purpose computation.

According to slides presented at the forum, each Matrix-2000 will deliver 2.4576 teraflops (peak), which more than doubles the 1.0 teraflops delivered by the original Xeon Phi chip. The Matrix-2000 consists of 128 cores, each one providing 16 double-precision flops per cycle. Those flops are delivered by a 256-bit vector unit, which, as Matsuoka notes, is in line with the Knights Corner chip it replaces.
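The per-chip numbers above imply a clock rate that the article does not state. A minimal sketch of that back-of-the-envelope calculation, assuming peak performance is simply cores times flops per cycle times clock frequency (the function and constant names are illustrative, not from the presentation):

```python
# Figures quoted in the article for one Matrix-2000 chip.
PEAK_FLOPS_PER_CHIP = 2.4576e12   # 2.4576 teraflops, peak
CORES_PER_CHIP = 128
FLOPS_PER_CORE_PER_CYCLE = 16     # double precision


def implied_clock_hz(peak_flops, cores, flops_per_cycle):
    """Clock frequency implied by peak = cores * flops_per_cycle * clock.

    This is an assumption about how the peak figure was derived,
    not a published specification.
    """
    return peak_flops / (cores * flops_per_cycle)


clock_hz = implied_clock_hz(
    PEAK_FLOPS_PER_CHIP, CORES_PER_CHIP, FLOPS_PER_CORE_PER_CYCLE
)
print(f"Implied clock: {clock_hz / 1e9:.2f} GHz")  # Implied clock: 1.20 GHz
```

Under that assumption, the quoted peak works out to a 1.2 GHz clock.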

At least for the time being, the system will retain the original host CPUs from Tianhe-2, which are Intel Xeon processors. Each supercomputer node will pair two of those Intel CPUs with two Matrix-2000 coprocessors, hooked in via PCIe. The node count is being increased from 16,000 to 17,792.
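As a rough check, the node and accelerator counts can be multiplied out to see how much of the 94.97-petaflop peak comes from the Matrix-2000 chips. The sketch below uses only figures quoted in this article; attributing the remainder to the host Xeon CPUs is an assumption, since the article gives no CPU peak figure.

```python
# Figures quoted in the article.
NODES = 17_792
ACCELERATORS_PER_NODE = 2
PEAK_FLOPS_PER_ACCELERATOR = 2.4576e12   # Matrix-2000, peak
SYSTEM_PEAK_FLOPS = 94.97e15             # Tianhe-2A, peak


def accelerator_peak_flops(nodes, per_node, flops_each):
    """Aggregate peak contributed by all accelerators in the system."""
    return nodes * per_node * flops_each


accel_peak = accelerator_peak_flops(
    NODES, ACCELERATORS_PER_NODE, PEAK_FLOPS_PER_ACCELERATOR
)
# Assumption: whatever is left over comes from the host CPUs.
remainder = SYSTEM_PEAK_FLOPS - accel_peak

print(f"Accelerators: {accel_peak / 1e15:.2f} petaflops")
print(f"Remainder:    {remainder / 1e15:.2f} petaflops")
print(f"Accelerator share of peak: {accel_peak / SYSTEM_PEAK_FLOPS:.1%}")
```

By this arithmetic the accelerators account for roughly 87 of the 95 peak petaflops.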

Other enhancements include an interconnect that is 40 percent faster (14 Gbps) and has 50 percent lower latency (1 microsecond). This is likely the TH-Express-2+ that NUDT has talked about before. In addition, main memory has been bumped from 1.4 to 3.4 petabytes, slightly improving the bytes-to-flops ratio of the Tianhe-2. Storage has also been enhanced in both capacity and I/O bandwidth. All the particulars are below, courtesy of James Lin, who tweeted some nice screen images from the presentation.

Source: James Lin, @jameslinsjtu
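The change in the bytes-to-flops ratio can be worked out from the memory and peak-performance figures quoted in this article. This is a minimal sketch that assumes decimal petabytes and peak (not measured) flops; the function name is illustrative.

```python
def bytes_per_flop(memory_bytes, peak_flops):
    """Main memory capacity divided by peak floating-point rate."""
    return memory_bytes / peak_flops


# Figures quoted in the article (1 petabyte taken as 1e15 bytes).
tianhe2 = bytes_per_flop(1.4e15, 54.9e15)
tianhe2a = bytes_per_flop(3.4e15, 94.97e15)

print(f"Tianhe-2:  {tianhe2:.4f} bytes per flop")   # 0.0255
print(f"Tianhe-2A: {tianhe2a:.4f} bytes per flop")  # 0.0358
```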

Even though peak performance is going to nearly double, the system’s total power draw of 18 MW is just slightly more than that of the original system. That gives it a power efficiency of more than 5 gigaflops per watt, which would place it somewhere around the number 20 slot on the Green500 list.
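The efficiency figure follows from dividing peak performance by power draw, as the sketch below shows using the numbers quoted in this article. Note that the Green500 ranks systems on measured Linpack performance rather than peak, so a peak-based figure should be read as an upper bound.

```python
def gigaflops_per_watt(flops, watts):
    """Energy efficiency expressed in gigaflops per watt."""
    return flops / watts / 1e9


PEAK_FLOPS = 94.97e15   # Tianhe-2A peak, from the article
POWER_WATTS = 18e6      # 18 MW total power draw, from the article

efficiency = gigaflops_per_watt(PEAK_FLOPS, POWER_WATTS)
print(f"{efficiency:.2f} gigaflops per watt (peak)")  # 5.28
```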

Ironically, the upgrade won’t improve the system’s position in the TOP500 rankings. The number one Sunway TaihuLight has a peak performance of 125.4 petaflops, and attains 93 petaflops on the High Performance Linpack (HPL) benchmark. It’s unlikely Tianhe-2A will come in at better than 70 or 80 petaflops on HPL.
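To put those numbers in context, HPL results can be expressed as a fraction of peak. The sketch below uses only the figures quoted above; the 70 and 80 petaflop values for Tianhe-2A are the article's guesses, not measurements.

```python
def hpl_efficiency(hpl_petaflops, peak_petaflops):
    """Fraction of peak performance achieved on the HPL benchmark."""
    return hpl_petaflops / peak_petaflops


taihulight = hpl_efficiency(93.0, 125.4)
# Hypothetical Tianhe-2A results at the two ends of the guessed range.
tianhe2a_low = hpl_efficiency(70.0, 94.97)
tianhe2a_high = hpl_efficiency(80.0, 94.97)

print(f"Sunway TaihuLight: {taihulight:.1%} of peak")
print(f"Tianhe-2A at 70 petaflops: {tianhe2a_low:.1%} of peak")
print(f"Tianhe-2A at 80 petaflops: {tianhe2a_high:.1%} of peak")
```

Even at the top of that range, the upgraded system would land below TaihuLight's 93 petaflops.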

Nevertheless, the upgrade further cements China’s status as a serious supercomputing power, and does so, once again, with domestically produced technology. The country is currently the odds-on favorite to stand up the first exascale system, which it intends to do in the 2019-2020 timeframe.