Optical bandwidth steering and adaptable architectures may pave the way for exascale computing



Picture: Philip Loeper, ISC HPC. 23 Jul 2018 Frankfurt - The second keynote speaker at ISC'18 in Frankfurt, Germany, was Keren Bergman from the Lightwave Research Lab at Columbia University. In response to the ever-increasing demand for more compute power, more storage, and faster networking, she presented photonics as an alternative. In Keren Bergman's lab, experiments are currently being performed in a testbed to investigate how embedded photonics could meet the growing need for higher performance and flexible scalability in HPC architectures.

Keren Bergman first introduced Summit, hosted at Oak Ridge National Laboratory and the most powerful supercomputer as of June 2018. Summit has a peak performance of 122.3 PetaFLOPS on Linpack and has run data analytics applications at up to 3.3 exaops in mixed precision. The power consumption is 13 MW and the power efficiency amounts to 13.9 GFLOPS/Watt, good for the number 5 spot on the Green500. The machine has 4,608 nodes connected by 200 Gb/s dual-rail Mellanox EDR 100G InfiniBand; 9,216 IBM Power9 CPUs at 2 per node; and 27,648 Nvidia Volta V100 GPUs at 6 per node.

The next challenge is to reach exascale and beyond. Averaging over the top 10 systems, there has been a tremendous progression in performance, by a factor of 65 since 2010, stated Keren Bergman. The node bandwidth curve, by contrast, is fairly flat, with an increase of only a factor of 4.8. The byte-per-FLOP ratio has consequently dropped to 0.08 of its former value. Many applications can still be accommodated, though.

Keren Bergman also looked at performance against the data-movement energy budget. Since GFLOPS/Watt = (GFLOP/second) / (Joule/second) = GFLOP/Joule, the efficiency figure translates directly into an energy per operation: the 14 GFLOPS/W reached by Summit corresponds to roughly 72 pJ per FLOP, which she considered extremely good.
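The unit conversion above can be sketched in a few lines of Python (the 14 GFLOPS/W figure is Summit's, rounded from the 13.9 quoted earlier):

```python
# Sketch: converting a Green500-style efficiency figure (GFLOPS/W)
# into an energy cost per floating-point operation (pJ/FLOP).

def pj_per_flop(gflops_per_watt: float) -> float:
    """Energy per FLOP in picojoules for a given GFLOPS/W efficiency."""
    flops_per_joule = gflops_per_watt * 1e9   # 1 GFLOPS/W = 1e9 FLOP per joule
    return 1e12 / flops_per_joule             # joules per FLOP, expressed in pJ

print(round(pj_per_flop(14), 1))  # -> 71.4, i.e. the ~72 pJ/FLOP quoted above
```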

As for data-movement energy, an SRAM access is cheap at 10 fJ/bit; a DRAM cell access costs 1 pJ/bit; a move to HBM/MCDRAM costs 10 pJ/bit; and a move to off-chip DDR3 costs 100 pJ/bit, which is dramatic. All in all, Keren Bergman found the TOP500 extremely useful for comparing all these results.
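Scaled to a 64-bit operand, these per-bit figures make the imbalance concrete. A small sketch, using the numbers from the talk; the comparison against the ~72 pJ FLOP is my own framing:

```python
# Per-bit data-movement energies quoted in the talk, scaled to one
# 64-bit word and compared against Summit's ~72 pJ per FLOP.
ENERGY_PJ_PER_BIT = {
    "SRAM access":      0.01,   # 10 fJ/bit
    "DRAM cell access": 1.0,    # 1 pJ/bit
    "HBM/MCDRAM move":  10.0,   # 10 pJ/bit
    "DDR3 off-chip":    100.0,  # 100 pJ/bit
}

FLOP_ENERGY_PJ = 72  # ~72 pJ/FLOP at 14 GFLOPS/W
for level, pj_bit in ENERGY_PJ_PER_BIT.items():
    word_pj = pj_bit * 64  # energy to move one double-precision word
    print(f"{level:18s} {word_pj:8.2f} pJ/word "
          f"({word_pj / FLOP_ENERGY_PJ:6.2f}x one FLOP)")
```

Fetching one double from off-chip DDR3 costs roughly 6,400 pJ, almost ninety times the energy of the FLOP that consumes it, which is why the talk calls this figure dramatic.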

The Green500 number 1, the Shoubu machine, was listed at rank 94 in the TOP500 in June 2016. Keren Bergman showed that the leading result went from 6.7 GFLOPS/W to 9.5 GFLOPS/W by November 2016, in just 6 months. In two years' time, from 2016 to 2018, it moved up from 6.7 to 18.4 GFLOPS/W, a remarkable improvement in energy efficiency in only two years.

Keren Bergman briefly touched on Nvidia's GPU/memory integration assembly, where the memory sits closer to the GPU. CoWoS stands for Chip-on-Wafer-on-Substrate. The ZettaScaler architecture has a modular, liquid-cooled design. Such architectures show big gains in GFLOPS/Watt.

High-performance data centres are converging on artificial intelligence. There is strong interest in the energy efficiency of AI data centres, and not only for "small" systems. Training deep neural networks takes time; Keren Bergman referred to Facebook and Nvidia as examples.

She then moved over to her actual topic by asking what photonics can bring to the table. There is a photonic opportunity for data movement: photonics reduce energy consumption and eliminate the bandwidth taper. Therefore, one has to maximize the data-movement benefits of photonics by introducing energy-optimized, high-bandwidth-density links; optically connecting MCMs - CPU/GPU/memory; and steering bandwidth through adaptive connectivity. Photonics allow one to connect exactly the compute and storage resources that are needed, and no more.

Silicon photonic dense-WDM links scale to >1 Tb/s/mm at <1 pJ/bit, with "any distance" optical interconnect. The link involves the following chain: external laser source - coupler loss - modulator array penalty - coupler loss - coupler loss - demux array penalty - receiver sensitivity level. Link optimization follows a top-to-bottom approach: one has to minimize the losses in bringing the laser signal onto the chip and optimize the power consumption of every device.
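The chain can be read as a standard optical link budget: the per-wavelength laser power, minus every coupler loss and array penalty, must still clear the receiver sensitivity. A minimal sketch, with all dB values assumed for illustration rather than taken from the talk:

```python
# Illustrative link-budget accounting for the chain described above.
# All dB figures are assumptions for the sketch, not measured values.
laser_dbm = 0.0                      # assumed per-wavelength laser power
chain_db = {
    "coupler (laser -> chip)": 1.5,
    "modulator array penalty": 3.0,
    "coupler (chip -> fiber)": 1.5,
    "coupler (fiber -> chip)": 1.5,
    "demux array penalty":     2.0,
}
rx_sensitivity_dbm = -12.0           # assumed receiver sensitivity

received_dbm = laser_dbm - sum(chain_db.values())
margin_db = received_dbm - rx_sensitivity_dbm
print(f"received {received_dbm:.1f} dBm, margin {margin_db:.1f} dB")
```

Shaving even a fraction of a dB off each coupler directly enlarges the margin, or equivalently lets the laser run at lower power, which is why the talk stresses minimizing every loss in the chain.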

Keren Bergman explained that the laser is a fully integrated dense-WDM comb source: SiN micro-resonators generate frequency combs that act as a multi-wavelength source. The idea is to have compact, low-power comb sources.

The chip "escape" bandwidth density scales as follows:

- 18 NVLink 2.0 ports --> 9 per long edge, top/bottom
- 50 GB/s per port (25 GB/s each for Tx/Rx)
- 1 NVLink occupies about 2 mm of linear edge
- 50 GB/s per 2 mm --> 200 Gb/s/mm

The dense-WDM silicon photonic link has a 250 µm fiber pitch; 8 fiber links over 2 mm of linear edge; and 64 lambdas per fiber link, with each lambda at 16 Gb/s.
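Both escape-density figures follow from straightforward arithmetic; a quick check in Python:

```python
# Reproducing the edge bandwidth-density arithmetic from the talk.

# NVLink 2.0: 50 GB/s per port, one port per ~2 mm of chip edge.
nvlink_gbit_per_s = 50 * 8                 # 50 GB/s -> 400 Gb/s per port
nvlink_density = nvlink_gbit_per_s / 2     # Gb/s per mm of edge
print(nvlink_density)                      # -> 200.0 Gb/s/mm

# Dense-WDM photonics: 8 fibers over 2 mm, 64 lambdas x 16 Gb/s each.
wdm_gbit_per_s = 8 * 64 * 16               # 8192 Gb/s over 2 mm of edge
wdm_density = wdm_gbit_per_s / 2           # Gb/s per mm of edge
print(wdm_density)                         # -> 4096.0 Gb/s/mm, ~4 Tb/s/mm
```

The photonic escape density thus comes out roughly 20 times higher than the electrical NVLink figure, which is the ">Tb/s/mm" scalability claim made earlier.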

The ubiquitous optically connected multi-chip module (MCM) provides optical communication among the interposers and gives the MCM a universal interface.

Keren Bergman stated that the integrated photonics manufacturing institute AIM Photonics runs a showcase facility at its core hub in Albany. The 200 mm tools provide unprecedented quality. The ASIC/silicon photonic interposer integration offers an optically-connected memory architecture with 24 fibers per coupling assembly, equalling 12 cubes, 24 GB of capacity, and 3 TB/s.

Disaggregation goes deeper into the hierarchy. Keren Bergman asked how the traffic should be matched. Many topologies are possible, but the question is: which ones make sense? Researchers have to explore the design space under realistic loads.

Adaptive, flexible connectivity with bandwidth steering enables deep disaggregation. The researchers introduce optical switches into the OC-MCM topology: a conventional architecture is assembled with a flexible interconnect, and the bandwidth steering is performed with silicon photonic switches. Keren Bergman explained that the silicon photonic switches perform optical circuit switching, so that bandwidth can be reclaimed from idle or under-utilized links.

To adapt the network to HPC traffic, one can exploit the fact that the traffic characteristics of HPC applications are skewed and well-defined. Flexfly uses optical bandwidth steering in a dragonfly topology, so that one can increase the relative provisioned bandwidth to match the traffic. Bandwidth steering enables the "matching" of the connectivity matrix to the traffic matrix, as Keren Bergman showed. As such, good performance can be reached even with a modest solution.
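A toy sketch of the steering idea: reclaim a link from the coldest group pair and hand it to the hottest one, so the connectivity matrix tracks the traffic matrix. Group names, demand numbers, and the greedy policy are illustrative assumptions, not taken from Flexfly:

```python
# Toy model of optical bandwidth steering: each inter-group pair has some
# number of provisioned links; a circuit switch can move links from
# under-utilized pairs to heavily loaded ones. Illustrative numbers only.
traffic = {("g0", "g1"): 90, ("g0", "g2"): 5, ("g1", "g2"): 5}   # demand
links   = {("g0", "g1"): 1,  ("g0", "g2"): 1, ("g1", "g2"): 1}   # provisioned

def steer(traffic, links, rounds=1):
    """Greedily move one link per round from the coldest pair to the hottest."""
    for _ in range(rounds):
        hot = max(traffic, key=traffic.get)
        cold = min((p for p in links if links[p] > 0), key=lambda p: traffic[p])
        if hot != cold:
            links[cold] -= 1
            links[hot] += 1
    return links

print(steer(traffic, links))
# -> {('g0', 'g1'): 2, ('g0', 'g2'): 0, ('g1', 'g2'): 1}
```

The skewed demand matrix ends up served by a matching skewed connectivity matrix, which is the essence of matching connectivity to traffic; a real system would do this with silicon photonic circuit switches rather than a Python dictionary.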

Keren Bergman and her team have developed a prototype 64-node system testbed, for which first experimental results are already available.

Keren Bergman summarized that data movement is critical to any future performance scaling. Photonics provide system-wide Tb/s per "wire" at about 1 pJ/bit. Ultra-bandwidth-dense WDM photonic links enable deeply disaggregated architectures. Optical connectivity for flexibly assembled interconnect topologies might be a solution for a computer architecture landscape that is changing rapidly: today, we are talking data analytics and artificial intelligence, and here optical bandwidth steering and adaptable, scalable architectures can bring efficiency to the computational work. The ultimate energy efficiency is reached by using only the required resources, and only for the time period they are needed.

Leslie Versweyveld