Xilinx unveiled its Versal Premium lineup of Adaptive Compute Acceleration Platforms (ACAPs) that wield a new take on FPGA design, building on its Versal AI Core and Prime series. Last year, Xilinx started sampling the Prime and AI Core series, two of the six Versal product lines it has planned, and today it is revealing more details about the higher-end Premium series, although it won't start sampling until 2021.

Xilinx designed the Versal Premium series for high bandwidth networks in space- and thermally-constrained environments. The company claims the Premium ACAPs offer three times the throughput and twice the compute density of competing solutions. Xilinx says the Versal Premium delivers the equivalent networking logic density of 22 16nm FPGAs.


We covered Xilinx's ACAP concept in more detail last year, but as a refresher, Xilinx introduced the term because a modern FPGA contains much more than the programmable FPGA fabric. Xilinx has divided its Versal compute engines into three categories: scalar engines, adaptable engines, and intelligent engines. There is, of course, also a multitude of I/O hardware and interfaces (such as DDR4 and PCIe 4.0), all connected via a programmable network-on-chip (NoC). This makes the ACAPs adaptable to a diverse range of workloads.

Density comes courtesy of TSMC's 7nm process node paired with a modular design that incorporates a dual-core ARM Cortex-A72 application processor and a dual-core ARM Cortex-R5F real-time processor for scalar operations, along with a PCIe Gen5 controller that supports both the CCIX and CXL protocols. The chip also wields a DDR4 controller, (up to) 112G PAM4 transceivers, 600G Ethernet cores (up to 5Tb/s), and 400G crypto engines (up to 1.6Tb/s). Xilinx says the crypto engines make the Premium lineup the only adaptable platform with hardened 400G crypto support for AES-GCM-256/128, MACsec, and IPsec.

The Premium series also comes with Interlaken connectivity, an industry-standard chip-to-chip interface used in switches and routers. The lightweight protocol that runs across the connection supports a range of transfer rates and widths.

Xilinx ties these components to up to 14,000 DSP slices and 3.4 million LUTs. Overall, the chips, which come in seven flavors, offer up to 7.4 million logic cells. Xilinx also bifurcates the stack by other features, such as PCIe, Ethernet, Interlaken, and crypto engine capabilities.


Tying these features together requires a speedy network-on-chip (NoC), and Xilinx's 2.2Tb/s interface fits the bill. The programmable NoC supports a variety of link widths and speeds, QoS levels, and multiple arbitration points.

In total, Xilinx says the networking logic density of the Ethernet, Interlaken, and crypto cores is equivalent to that of 22 16nm FPGAs. There is also an "integrated shell" that lets the ACAP use zero logic elements for networking infrastructure.

The software is also an important component of the ACAPs. On top of the low-level Vivado tools, Xilinx offers the higher-level Vitis development kit with accelerator libraries, which can be programmed in C, C++, and Python, catering to software developers. For data scientists, Xilinx supports the major AI frameworks such as TensorFlow.

Performance

Xilinx has some major performance claims for its Premium series. The ACAP has up to 1Gb of tightly coupled memory and an on-chip bandwidth of 123TB/s, roughly 8.8x the 14TB/s Xilinx cites for Nvidia's Tesla V100.

Xilinx says that, combined with its heterogeneous engines, this delivers breakthrough performance across multiple workloads: 1.6x the inference throughput of the V100, 4.6x higher object detection performance, and a 65x lead over Cascade Lake in anomaly detection.

Thoughts

The Versal Premium series seems a step above any of Xilinx's 16nm FPGAs, with 3x higher bandwidth and 2x higher compute density. It is upgraded to 112G PAM4 transceivers for 9Tb/s of bandwidth, 5Tb/s of Ethernet throughput, and 1.6Tb/s of line-rate encryption via hardened crypto engines. It also adds PCIe 5.0, CCIX, and CXL support, plus multi-hundred-gigabit Ethernet and Interlaken connectivity.

This makes it a competitor to Intel's Agilex I-Series FPGAs, which will also bring support for PCIe 5.0, CXL, and 112G transceivers in 2021. In terms of compute and programmable logic density, though, the I-Series will likely be a step below the Premium series, as it tops out at 2.7 million logic elements.

This is because Xilinx and Intel have taken different approaches. Xilinx is building out its Versal portfolio with a multitude of dies, while Intel has opted for one base die and is proliferating the series with a diverse chiplet ecosystem. Intel has hinted at a "phase 2" of Agilex that uses Foveros to stack multiple dies, but hasn't made any announcements on that front yet.

Xilinx says the Versal Premium series will sample to early customers in the first half of 2021.