
Linley Newsletter

Baidu Debuts First AI Accelerator

September 15, 2020

Author: Aakash Jani

Joining other cloud-service providers (CSPs), Baidu is taking a shot at AI mastery by offering the Kunlun K200 SoC. The Chinese company designed the 256-TOPS accelerator for its internal deep-learning workloads. The K200 accelerates common neural-network and SQL operations. On AI inference benchmarks, it matches the power efficiency of Nvidia’s T4 card.

At the recent Hot Chips conference, Baidu revealed that Kunlun features eight clusters combining in-house CPU cores and programmable neural-network engines. The chip’s 2.5D packaging supports two stacks of 8GB High Bandwidth Memory (HBM), each providing 256GB/s of bandwidth. Manufactured in Samsung’s 14nm process, the K200 has a moderate power envelope of 150W and connects to the host processor through an x8 PCIe Gen4 interface.
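The memory and host-interface figures above imply the aggregate numbers in the following back-of-the-envelope sketch. The HBM figures come from the article; the PCIe per-lane rate and 128b/130b encoding are the standard Gen4 parameters, not stated in the article:

```python
# Aggregate memory bandwidth: two HBM stacks at 256 GB/s each (per the article).
hbm_stacks = 2
bw_per_stack_gbs = 256
total_hbm_bw_gbs = hbm_stacks * bw_per_stack_gbs  # 512 GB/s

# Host link: PCIe Gen4 signals at 16 GT/s per lane with 128b/130b encoding
# (standard spec values, assumed here).
lanes = 8
raw_gts_per_lane = 16
encoding_efficiency = 128 / 130
pcie_bw_gbs = lanes * raw_gts_per_lane / 8 * encoding_efficiency  # ~15.75 GB/s

print(total_hbm_bw_gbs, round(pcie_bw_gbs, 2))
```

The roughly 30x gap between on-package HBM bandwidth and the PCIe link is typical for accelerator cards: model weights and activations stay in local memory, and the host link carries only inputs, outputs, and commands.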

Baidu joins several other CSPs in developing its own AI chips. Google was the first and is now on its fourth-generation TPU chip. Alibaba has posted industry-leading benchmark results for its Hanguang accelerator, and Huawei has developed the Ascend family. Amazon offers the Inferentia chip as part of its leading cloud service, and Microsoft has the FPGA-based Project Brainwave. These companies are attempting to reduce their total cost of data-center ownership by designing accelerators that augment their infrastructure.

For a first attempt, Baidu produced a good chip that matches the power efficiency of Nvidia’s T4 card. Later this year, however, we expect Nvidia to release a T4 follow-on implementing its new Ampere architecture, which should deliver 2x greater power efficiency thanks to architectural and manufacturing-technology advances. The K200 will then fall short of Nvidia’s efficiency and performance, so we don’t expect significant use of Kunlun in Baidu’s internal production servers.
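The efficiency comparison can be made concrete in TOPS per watt. The K200 figures (256 TOPS, 150W) are from the article; the T4’s 130 INT8 TOPS at a 70W TDP are Nvidia’s published numbers and are an assumption here, as is applying the article’s 2x projection to the Ampere successor:

```python
# Power efficiency in TOPS/W. K200 figures per the article; T4 figures are
# Nvidia's published specs (130 INT8 TOPS, 70W TDP) -- assumed, not from the article.
k200_tops, k200_watts = 256, 150
t4_tops, t4_watts = 130, 70

k200_eff = k200_tops / k200_watts   # ~1.71 TOPS/W
t4_eff = t4_tops / t4_watts         # ~1.86 TOPS/W

# The article projects a 2x power-efficiency gain for the T4's Ampere successor.
ampere_eff = 2 * t4_eff             # ~3.71 TOPS/W

print(round(k200_eff, 2), round(t4_eff, 2), round(ampere_eff, 2))
```

On these rough numbers, the K200 is within about 10% of the T4, consistent with the article’s claim of parity, but would trail a 2x-improved Ampere part by roughly half.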