In this special guest feature from Scientific Computing World, Tom Wilkie looks at the next generation of supercomputers coming to China.

Within the next 12 months, China expects to be operating not one but two 100 Petaflop computers, each built around a different Chinese-made processor, and both coming online about a year before the United States' 100 Petaflop machines being developed under the CORAL initiative.

Ironically, the CPU for one machine appears very similar to a US technology abandoned in 2007, and the US Government, through its export embargo, has encouraged China to develop its own accelerator for the other machine.

As reported on the Scientific Computing World website, in a separate move to acquire mastery of microprocessor technologies, China's state-owned Tsinghua Unigroup has made a bid to acquire US semiconductor manufacturer Micron Technology for $23 billion, in what could be one of the biggest acquisitions of a US company by a Chinese firm.

In a further assertion of technological ambition, China's domestic HPC vendors were highly visible at the ISC High Performance Conference in Frankfurt in July, where they talked to Scientific Computing World about their plans to expand overseas.

Highlighting the role reversal between China and the USA, the Chinese computer company Inspur intends to open a US manufacturing plant this year. This reverses the pattern of the recent past, when it was US companies that outsourced manufacturing to China. Inspur already has an R&D centre in San Jose, California.

Meanwhile, another Chinese supercomputing company, Sugon, is planning a trade mission to the UK in September of this year to find partners for its overseas expansion. It too plans to invest overseas, to build local expertise in, and familiarity with, its systems and products.

Huawei was also exhibiting at ISC High Performance. It has, of course, long held a dominant position in the telecommunications business, but it has been expanding into cloud servers and, more recently, into HPC. Its cluster for the Alibaba Group in China is ranked 115th on the current Top500. Now it too is seeking partnerships with distributors and integrators in Europe and North America.

Also with a strong presence at the show was Lenovo, which originally manufactured PCs, tablets, and mobile phones, and which acquired IBM's x86 server business last year, covering not only commercial computing but also high-performance computing. Lenovo signalled that it had become serious about HPC when it opened a European technology centre in Stuttgart earlier this year. The company sells its very high-end systems direct, but also has partnerships with integrators in Europe and around the world.

Chinese RISC processor

Information in the public domain at the ISC High Performance Conference in Frankfurt in July suggested that China is developing a 100 Petaflop machine that will use its own CPU, designed in China. The computer is expected to start operating before the middle of next year.

Hitherto, international attention has focused on the Tianhe-2 computer developed by the National University of Defence Technology (NUDT) and sited at the National Supercomputer Centre in Guangzhou, largely because it retained its position as the world’s No. 1 system for the fifth consecutive time when the most recent Top500 list of the world’s fastest supercomputers was announced in the middle of July.

However, the first Chinese machine to reach 100 Petaflops may be one being developed by the Jiangnan Institute of Computer Technology in Wuxi, near Shanghai. It will use a next-generation ShenWei chip, designed and manufactured in China. A ShenWei processor, the SW1600, currently powers the Sunway BlueLight, which is already in operation at the National Supercomputer Centre in Jinan and which ranked 86th on the Top500 published in July.

ShenWei is a RISC processor, not an x86 one, so it uses its own instruction set. Both system and application software will have to be customized for it, making the machine more complex to program and use.

False dawn

There is some suggestion that the next-generation ShenWei could come online as early as the end of this year, or towards the beginning of next year. However, there have been reports in the past that China would deploy its own processors in supercomputers. In particular, the Dawning 6000 was supposed to be built around the Loongson processor (also sometimes called the Godson processor). This proved to be a false dawn: in the event, the machine that has come to be called Nebulae, the second fastest in the world in June 2010, was a Dawning TC3600 Blade system incorporating Intel X5650 processors and Nvidia Tesla GPUs. Loongson no longer appears to be pursued as a processor for HPC.

If ShenWei processors are used in a 100 Petaflop machine within the next year, there will be an element of historical irony, for the design of the chip appears to resemble very closely that of the 'Alpha' RISC chip developed by the Digital Equipment Corporation (DEC) and discontinued in 2007 by HP, which had inherited the technology through Compaq's acquisition of DEC and its own later merger with Compaq. [Editor's note: Publicly available details are sketchy and some statements may be revised as more information comes to light.]

The China Accelerator

The second domestically designed chip will be the 'China Accelerator' that the National University of Defence Technology (NUDT) is developing for the Tianhe-2 supercomputer. As a result of the US embargo on Intel exporting any more Xeon Phi co-processors to the NUDT, the upgrade that will take the Tianhe-2 to 100 PFlops has been delayed until later in 2016. However, the embargo has also had the effect of encouraging the domestic development of Chinese co-processors.

The interconnect for the Tianhe system is an opto-electronic hybrid. The NUDT had already developed the interconnect, using high-radix Network Routing Chips (NRC) and high-speed Network Interface Chips (NIC), both of which were designed by Chinese engineers and are Chinese intellectual property.

Because the next-generation ShenWei machine will be built around a domestically designed CPU, and the co-processors for Tianhe-2 are being developed at the NUDT, China will enter the 100 Petaflop era with its own CPU, accelerator, and interconnect technologies.

More CPUs envisaged in the Five Year Plan?

These developments are being conducted under the State High-Tech Development Plan, known as the 863 program, which is funded and administered by the Chinese Government to stimulate the development of advanced technologies in a wide range of fields, with the aim of freeing China from financial dependence on foreign technologies. The NUDT, like Jiangnan University, is also designated under the 211 program, which is intended to foster the development of Chinese higher education.

The 100 Petaflop version of the Tianhe-2 will have Intel Xeon E5 processors and the China Accelerators. However, the Chinese clearly intend that any successor machine will use a domestically produced CPU, rather than Intel processors, as well as Chinese-made accelerators.

It would make sense for China to develop both a RISC and an x86 processor. However, it is not yet clear whether the NUDT, the home of x86 supercomputing at present, will develop a second line of processors in addition to the ShenWei. Ultimately, the developments will take place under the auspices of China’s 13th national Five Year Plan, which will cover the period from 2016 to 2020. Proposals on the future of supercomputing in China are being put forward for the first draft of the plan, which will be drawn up in October this year. The final plan will be submitted for approval to the National People’s Congress meeting in March 2016. Only when it is published after the meeting are we likely to have clarity.

Commercial companies stick with x86, for now

The independent development of Chinese microprocessors is a national strategic priority for the Government, but for the country's commercial supercomputer manufacturers, the priority is to expand both in China and overseas. They are therefore content, for the moment, to use non-Chinese processor chips (and GPUs).

In an interview with Scientific Computing World at the ISC High Performance Conference in Frankfurt in July, Lei Wang, vice president of Sugon (Dawning Information Industry), said that the processors being developed for high-performance computing within China 'are used in national strategic projects. In the long run, our own chips will focus on certain markets. Companies have invested a lot in semiconductor chips for mobile applications but we focus on Intel and AMD for the commercial side.' He said he did not expect an imminent breakthrough of Chinese-designed processors into the commercial marketplace for servers.

His views were echoed in a separate interview with Xiaoyang Niu, product manager in Inspur's server product department. Niu also observed that there are already chipset manufacturers in China, mainly for mobile and embedded applications. Although he conceded that some manufacturers may wish to use domestically produced chips in their servers, like Sugon's Wang, he did not expect an imminent deployment in the commercial server space: 'We do not have a clear road map for that,' he said.

Supercomputing centres in China

The next-generation ShenWei machine is being developed at the Jiangnan Institute of Computer Technology, which is located at the Wuxi Taihu New City Science and Education Industrial Park (K-PARK). The park was established in April 2006 as a hi-tech industrial incubator and, in addition to the Institute, it hosts companies and centres specialising in software and service outsourcing, television animation, micro-nano sensors, creative design, the network economy, high-end R&D, and educational training.

Wuxi is the site of one of six National Supercomputing Centres in China:

Guangzhou (formerly known as Canton), the site of Tianhe-2;

Wuxi, not far from Shanghai, which will be the location of the next-generation ShenWei;

Changsha, the capital of Hunan Province in south-central China, which hosts a Tianhe-1A machine;

Tianjin, one of China's largest cities, sited near the coast to the south-east of Beijing, which also hosts a Tianhe-1A machine;

Jinan, the capital of Shandong Province in eastern China, south of Tianjin and south-east of Beijing, where the current Sunway BlueLight is located; and

Shenzhen, in Guangdong Province just north of Hong Kong, where Nebulae, the Dawning TC3600 Blade System (also known as the Dawning-6000), operates.

US embargo on four machines and one university

The US Government has banned Intel (or any other US company) from exporting to the three National Supercomputing Centres in Changsha, Guangzhou, and Tianjin, and to the National University of Defence Technology (NUDT). All four have been placed on the US 'Entity list', meaning that the centres and the university have been 'determined by the US Government to be acting contrary to the national security or foreign policy interests of the United States.' The US Government statement goes on to say that 'The Tianhe–1A and Tianhe–2 supercomputers are believed to be used in nuclear explosive activities.' This has been denied by Professor Yutong Lu, deputy chief designer of the Tianhe project, in an interview with Scientific Computing World.

The National University of Defence Technology (NUDT) is located in Changsha, the capital of Hunan Province. Its computer department, the first in China, was established in 1966, and was expanded and renamed the School of Computer Science in 1999. Now with more than 400 staff, it created China's first supercomputer in 1983, its first GFlops machine in 1992, and its first TFlops system in 2000, as well as the Tianhe-1 and Tianhe-2 systems.

The Changsha National Supercomputing Centre is located on the campus of Hunan University and was a joint venture by the Provincial Government, Hunan University, and the NUDT. Construction of the new building, at a cost of around 700 million Yuan ($112 million), started in November 2010, and the centre officially began operations on 4 November 2014, at a total estimated cost of 860 million Yuan ($140.65 million). It operates a Tianhe-1A (with Nvidia GPUs as accelerators) with a peak performance of 1.34 PFlops.

The Tianjin National Supercomputing Centre predates the one at Changsha: it was approved by the Ministry of Science and Technology in May 2009 and built through a collaboration between the Tianjin Binhai New Area (TBNA) and the NUDT. It hosts China's first Petaflops computer, the Tianhe-1, which, upgraded to a Tianhe-1A (also with Nvidia GPUs as accelerators), now has a peak performance of 4.7 Petaflops. The centre claims to provide high-performance computing services to more than 300 customers. The Tianjin Binhai New Area is a Special Economic Zone intended to replicate the development seen in Shenzhen and in Pudong, Shanghai. It lies on the coast of the Bohai Sea, the innermost gulf of the Yellow Sea, to the east of Tianjin's main urban area, and there are 11 large cities within a 500-kilometre radius, each with a population of more than one million people.

The National Supercomputer Centre in Guangzhou is located on the East Campus of Sun Yat-sen University in the Higher Education Mega Centre, Guangzhou, and is jointly sponsored by Guangdong Province, Guangzhou city, Sun Yat-sen University, and the National University of Defence Technology, with a total capital investment of US$400 million. The project started in November 2011, and in October 2013 the first phase of the Tianhe-2 supercomputer was relocated from Changsha to Guangzhou. An upgrade programme started in July 2014. In addition to being the site of the world's fastest computer, the Tianhe-2, the centre also hosts a Tianhe-1A machine which, at 211.7 TFlops sustained performance, is currently ranked 359th on the Top500.

The fifth Tianhe system

In fact, there are five Tianhe systems listed in the current Top500. As just described, two are in Guangzhou, and one each in Tianjin and Changsha (Hunan). The fifth is in the Cloud Computing Centre in LvLiang (also sometimes written as Lüliang or Lyuliang), a city of nearly four million inhabitants in the west of Shanxi Province, some 500 km south-west of Beijing. It ranks 30th on the overall list, with a sustained performance of around 2 PFlops, and is powered by Intel Xeon E5 processors and Xeon Phi co-processors.

This story appears here as part of a cross-publishing agreement with Scientific Computing World.
