Sure, scientists are looking forward to the radio telescope to solve some of the biggest questions in the field of astronomy. But computer science experts take note: The worldwide science project is set to push the limits of high-performance computing and management of huge data sets.

The largest scientific project in history, a radio telescope array of vast proportions and computing demands, is currently being built in the South African Karoo and in the Australian Outback.

The Square Kilometre Array (SKA) will consist of a square kilometer of radio telescopes arranged in a “spiral arm” in order to create usable pictures of under-examined sectors of space. The array will permit scientists to look far back in time (at a lower resolution) and at nearer time (at a much higher resolution).

To make it work, the SKA needs to process information coming in from 350 dishes and 256,000 dipole antennae on two continents. It will coordinate with a large radio telescope array in the Australian Outback, with additional elements in the U.K. and, eventually, eight other African nations. Australia will host the SKA’s data-dense low frequency aperture array antennas while South Africa will host the medium- and high-frequency arrays.

The SKA will require computing power greater than the fastest computer currently in existence. The changes it will inspire in high-performance computing (HPC) are expected to alter the computing landscape for generations.

Turning the data coming in from these telescopes into usable pictures of the universe requires each receiver’s signals to be reconciled. To do this, the SKA computing systems require clock stabilities on the order of picoseconds (10^-12 seconds) and processing power as high as 100 petaFLOPS. For context: The most powerful computer in the world today is China’s Sunway TaihuLight, which has been timed at 93 petaFLOPS. The SKA will fill a laptop’s hard drive every few seconds.

But the SKA group faces additional challenges beyond the need for more performance. The group also has to determine how to share that data with academics around the globe.

The requirements

“An artificial benchmark like that of the TaihuLight doesn’t capture typical HPC workloads,” says Miles Deegan, HPC specialist for the SKA. “Ours is data-intensive, not computationally intensive. Among the SKA computing needs are intensive memory bandwidth, I/O, specialized data management, buffering, and batch processing.”

“It’s not a typical system,” says Deegan. “It is first-principles simulations versus a complex system embedded in an on-going experiment, with on-going calibration of systems. Keeping up with the FLOPS is almost secondary.”

Given the titanic size of the project, the SKA is being developed in phases:

2017. Already, the organization has sorted through scientific and construction proposals, evaluated technical prototypes, established budgets, implemented organizational structures, established the “science book” that outlines the science the telescopes will undertake, begun the design process, and secured funding through 2019.

2018. The consortia (the engineering teams spread across the globe working on the various elements of the system) will have their critical design reviews.

2019. Organizationally, the SKA will transition from a U.K. corporation to an intergovernmental organization.

2025. Phase 1 construction aims to provide an operational array of telescopes capable of carrying out the first science in low and mid frequencies.

2035. At the conclusion of Phase 2 construction, the high-frequency dishes will go live, providing full sensitivity for frequencies up to 20 GHz. This phase will also see the African array expand from 200 dishes to as many as 2,000 and the Australian array expand from 130,000 antennae to as many as a million.

Toward that end, the South Africa team is using a smaller (though still quite large) project to, among other things, test ways to successfully create that system.

MeerKAT, a 64-dish array, is being built in South Africa. It will be managed by the South African government’s Department of Science and Technology, and integrated into the SKA later in the process. “This will enable us to optimally utilize MeerKAT over the next six years for transformational science while the SKA is being built,” says Nithaya Chetty, professor of physics at the University of Pretoria and a former member of the SKA steering committee.

The computing infrastructure is already being put in place to process the data using current technologies. “We will learn a lot from this exercise to enable us to plan for the full SKA Phase 1 array,” says Chetty, referring to the first of two stages of construction and implementation. At the MeerKAT site in the Karoo, the SKA team set up an underground bunker that houses computing resources designed to enable an initial reduction of data before it is shipped to the SKA operational site in Cape Town.

The computing paradigm is different from conventional computing. In preparation for those needs, a consortium of several universities set up the Inter-University Institute for Data Intensive Astronomy. Its aim is to ensure that South African scientists are ready to exploit the data for their scientific purposes, using their experience with MeerKAT to ramp up to the SKA.

Before you have a data problem

The array will collect a vast amount of data, points out Nic Dubé, Hewlett Packard Enterprise chief strategist for HPC and exascale system architect. By his calculations, the SKA project will gather about 100 gigabytes of compressed data per second, filling a 6 terabyte drive every minute, for a daily total measured in petabytes of data. “Before you have a computing problem,” posits Dubé, “you have a data aggregation, filtering, and simplification problem.”
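As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: it assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000,000 GB) and a constant data rate, and the function names are made up for this example rather than taken from any SKA software.

```python
# Back-of-the-envelope check of the data rates quoted by Dubé.
# Assumptions: decimal units and a steady 100 GB/s stream.

RATE_GB_PER_S = 100   # compressed data rate cited in the article
DRIVE_TB = 6          # drive size used in the comparison


def seconds_to_fill(drive_tb, rate_gb_per_s):
    """Seconds needed to fill a drive of drive_tb terabytes."""
    return drive_tb * 1000 / rate_gb_per_s


def daily_volume_pb(rate_gb_per_s):
    """Petabytes collected in 24 hours at a constant rate."""
    seconds_per_day = 24 * 60 * 60
    return rate_gb_per_s * seconds_per_day / 1_000_000


print(seconds_to_fill(DRIVE_TB, RATE_GB_PER_S))  # seconds per 6 TB drive
print(daily_volume_pb(RATE_GB_PER_S))            # petabytes per day
```

Under these assumptions, a 6 TB drive fills in 60 seconds and a full day yields roughly 8.6 petabytes, consistent with the "every minute" and "petabytes" figures above.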

It would currently be more efficient to load a cargo plane with hard drives and fly it across the ocean than attempt to transmit the data over the internet, says Dubé. If you fly a thousand 10TB drives from Johannesburg to Washington, D.C., it will take 17 hours, which is a rate of 163 GB per second. Although the latency is high, the bandwidth per dollar is hard to beat.
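The cargo-plane figure is easy to reproduce. The short Python sketch below assumes decimal units (1 TB = 1,000 GB) and counts only the 17-hour flight, ignoring the time to write, load, and read the drives; the function name is hypothetical.

```python
# Effective bandwidth of flying hard drives across the ocean.
# Assumptions: decimal units; only flight time is counted.


def sneakernet_gb_per_s(num_drives, drive_tb, flight_hours):
    """Average throughput in GB/s of shipping drives by air."""
    total_gb = num_drives * drive_tb * 1000
    flight_seconds = flight_hours * 3600
    return total_gb / flight_seconds


# A thousand 10 TB drives on a 17-hour flight, as in the example.
rate = sneakernet_gb_per_s(1000, 10, 17)
print(f"{rate:.0f} GB/s")
```

With those inputs the result rounds to 163 GB per second, matching the figure quoted above.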

Although the airplane example is merely illustrative, Ian Bird says the point-to-point aspect of data transfer misses the point of the project. Bird is the Worldwide Large Hadron Collider Computing Grid (WLCG) project leader at CERN and a member of the advisory panel planning the regional computing centers for SKA under a new big data cooperation agreement. “The SKA needs to make the data available to a global astronomy community,” Bird says.

“We need a scheme where there can be remote access given to multiple users of the same data set,” adds Chetty. “Given the sheer volume of the data, new search routines based on artificial intelligence and neural networks need to be formulated to mine the data, to fit the data to theoretical models, to look for new signals [and] new discoveries.”

Scientists, engineers, and researchers are itching to mix it up. “Academic networking organizations want this type of challenge, as they did with the Large Hadron Collider,” says Bird. “It leads the way for other academic communities.”

One avenue toward data transmission on a large scale is the European Open Science Cloud. It provides the model for a proposed Open Federated Science Cloud, which Bird describes as “a giant cloud-based, Dropbox-like facility” available to science users around the world. “They will be able to not only access incredibly large files, but also do extremely intensive processing on those files to extract the science,” he says.

The benefits for the global computing community

Currently, the project is in a critical design review stage. But when the elements all mesh and the project is in full flower, how will computing at large benefit?

One likely outcome from the SKA—in conjunction with CERN’s Large Hadron Collider—is learning how to successfully manage scientific data at the exabyte scale. Although the SKA and LHC are pioneering this, disciplines like genomics will rapidly catch up. “Understanding how to manage and process data sets at that scale in globally distributed facilities will benefit many sciences with very large data sets,” says Bird.

These sorts of computer science advances are part of the larger on-going trend toward technology improvement in complex systems. “Moore’s Law is coming to an end, and we have to go beyond von Neumann architecture,” says Deegan. “We have to understand how we can build not just capital constraint, but power and resilience into the system.” To keep large and complex systems up and running, the industry needs to employ tools like artificial intelligence-based predictive maintenance.

Dubé points out that the two primary astronomical locations—the South African Karoo and the Australian Outback—will need what he calls “the most balanced and energy efficient computer in the world.” This computer will have intensive requirements in computing, power, and cooling. Learning how to do that, and in a desert no less, is instrumental in a big data world.

A successful SKA project will test a host of these future tools, putting them through their paces. The project has the potential to inform the use of these tools across computing.

The Square Kilometre Array: Lessons for leaders