HIGH-PERFORMANCE COMPUTING

Sandia to boot behemoth botnet

Researchers use supercomputer to replicate how botnets operate

Starting in October, a huge botnet will be run not by nefarious underground figures but by the Energy Department's Sandia National Laboratories. The lab's Thunderbird supercomputer will periodically run a million virtual machines all at once, all with botnet client software. By setting this large network of systems into operation, the researchers, Ron Minnich and Don Rudish, hope to better understand how botnets operate.

"If you want to take a look at what is really threatening the Internet, we have to talk about the scale of the network we are working with," Rudish said. "One million gets us pretty close to understanding these botnets."

Typically used by spammers, botnets consist of thousands or even millions of Internet-connected PCs. The owners of such machines are usually unaware that their computers have been infected with secret programs that do the bidding of the botnet operator. Operators tend to deploy their creations for spamming, distributed denial-of-service attacks, and other nefarious activities.

Botnets are difficult to study in the wild because the computers are geographically dispersed. By approximating the scale of a large botnet on a single system, the researchers can observe how botnets operate and the effects they have.

To prepare for the work, the scientists have already simultaneously booted 1 million Linux kernels, all of which ran as virtual machines on Thunderbird. To the best of Minnich's and Rudish's knowledge, a million virtual machines is the largest number ever installed on a single system.

Thunderbird is a 4,480-node Dell-based computer cluster. Each node runs 250 Linux kernels. The host operating system on each node is a stripped-down version of the Linux kernel, which the researchers compiled. It contains only the kernel core and a start-up script that boots the virtual machines. "The root file system lives out in Random Access Memory," Rudish said.
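The node and kernel counts above can be checked with a little arithmetic. The short Python sketch below is only an illustration: the function names are hypothetical, and it assumes every node hosts exactly the same number of guest kernels, which the article does not state. Under that assumption, the full cluster has room for somewhat more than 1 million virtual machines, and 4,000 nodes would be enough to reach the million mark.

```python
# Figures taken from the article; the even distribution across nodes is an assumption.
NODES = 4_480          # Thunderbird cluster nodes
GUESTS_PER_NODE = 250  # Linux kernels booted per node


def total_guests(nodes: int, guests_per_node: int) -> int:
    """Total virtual machines if every node runs the same number of guests."""
    return nodes * guests_per_node


def nodes_needed(target_guests: int, guests_per_node: int) -> int:
    """Smallest number of nodes that can host target_guests (ceiling division)."""
    return -(-target_guests // guests_per_node)


if __name__ == "__main__":
    print(total_guests(NODES, GUESTS_PER_NODE))      # 1120000
    print(nodes_needed(1_000_000, GUESTS_PER_NODE))  # 4000
```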

For the virtualization, the system uses a hypervisor named Lguest, which is built into the Linux kernel. Lguest was developed by the research arm of IBM. Although it is still in the development stage, Sandia chose Lguest because it is "very fast and very lightweight," Minnich said. On Thunderbird, each virtual machine starts up in a fraction of a second, Rudish said. "The bottleneck is reading the configuration file, which is a million lines long."

The management software is OneSis, which Sandia originally developed. "OneSis is pretty key to making this thing work at all," Minnich said. "It is good at managing thousands of thousands of nodes in a very easy way." All the virtual machines are networked through virtual Linux-based routers and Sandia's own backbone routers.

Beyond the study of botnets, the researchers maintain that their work will help in understanding how to manage large systems in general.

"Anything that scales to a million, it is impossible to watch any single thing," Minnich said. "So you need to have this be a highly automated self-maintaining system." By 2018, new supercomputers will have 100 million CPUs or more. "The lessons we're learning for this project we're pretty sure will feed into the supercomputers we're building in 2018," he said.