Understanding the nuances of infectious diseases—in particular malaria, which killed about one million people worldwide in 2008—is a crucial step toward wiping them out. However, getting a clear picture of how malaria spreads and how it responds to eradication efforts means accessing a daunting amount of data from a variety of sources, the type of job best suited to a number-crunching supercomputer.



Supercomputers, once the privilege of a select few universities and government laboratories, have in recent years become accessible to smaller research labs as well. One such group is a team from Intellectual Ventures in Bellevue, Wash., that is using the speed and power of a supercomputer brought online over the past year to create complex simulations they hope will reveal solutions to difficult problems, including the spread of malaria.



Intellectual Ventures' supercomputer is a work in progress shared by two teams of researchers within the organization: one studying malaria and the other, called TerraPower, studying nuclear reactor technology. The malaria project got off the ground in 2007, after the Bill and Melinda Gates Foundation called upon Intellectual Ventures to develop new technologies to fight malaria. This spawned the idea of using computer models to simulate the spread of the disease worldwide.



The supercomputer consists of 138 Dell blade servers, each running multiple processing units (or cores), for a total of 1,104 cores. Intellectual Ventures generally devotes 1,024 of those cores to TerraPower and the rest to its malaria research. The researchers chose Microsoft Windows as the operating system (Linux is also commonly used in supercomputing clusters) because the system administrators at their facility are familiar with Microsoft software. It did not hurt that Microsoft co-founder Bill Gates is investing in both the TerraPower and malaria projects, and that Intellectual Ventures itself was formed by former Microsoft executives Nathan Myhrvold and Edward Jung.



The supercomputer, which has five terabytes of memory and 30 terabytes of storage, supplies the brute power to crunch numbers, but this would mean little without the software to instruct the computer. The software pulls biological data on the behavior and reproductive rates of the Plasmodium parasites and the mosquitoes that carry them, as well as information on infection patterns and immune responses among humans. Other data include where people live and how they travel, environmental factors (temperature, rainfall and elevation) that are important to malaria transmission, and the locations of different species of mosquitoes. The software uses these data, which come from a variety of sources including the World Health Organization, the Malaria Atlas Project, universities and NASA, to create models of how malaria outbreaks play out.



Before the supercomputer became available last year, the malaria project researchers used an eight-core computer to establish the basics of their research. They needed to expand their computing power, however, to more accurately model the disease over larger geographic areas. "A larger cluster means you can simulate larger areas in the same amount of time," says Philip Eckhoff, a research scientist at Intellectual Ventures. The team uses the Monte Carlo approach to create its malaria simulations, running many randomized trials and building results from their combined outcomes. As such, access to more cores allows the researchers to run more trials at once and reach their target number of trials sooner.
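To make the Monte Carlo idea concrete, here is a minimal Python sketch. It is not the Intellectual Ventures software; the transmission probability, the number of exposures per trial and the function names are all hypothetical. It only illustrates why repeated random trials split cleanly into independent batches, each of which could in principle be handed to a different core.

```python
import random

# Hypothetical parameters, chosen only for illustration.
P_TRANSMIT = 0.05   # chance that a single exposure transmits infection
EXPOSURES = 10      # exposures per simulated trial


def run_trial(rng):
    """One random trial: True if at least one exposure transmits."""
    return any(rng.random() < P_TRANSMIT for _ in range(EXPOSURES))


def run_batch(seed, n_trials):
    """Run a batch of trials with its own random stream; count infections.

    Batches share no state, so each one could run on a separate core.
    """
    rng = random.Random(seed)
    return sum(run_trial(rng) for _ in range(n_trials))


def estimate_infection_probability(total_trials, n_batches):
    """Split the trials into equal batches and pool the batch counts."""
    per_batch = total_trials // n_batches
    hits = sum(run_batch(seed, per_batch) for seed in range(n_batches))
    return hits / (per_batch * n_batches)


if __name__ == "__main__":
    # Exact answer for comparison: 1 - (1 - 0.05) ** 10, about 0.401
    print(estimate_infection_probability(total_trials=40_000, n_batches=8))
```

Here the batches run one after another for simplicity. On a cluster, a job scheduler would dispatch them to separate cores, which is why adding cores shortens the time needed to reach a given number of trials.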



Early Wednesday afternoon, the supercomputer was running nine different research jobs. One of the jobs, which required 72 computer cores to perform, was a simulation of a potential polio program for India. The simulation included information about India's population (ages, population dispersal throughout the country, migration patterns and demographic data) and played out a scenario of how the disease might spread as people interacted with each other. "It's a probabilistic approach," Eckhoff says. "Some interactions lead to disease, some don't."
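The probabilistic approach Eckhoff describes can be sketched with a toy model in Python. Everything in it is an assumption made for illustration: the population size, contact rate and transmission probability are invented, people mix at random, and nobody recovers. The actual simulation draws on real demographic and migration data.

```python
import random


def simulate_spread(population=1000, initial_infected=5, contacts_per_day=4,
                    p_transmit=0.05, days=30, seed=0):
    """Toy contact model: return the number of infected people per day.

    Each day, every infected person meets `contacts_per_day` randomly chosen
    people, and each meeting with an uninfected person transmits the disease
    with probability `p_transmit`. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    infected = set(range(initial_infected))
    history = [len(infected)]
    for _day in range(days):
        newly_infected = set()
        for _person in infected:
            for _contact in range(contacts_per_day):
                other = rng.randrange(population)
                # Some interactions lead to disease, some don't.
                if other not in infected and rng.random() < p_transmit:
                    newly_infected.add(other)
        infected |= newly_infected
        history.append(len(infected))
    return history


if __name__ == "__main__":
    print(simulate_spread())
```

Running the function with different seeds gives different outbreak curves, so a single run says little on its own. The useful output is the distribution of outcomes over many runs.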



Intellectual Ventures has plans to further expand its supercomputer by adding nodes. The company's computer facilities have room to grow and can accommodate up to 3,000 cores without needing to change the facility's power and cooling systems. The researchers estimate that they could squeeze in up to 6,000 cores if investments were made to beef up power and cooling.



The demand for supercomputer power on a budget has drawn tech vendors that have generally played in a smaller sandbox into the high-performance computing space. Microsoft (through its Windows Azure Platform), Amazon (through its Amazon Web Services), and others are offering "cloud" services, whereby they use their massive data centers to host the data, software and computing resources for their customers, who access the information they seek through their desktop computers.



Microsoft earlier this week introduced an initiative that will focus specifically on offering hosted high-performance computing resources. "Our understanding is that the Microsoft Technical Computing Group is working on bringing 'technical computing,' supercomputing, to the masses," says John-Luke Peck, an Intellectual Ventures systems engineer, who points out that his company's supercomputer uses Microsoft software that can take advantage of parallel processing. "Their solution can and will bring opportunities to researchers, students, and others, that were not previously available."



Although much has been made of computing in the cloud, this is not an option for every research group, including Intellectual Ventures. The primary reason for building their own supercomputer is that some of their projects could have national security implications, which means those data cannot be exported to foreign countries (where many service providers have data centers), says Chuck Whitmer, a consulting physicist to Intellectual Ventures and the neutronics and modeling lead for TerraPower.



A secondary reason is that a distributed, cloud-based approach introduces more delay in transferring information than an on-site system does. Whereas Intellectual Ventures can, generally speaking, achieve a data transfer rate of 20 gigabits per second to get data from its computers to the supercomputer, Peck says, the researchers would probably not get even one-tenth of that speed if they used a supercomputer located offsite.
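A rough calculation in Python shows what that gap means in practice. The one-terabyte dataset is a hypothetical size chosen for illustration, and 2 gigabits per second is simply one-tenth of the on-site figure, which Peck suggests an offsite link would probably not even reach.

```python
def transfer_seconds(n_bytes, gigabits_per_second):
    """Ideal transfer time in seconds, ignoring latency and protocol overhead."""
    bits = n_bytes * 8
    return bits / (gigabits_per_second * 1e9)


if __name__ == "__main__":
    one_terabyte = 1e12  # hypothetical dataset size, in bytes
    print(transfer_seconds(one_terabyte, 20))  # on-site rate cited above
    print(transfer_seconds(one_terabyte, 2))   # one-tenth of that rate
```

Under these assumptions the on-site transfer takes 400 seconds, just under seven minutes, while the same data would need at least 4,000 seconds, more than an hour, over a link running at one-tenth the speed.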