Big Problems With Big Iron



Problems to be solved and problems with the current approach to solving them



John Melchi is the head of the administration directorate of the National Center for Supercomputing Applications (NCSA), and is right in the middle of a current controversy.

Today I want to point out a role that theory could play in supercomputing, and how we could potentially help in new ways.



It is all about the Blue Waters Project. This was the name of a petascale supercomputer to be deployed at the NCSA, which is an arm of the University of Illinois at Urbana-Champaign. Back on August 8, 2007, the National Science Board approved a resolution that authorized the National Science Foundation to fund

“the acquisition and deployment of the world’s most powerful leadership-class supercomputer.”

The NSF awarded $208 million over the four and a half years of the Blue Waters project.

Just a few days ago IBM pulled out of the project, returning a large sum of money. According to the Associated Press:

CHAMPAIGN, Ill.—IBM Corp. is leaving a project at the University of Illinois to build the world’s fastest supercomputer. The university’s National Center for Supercomputing Applications and IBM said in a brief statement Monday that IBM terminated its contract. IBM said the computer had grown more expensive and technically demanding than anticipated. The NCSA said it still plans to pursue a petascale computer in what it called “timely manner” and is working with the National Science Foundation. Petascale refers to the speed of the computer and means it could perform a thousand trillion mathematical operations a second. IBM was chosen in 2007 to build Blue Waters. The computer was initially due to go online this year.

The Controversy

As you might imagine, many in the supercomputer area—or, as they say, the high performance computing (HPC) area—are not happy with the current situation. Maybe no one is happy, including Melchi at NCSA. In 2007 there was a competition, and the NSF chose among several very strong proposals, one of which involved Georgia Tech. I had nothing directly to do with that proposal, but I mention it for full disclosure. Of course the teams that lost the competition are upset, since they believe that they could have succeeded if they had been given the chance—and the money.

The fact that the Blue Waters project has collapsed with IBM's withdrawal is a serious issue. My friends who work in HPC are quite interested in what happened, what went wrong, and on and on. They have many questions, and no answers.

HPC

I hope there is a serious and open discussion about what did happen with Blue Waters, but that is not what I think we need to talk about. Rather I think there is an even more foundational issue involved. The rationale behind projects like Blue Waters is:

Science and engineering face problems that cannot be solved on today's computer systems, because of their immense requirements for time or space or both. The only way to solve them in the future is to build faster and faster supercomputers—bigger and bigger iron. Thus, the NSF and other agencies must spend hundreds of millions of dollars, perhaps soon billions of dollars, to create such large systems.

I agree with the first statement completely: more generally, there are societal computational problems that are beyond our abilities. These include problems of all kinds, from pure science to very applied questions. We would all benefit greatly if they were solvable.

What I disagree with is the assumption that the above is the only approach to solving these problems, which are often called grand challenges. This is especially true today, now that we can no longer keep making faster and faster uni-processors. There was a time when each generation of computers was faster than the previous generation—this was mostly a consequence of advances in device physics, essentially advances that followed Moore's Law.

Today, because of power limits and other technical issues that I am not an expert on, processors, and therefore supercomputers, have had to become extremely parallel machines. Extremely. This many-core approach is exciting, but it does not give the automatic improvement that one used to get from increased clock rates alone. Last year Noam Nisan wrote on related issues, quoting from the Report to the President and Congress: Designing a Digital Future: Federally Funded R&D in Networking and IT. The central point was that there are two ways to speed up the solution of a problem: faster iron and better algorithms. The total speedup is always the product of two terms:

\[ \text{total speedup} = (\text{iron speedup}) \times (\text{algorithm speedup}). \]

In the cited report Martin Grötschel notes:

that a benchmark production planning model solved using linear programming would have taken 82 years to solve in 1988, using the computers and the linear programming algorithms of the day. Fifteen years later—in 2003—this same model could be solved in roughly 1 minute, an improvement by a factor of roughly 43 million. Of this, a factor of roughly 1,000 was due to increased processor speed, whereas a factor of roughly 43,000 was due to improvements in algorithms!

Note, the 1,000 fold increase in processor speed was mostly due to increased clock rates, not the use of massive parallelism.
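
To see how the two factors combine, here is a tiny back-of-the-envelope check of Grötschel's numbers, written as a sketch in Python. The factors are just the round numbers quoted above:

```python
# Total speedup is the product of the hardware factor and the algorithmic
# factor; the product should match the drop from 82 years to about 1 minute.

hardware_factor = 1_000    # faster processors, 1988 -> 2003
algorithm_factor = 43_000  # better linear programming algorithms

total = hardware_factor * algorithm_factor
print(f"combined speedup: {total:,}")  # 43,000,000

minutes_in_82_years = 82 * 365.25 * 24 * 60
print(f"82 years is about {minutes_in_82_years:,.0f} minutes")  # ~43,128,720

# The two estimates agree to within rounding: a 43 million-fold improvement,
# of which only a small slice came from faster iron.
```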

The Approaches

Granted that society has problems—computational problems—that require solution, there are several approaches:

Bigger Iron This is the perceived approach taken by HPC researchers. Projects like Blue Waters attempt to build larger and larger systems that can execute more and more instructions per second. Petascale was the Blue Waters goal; exascale is next, then zettascale, and then yottascale. Of course the bigger iron is not really just faster; the new iron is more and more parallel, and therefore more and more difficult to program.

Bigger Iron and Better Algorithms This is the approach that actually seems to happen: the iron gets faster and the algorithms get better. Grötschel's claim is that more of the improvement in solution speed has often come from algorithmic breakthroughs, but somehow this fact is not as exciting as thinking about big iron (see the sketch after this list). Big projects that build large iron today commission whole buildings, with huge demands for power and cooling, and are much more visible than any algorithm. Oh well.

Break The Rules This is the approach that I would like to elaborate on in the next section. It is one that we are equipped to pursue, but not one that the NSF funds at even a fraction of the level that large iron gets. Perhaps, given the Blue Waters situation, we should seriously consider spending large sums on such projects. They will be high risk with huge payoff, but perhaps the time has come when the risk-reward balance has shifted enough to make this a viable addition to the HPC investment portfolio.
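
To make the algorithms-versus-iron comparison concrete, here is a small illustrative sketch in Python. The numbers are my own assumptions, not from any report: a problem of size n solved by an O(n^2) method on petascale iron versus an O(n log n) method on terascale iron, a thousand times slower.

```python
import math

# Illustrative comparison (assumed numbers): scaling the iron up 1,000x versus
# replacing an O(n^2) algorithm with an O(n log n) one.

n = 10**9    # assumed problem size
TERA = 1e12  # terascale machine: 1e12 operations per second
PETA = 1e15  # petascale machine: 1e15 operations per second

quadratic_ops = n**2          # work done by the old algorithm
nlogn_ops = n * math.log2(n)  # work done by the better algorithm

print(f"O(n^2) on petascale iron:     {quadratic_ops / PETA:8.1f} seconds")
print(f"O(n log n) on terascale iron: {nlogn_ops / TERA:8.3f} seconds")

# The better algorithm on iron 1,000 times slower still wins by a factor of
# tens of thousands here -- and its advantage only grows with n.
```

On these assumed figures the old algorithm needs about 1,000 seconds even on the petascale machine, while the better algorithm finishes in about 0.03 seconds on the terascale one; this is the sense in which an algorithmic breakthrough can dwarf a generation of hardware.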

Break The Rules

My suggestion is more radical, and is based on the fact that I am an "optimist." Recall that the definition of an optimist is:

An optimist is a guy that has never had much experience.



Don Marquis 1927

Since I do not work in HPC, I can be an optimist and suggest that theory can play a huge role, indeed an even bigger role than the development of improved algorithms. What I think is possible is that we can approach the problems that need to be solved with new and completely fresh ideas. That is, not to attempt to improve the performance of the existing algorithms, but to go straight to the original core problems and approach them in new ways.

I do not think this has ever been done before on a large scale, and would have high risk. But building larger iron, the current focus of NSF, is clearly not without risk.

I have previously discussed this at length here. Take a look for more details on what I mean. One concrete example is the recent work on RNA structure via game playing—see this for more details.

Open Problems

Can theory play a bigger role in solving society’s problems, not merely by speeding up existing algorithms, but fundamentally by re-working approaches to the original core problems?