The world's servers are headed for a wall. And Russell Fish aims to break through it.

As the Intels of the world pack more and more processors into the chips running today's supercomputers and massive web operations, you'd think they would only get faster. But there comes a point where they actually get slower. According to a recent study from Sandia National Laboratories, when a chip includes more than eight processors, or "cores," it hits a memory wall. Performance drops because the cores start competing with each other for access to memory.

"We're just up against a stump," says Fish.

Russell Fish is an inventor, entrepreneur, school builder, techno-activist, and former world-record skydiver, but he's been most successful when designing chips that overturn the established order. With his latest venture, Venray Technology, he aims to do it again, offering an invention that can tackle the memory problem facing the modern server.

With today's servers, memory resides on separate chips from the processor, and shuttling data between chips slows things down. "It's that motion of data that kills you," says Fish. "That motion consumes energy and it takes time." Processors include a small amount of "cache memory," which reduces the number of times the processor has to fetch data from main memory, but Venray goes further. It puts the processor and main memory on the same chip. "Our processors live in the middle of the data," Fish says. "We don't have to go get it. We don't have to go off chip."

It's called processor-in-memory, or PIM, and it's not exactly a new idea. Fish and others have been pursuing the idea for decades. But its time may finally have come. In today's world, biomedical research and other "Big Data" applications that juggle enormous amounts of information are butting up against that memory wall, and if we're to achieve "personalized medicine" – where we tailor drugs and other treatments to an analysis of an individual’s genetic makeup – we need chips that can push through that wall.

Once a Chip Man, Always a Chip Man

Russell Fish knows chips. He began his career at Motorola in 1974 and spent several years at Fairchild Semiconductor, the Silicon Valley computer chip pioneer. He dropped out of the business for a while – designing personal e-mail terminals and temporarily setting a world record for most parachute jumps in a 24-hour period (255) – but then he came back.

In 1988, together with noted computer programmer and architect Charles Moore, he built the Sh-Boom microprocessor. This chip was four times faster than the commercial processors of the day, thanks to an internal clock that allowed it to run faster than the circuit board it was mounted on. Today, virtually every computer chip uses a variation of this technology, and in 2009, it made IEEE Spectrum's list of the 25 Microchips That Shook the World. With Venray's PIM chip, Fish aims to shake it again.

The idea actually dates back to the same period. In 1989, Fish and Moore filed for a patent – U.S. patent 5,440,749 – that shows a processor sitting inside a memory chip.

According to Fish, this was the first documented reference to PIM technology. But PIM has many fathers, most notably David Patterson, a researcher at the University of California at Berkeley and one of the pioneers of reduced instruction set computing (RISC) – stripped-down, speedy microprocessors that dominated the engineering workstation market in the late '80s and early '90s and helped bring the UNIX operating system to prominence.

Patterson, Indiana University's Thomas Sterling, Notre Dame's Peter Kogge, and a few other researchers working under government grants produced a flurry of PIM papers and prototypes in the mid-'90s. These technologies worked – a 1996 Wired article captured the excitement of the time – but according to Fish, they were too expensive. The problem with these approaches, he says, is that they embedded memory in processor chips, rather than embedding the processor in the memory.

"They got the architecture right but the implementation wrong," he says.

All digital chips are made from the same basic building block: the transistor. But not all transistors are the same. There's a huge difference in the cost of a processor transistor and a memory transistor – about 500 times, if you compare billion-transistor chips. Processor transistors are optimized for speed, while memory transistors are optimized for cost and low power leakage. Patterson and his cohorts, Fish says, chose the expensive processor transistors when they should have picked the cheap memory circuits.

The rub is that designing logic circuits for memory chips is, well, challenging. Memory chips have only three levels of interconnects – the microscopic metal wires that connect the transistors – while processor chips have 10 to 12 levels. "You've got to be really, really efficient in how you allocate the transistors or else you can't hook them up," Fish says. And he believes Venray has the knack.

To the End of Silicon

Known as TOMI Borealis, Venray's prototype puts 16 chips on a 4-inch circuit board that's about the size of a memory card. The board includes 128 cores and 2GB of DRAM. One hundred twenty-eight of the boards then fit onto a 19-inch motherboard. Venray ran a simulation using Sandia's MapReduce benchmark test with a 256GB data set and found that a one-motherboard TOMI Borealis system outperforms a whole rack of Intel Xeon motherboards. MapReduce is software developed by Google that allows large numbers of servers to crunch large data sets, and an open source version of the technology, known as Hadoop, helps drive web operations such as Yahoo!, Facebook, and eBay as well as businesses in other markets.

The aim is to build a kind of cloud computing appliance. A rack of 64 19-inch boards would have over 16TB of memory and over a million cores. "It's a huge amount of storage, but more importantly it has these million cores in there that will allow you to find the information," he says. "Just storing the data is not interesting. It's being able to use it...And that's what Big Data is about."
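The rack figures can be checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation, not anything published by Venray: it assumes 128 boards per 19-inch motherboard (the count that lines up with the quoted rack totals and the 256GB data set), and the constant and function names are made up for illustration.

```python
# Back-of-the-envelope check of the TOMI Borealis capacity figures.
# All names are hypothetical; the per-board numbers come from the article,
# and 128 boards per motherboard is an assumption that matches the rack totals.

CORES_PER_BOARD = 128          # cores on one 4-inch board
DRAM_GB_PER_BOARD = 2          # gigabytes of DRAM on one 4-inch board
BOARDS_PER_MOTHERBOARD = 128   # assumed count of boards per 19-inch motherboard
MOTHERBOARDS_PER_RACK = 64     # 19-inch motherboards in one rack


def totals(units, cores_each, dram_gb_each):
    """Return (total cores, total DRAM in GB) for `units` identical parts."""
    return units * cores_each, units * dram_gb_each


# One motherboard is 128 boards.
mb_cores, mb_dram_gb = totals(
    BOARDS_PER_MOTHERBOARD, CORES_PER_BOARD, DRAM_GB_PER_BOARD
)

# One rack is 64 motherboards.
rack_cores, rack_dram_gb = totals(MOTHERBOARDS_PER_RACK, mb_cores, mb_dram_gb)

print(f"Motherboard: {mb_cores:,} cores, {mb_dram_gb} GB DRAM")
print(f"Rack: {rack_cores:,} cores, {rack_dram_gb / 1024:.0f} TB DRAM")
```

Under these assumptions a single motherboard carries 16,384 cores and 256GB of DRAM, the same size as the benchmark data set, and a full rack comes to 1,048,576 cores and 16,384GB, which squares with "over a million cores" and roughly 16TB of memory.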

Big Data – particularly big data for biomedical research – is more than a business opportunity for Fish. The native Texan lost both parents to Alzheimer's. "I would love to have a cure."

The technology is still in the early stages, and though the prototype is promising, there are many battles ahead. Beyond the technical challenge, Fish has to overcome the conventional wisdom that PIM is a failed technology. But he's already gotten further than most. Thomas Sterling, one of those PIM pioneers, is "delighted, even envious" that Venray built a business based on PIM despite the persistent group-think in the semiconductor industry that it can't be done because it hasn't been done.

Venray aims to go only so far. It's an intellectual property company, not a chipmaker. Fish intends to sell TOMI to some other chipmaker. If anyone buys the technology, the buyer is likely to make the chips at an existing DRAM foundry, and that poses a problem. The DRAM industry is suffering a bout of bankruptcies and pending consolidation. This turmoil makes near-term commercialization of Venray's technology uncertain.

But Fish is confident that his latest chip design is the cure for what ails the computer industry. "The entity that controls [TOMI] probably controls computer architecture to the end of silicon," he says.

His confidence comes off as bravado. But he says this comes with the territory. "Computer architects are the intellectual fighter pilots. Every last one of us thinks we're the smartest guy in the room, bar none. So we fight like cats and dogs, and eventually somebody wins. That's what's going on right now."