Liquid cooling may be your next major upgrade

Enterprise-scale data center operators are not known for jumping on bandwagons, but they are slowly adopting a technological idea that has taken off among overclockers, PC hobbyists, and gamers: liquid cooling.

It’s not a new idea for servers; Cray supercomputers used a special liquid refrigerant for cooling back in the 1980s. Without it, the Cray-2 would have melted into a pile of slag. But by and large, rack-mounted servers have relied on the traditional heat sink and fan approach. The heat sink, made of copper and aluminum, draws heat away from the CPU, and a fan blows on the heat sink to cool the metal.

The problem is that this approach will get you only so far. Even though CPU technology continues to advance through die shrinks and lower power draw, the power envelope is actually increasing as more cores are packed into a CPU and more CPUs are packed into a smaller space.

Water cooling operates like air conditioning. You need water, something to chill it, and something to transport it. That means a lot of piping and a lot of pumps to move cold water into the server and hot water out, not to mention redundancy in the event of failure.

At its core, a water-cooling setup is a heat sink on the CPU with two pipes: one to bring in cold water to cool the heat sink and one to take the heated water away to be rechilled and recirculated. At this point, usually only CPUs are liquid cooled. Other components don’t need it, except for GPUs, which are gaining ground in high-performance computing (HPC). However, GPU card design and installation make it difficult to fit a cooling system. Because CPUs are socketed on the motherboard, it’s easier to fit them with a liquid-cooled heat sink.

So why use water instead of air? For starters, the air cooling method dates back to the era of the mainframe, when things didn’t run that hot. Computer rooms had a raised floor with tiny holes in it for cold air to come up and be sucked into the mainframe, or in this case, the server cabinet. The cold air came from a CRAC unit, or computer room air conditioner.

The problem is that CRAC units are not suited for high-density rack cooling because they simply cannot provide enough cooling airflow through high-density racks. The laws of physics demand more cooling capacity than cold air can offer, short of putting the server in a freezer. And server density is increasing for certain applications.


HPC demands liquid cooling

“The trend in IT is they want to increase server density, all the way down to the chip level. That means increasing the power of the chips, putting more chips per rack unit, and filling up the racks as much as possible. So rack power has transitioned from a normal of 3.5 to 4 kilowatts; now people are routinely planning to support 60 to 70 kilowatts per rack,” says Geoff Lyon, CEO of CoolIT, a company that specializes in liquid cooling for data centers.

The ability to move that much heat with forced air is difficult, if not impossible, Lyon says. “Fan power and compressor power get to be almost unmanageable. The logical method historically is to transition to liquid cooling, utilizing the naturally high heat absorption of liquids.”
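To get a feel for why forced air struggles at these densities, here is a rough back-of-the-envelope sketch based on the steady-state heat balance Q = ρ · V · c_p · ΔT. The air properties, the 12 K inlet-to-exhaust temperature rise, and the function name are illustrative assumptions, not figures from Lyon or any vendor.

```python
# Rough estimate of the airflow needed to carry away a rack's heat load.
# Assumed values (not from the article): sea-level air density, specific
# heat of air, and a 12 K rise between cold-aisle inlet and hot-aisle exhaust.
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0
M3_PER_S_TO_CFM = 2118.88  # cubic feet per minute in one cubic meter per second


def airflow_m3_per_s(rack_power_w: float, delta_t_k: float = 12.0) -> float:
    """Volumetric airflow from Q = rho * V * c_p * dT, solved for V."""
    return rack_power_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * delta_t_k)


if __name__ == "__main__":
    # Compare a traditional 4 kW rack with a 70 kW high-density rack.
    for rack_kw in (4, 70):
        flow = airflow_m3_per_s(rack_kw * 1000.0)
        cfm = flow * M3_PER_S_TO_CFM
        print(f"{rack_kw} kW rack: {flow:.2f} m^3/s (about {cfm:.0f} CFM)")
```

Under these assumptions, airflow scales linearly with rack power, so a 70-kilowatt rack needs 17.5 times the air of a 4-kilowatt rack through the same cabinet footprint.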

Todd Boucher, principal of Leading Edge Design Group, a consulting firm that designs, installs, and maintains data centers, echoes this sentiment, noting this is primarily a high-density, HPC effort. “In a typical enterprise data center, we’re not seeing that type of density—that is, in high-performance and research computing and Bitcoin mining and hyperscale data centers," Boucher says. "The density in data centers is rising, but I would not say it is prevalent or common to see a data center with a rack density of 50 to 70 kilowatts per rack.”

However, Steve Conway, senior vice president for research at Hyperion Research, IDC’s HPC group, says density is increasing in regular IT environments as well.

“We did a study a while ago for the Department of Energy on HPC data centers," he says. "They asked us to expand it onto other kinds of data centers, meaning enterprise IT data centers. The results were surprisingly similar. Densities are increasing there, too, and physics is physics. You have to deal with heat the same way. So liquid cooling is on the rise in enterprise data centers."

Conway adds, “When we talk with industry about this business and ask, ‘How much do you worry about the electric bill?’ industry in particular says, ‘We’re attentive to it, but it’s not really a driver. We see it as a cost of doing business.’ Particularly in industry, they don’t see the electric bill.” If companies save money on something like power and cooling, he says, it doesn’t drop to the bottom line anyway. They just want as much computing power as possible.

“The more computing you do, the better," Conway says. "In enterprise data centers, if you’re doing normal business operations, there could be a tremendous volume, but it’s finite. Once you’ve done payroll, you’re done. With HPC, there is no end to how much computing power you can apply to it.”

Even in HPC, liquid cooling is a fringe idea. “At this stage, since there are so many commodity servers not liquid cooled, the liquid-cooled adoption is still in a single-digit percentage. Everybody is moving toward it, but it still is in the early-adopter stage,” says Lyon.

Liquid-cooling chemistry and physics

Lyon says there is no comparison between a volumetric liter of air versus a volumetric liter of water. “If you change the temperature of that air by a degree, the amount of energy it absorbs is minuscule in comparison to the same for water. Water has 3,000 times more heat capacity than air.” Running at full speed, a server can move 300 liters of water per minute versus 20 cubic feet of air, and with that greater heat absorption, it can cool much more efficiently than with air alone.
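Lyon's liter-for-liter comparison can be sanity-checked with a few lines of arithmetic. The sketch below multiplies density by specific heat to get volumetric heat capacity; the property values are approximate room-temperature figures assumed for illustration, and the function name is made up for this example.

```python
# Volumetric heat capacity = density * specific heat, in J per m^3 per K.
# Property values are approximate room-temperature figures (assumptions).
WATER_DENSITY_KG_M3 = 998.0
WATER_SPECIFIC_HEAT_J_KG_K = 4182.0
AIR_DENSITY_KG_M3 = 1.2
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0


def joules_per_liter_per_kelvin(density_kg_m3: float, specific_heat_j_kg_k: float) -> float:
    """Energy absorbed by one liter of fluid when it warms by one kelvin."""
    joules_per_m3_per_kelvin = density_kg_m3 * specific_heat_j_kg_k
    return joules_per_m3_per_kelvin / 1000.0  # 1 m^3 = 1000 liters


water = joules_per_liter_per_kelvin(WATER_DENSITY_KG_M3, WATER_SPECIFIC_HEAT_J_KG_K)
air = joules_per_liter_per_kelvin(AIR_DENSITY_KG_M3, AIR_SPECIFIC_HEAT_J_KG_K)
ratio = water / air

if __name__ == "__main__":
    print(f"water: {water:.0f} J per liter per K")
    print(f"air:   {air:.2f} J per liter per K")
    print(f"ratio: about {ratio:.0f} to 1")
```

With these assumed properties the ratio lands in the low thousands, the same order of magnitude as the figure Lyon quotes.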

It’s much quieter, too. With liquid-cooled heat sinks on the CPUs, few or no fans are needed inside the server. The rest of the server—memory, motherboard chips, maybe a solid-state drive—can get by on ambient air flow, though cooling options do exist for almost every component.

Besides being stifling from all the heat vented out the back into the hot aisle, data centers are also ridiculously loud from all the fans. That's not the case if you are using water cooling, notes Greg Crumpton, an independent consultant who specializes in liquid cooling.

“So you dissipate that 100 decibels of noise in a lot of data centers down to nothing because there is no air required,” Crumpton says. This results in a much more comfortable working environment for data center technicians.

One thing about liquid cooling is that although the vast majority of liquid-cooled systems use just water—with the rest using some kind of propylene glycol or antifreeze, or even a few custom-manufactured liquids—you can’t exactly use tap water. It’s full of all kinds of contaminants that must be filtered out. So that means buying expensive filtration systems along with everything else, or a whole lot of distilled water.

“Once you get these deployed, you don’t want to do much to maintain them,” says Lyon. “You want it to be as stable as possible. You don’t want calcium, lime, and rust buildup. If you use municipal water, you have to adhere to strict water regulations around alkalinity, conductivity, particulates, hardness, and mineral content. It would have to be substantially processed.”

There are some pretty impressive advances in non-water coolants, however. Crumpton, who works with Ebullient Cooling, gave a demonstration to a bank executive by pouring what looked like water all over the executive's work laptop. The water was actually Novec 7000, developed by 3M and used by Ebullient in its cooling systems. Because it’s non-invasive and non-conductive, nothing shorts out if you have a leak. In Crumpton's laptop demonstration, Novec evaporated in seconds. By the time the executive got over his initial freak-out over liquid being poured onto his laptop, it was dry. He was sold.

Liquid economics

The cost of liquid vs. air is hard to measure because they are used at different compute densities, making a direct comparison difficult. Also, experts can’t agree on a price. Conway puts the price for liquid cooling at one and a half to two times that of air cooling. Crumpton says it’s just 10 percent more, adding that in the right environment, it can actually be cheaper.

“Liquid can be cheaper if I know how to do it,” he says. “But it’s so dependent upon the square footage. If you give me a half million square feet, I can make it look really good. But in 100 square feet, it looks really stupid. The infrastructure required for heat exchange via liquid cooling scales.”

Liquid cooling becomes more efficient as the data center gets larger because of the higher density per rack. It’s better to dissipate 30 kW from one rack than 15 kW from each of two racks because you have to run half the amount of pipe.
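The one-rack-versus-two comparison can be sketched numerically. The example below assumes plain water as the coolant, a 10 K temperature rise across the rack, and one supply/return pipe run per rack; all of those, along with the function names, are illustrative assumptions rather than a design rule.

```python
from typing import List, Tuple

# Water flow needed to remove a rack's heat load for a given coolant
# temperature rise. Property values and the 10 K rise are assumptions.
WATER_DENSITY_KG_M3 = 998.0
WATER_SPECIFIC_HEAT_J_KG_K = 4182.0


def water_flow_l_per_min(rack_power_w: float, delta_t_k: float = 10.0) -> float:
    """Solve Q = rho * V * c * dT for V and convert m^3/s to liters per minute."""
    m3_per_s = rack_power_w / (WATER_DENSITY_KG_M3 * WATER_SPECIFIC_HEAT_J_KG_K * delta_t_k)
    return m3_per_s * 1000.0 * 60.0


def layout_summary(rack_powers_kw: List[float]) -> Tuple[int, float]:
    """Return (supply/return pipe runs, total flow in L/min), one run per rack."""
    total_flow = sum(water_flow_l_per_min(kw * 1000.0) for kw in rack_powers_kw)
    return len(rack_powers_kw), total_flow


if __name__ == "__main__":
    for layout in ([30.0], [15.0, 15.0]):
        runs, flow = layout_summary(layout)
        print(f"racks {layout}: {runs} pipe run(s), {flow:.1f} L/min total")
```

Both layouts need the same total water flow for the same heat load; the denser layout simply delivers it through half as many rack connections.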

Scale doesn’t just help liquid cooling; it is practically a requirement. Crumpton says major OEMs selling liquid-cooled server enclosures won’t sell anything under 100 units because the economics are just not there. If you want to liquid cool fewer than 100 servers, you have to buy all the parts and do it yourself instead of having the OEM do the installation and hookup. “We don’t buy enough to make it cost efficient in their mind,” he says. “It will change in time. I think that liquid will be the de facto standard in five years. But it’s like the first guy with a smartphone. He’s got one app. You’re the early adopter guy, and you have to pay the price for that.”

Retrofitting existing systems isn’t a popular option because more infrastructure is required to deliver liquid cooling. You need a manifold in each rack, which is connected to the water input and return paths inside each rack. You need connections under the floor for each rack as well, so a lot of piping for the liquid has to go into the data center. Once you move from the rack, you need a heat exchanger to discharge water to a heat rejection side, like a cooling tower or chiller. And you need a lot of pumps and redundancy in those pumps.

“It all depends on what you have today,” says Boucher. “If you don’t have a chilled water system, it’s a lot more expensive. The economics of retrofitting become a lot less attractive because you can’t leverage existing infrastructure.”

Other liquid-cooling methods

There are a few other methods of liquid cooling beyond just a cold-water pipe into your CPU. Some server cabinets use chilled rear doors, where cold water is piped through and the cool air the pipes give off is blown onto the server.

And in some cases, the entire motherboard or entire computer is immersed in a non-conductive fluid, so the hardware doesn’t fry. That, however, is reserved for extreme cases. It requires a complete revision of the hardware. Ports, for example, have to be at the top of the hardware because they can’t be immersed in the liquid. And the liquid has to be non-conductive, which is more expensive than water.

Then there are hot water cooling systems, which sound like a contradiction in terms. A hot water system simply does not chill the water after it is pumped out from the CPU. Instead of being cooled with refrigerant to 45 to 55 degrees Fahrenheit, the water typically stays between 90 and 105 degrees Fahrenheit, which is still a lot cooler than the CPUs run. Hot water cooling lets the water cool naturally as it circulates through evaporator towers, which use ambient temperature diffusion.
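For readers who think in Celsius, the two water temperature ranges above convert as follows. This is a plain unit conversion; the dictionary labels are just names chosen for this example.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Standard Fahrenheit-to-Celsius conversion."""
    return (deg_f - 32.0) * 5.0 / 9.0


# Temperature ranges quoted in the text, in degrees Fahrenheit.
RANGES_F = {
    "chilled water": (45.0, 55.0),
    "hot water": (90.0, 105.0),
}

if __name__ == "__main__":
    for label, (low_f, high_f) in RANGES_F.items():
        low_c = fahrenheit_to_celsius(low_f)
        high_c = fahrenheit_to_celsius(high_f)
        print(f"{label}: {low_c:.1f} to {high_c:.1f} degrees Celsius")
```

That works out to roughly 7 to 13 degrees Celsius for chilled water and roughly 32 to 41 degrees Celsius for a hot water loop.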

It’s been known for some time that CPUs can run reliably without meat-locker temperatures. As far back as 2008, Intel was posting research that showed a data center can run at temperatures that, while uncomfortable for the humans who work in it, are tolerable to the servers. And hot water cooling supporters argue it’s more cost effective, since no refrigeration is used.

Changing minds

Some people are also scared off by the thought of running water into their expensive servers, a natural concern. While Crumpton’s laptop demonstration was effective, some people might not appreciate such a demo and need gentler persuasion.

Lyon says the hardware makers kept that in mind when designing their systems. “Everybody is nervous about the same thing, so vendors have developed a robust process to make sure all the parts are validated, tested, and proven,” he says. “There are several different players that have deployed millions of liquid cooling systems. The integration practice and assembly practice are well understood, and they can deploy liquid cooling in a reliable manner.”

Boucher has not heard of any leaks frying servers, a testament to how well the liquid cooling components are engineered and manufactured. “When they were first introduced, there was the risk of bringing water into computing equipment," he says. "But the manufacturers did a great job in designing products that minimize risk to the end user, and liquid-cooled [systems] vendors did the same thing. That doesn’t mean there is no risk, but they did a nice job.”

“There are so many people that are my age and younger that are aquaphobic,” says Crumpton. “Until those folks retire or get used to liquid in the data center, you’re always going to have that ‘What happens if we have a leak?’ mentality. What they don’t realize is they already do have water cooling; they just don’t know about the infrastructure, whether it’s refrigeration or condensation removal.”

Liquid cooling: Lessons for leaders