By Haley Stephenson

A smartphone app set the tempo for a fix to bring the International Space Station (ISS) back online after a thermal system failed.

For a few moments one Saturday evening in December, the Mission Control Center at Johnson Space Center went very quiet except for the sound of a metronome beat interrupting the silence every half-second. All eyes were on flight controller Mark Smith, poised to send a series of commands to the ISS orbiting Earth some 250 miles above at 17,500 miles per hour.

On the sound of one beat, Smith sent a single command to station. He counted twelve more beats and on the thirteenth, he issued a second command. Mission controllers analyzed the resulting data and realized their “metronome hack” might work. It wasn’t quite perfect, but it just might work.

*******

Three days earlier, at 8:23 a.m. Central Standard Time on December 11, 2013, for the second time in ISS history, Loop A of the External Thermal Control System (ETCS) stopped working. While this raised concerns since the loop contributes to thermal regulation for the ISS, the independent backup system, Loop B, was still operational. With the ISS one failure away from the possibility of evacuation, everyone involved wanted to bring Loop A back online quickly.

We typically think of space as being a very cold place—and it is. At the same time, it is difficult to dissipate heat in space. On Earth, we cool our homes with air conditioners and computers with fans through the processes of conduction and convection, both of which rely on the presence of an atmosphere to occur. Space is a vacuum, which makes that impossible. In the case of ISS, its power system can generate approximately 84 to 120 kilowatts (enough to power more than 40 houses). Combined with the heat produced by six crewmembers, the ISS can get pretty warm. Loops A and B of the ETCS work to actively cool the ISS, but with one loop down, the remaining loop became a single point of failure.

Flight controllers on the ground fiddled with the system to understand what had happened. They determined that a flow control valve inside Loop A’s 780-lb, refrigerator-sized pump module was broken. As a result, Expedition 38 Lead Flight Director Judd Frieling got all hands on deck to figure out how to restore nominal operations. He tasked one team with planning a spacewalk to replace the pump, a second team with planning for a commercial cargo mission headed for station the following week, a third team with planning for the next-worst failure, and a fourth team made up of thermal mission controllers with bringing Loop A back online without a spacewalk.

Mark Smith was a member of that fourth team—and they were on to something.

Too Hot. Too Cold. Just Right.

Moving heat from inside the ISS to the outside requires all three forms of heat transfer: conduction, convection, and thermal radiation. Heat generated inside the station warms the internal atmosphere and is extracted by the Internal Thermal Control System (ITCS), which consists of a series of pipes containing water.

These pipes snake through the inside of the ISS, absorbing heat until they reach one of the station’s ten heat exchangers. The inside of each heat exchanger looks like a delicate 45-layer sandwich. Twenty-three layers carry the warm water coming from the ITCS and alternate with the remaining twenty-two layers, which contain another fluid, cool ammonia, coming from the ETCS. Since ammonia has a lower freezing point (-107 °F, -77 °C) than water, it is more tolerant of the external environment of the ETCS and has a lower risk of freezing. Heat from the water transfers to the ammonia through the thin metal layers, cooling the water, which then leaves the heat exchanger and cycles back through the ITCS.

Concurrently, the warmed ammonia leaves the heat exchanger and travels along a line that splits in two directions. One line continues toward one of the loop’s three radiators, measuring 10.24 feet (3.12 meters) by 44.62 feet (13.6 meters). The radiator extracts the heat from the ammonia, ejects the heat into space, and sends cold ammonia back toward the station. The other line bypasses the radiator, keeping some of the ammonia warm. The two lines meet in a pump module where a flow control valve mixes just the right amount of cold ammonia with warm ammonia before the fluid returns to the heat exchanger to start the process all over again.
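The mixing step described above can be illustrated with a rough sketch. It assumes an idealized blend of two streams of the same fluid with constant specific heat, so the mixed temperature is a flow-weighted average. The function name and the temperatures used are hypothetical and are not actual ETCS values.

```python
def mixed_temperature(cold_temp: float, warm_temp: float, cold_fraction: float) -> float:
    """Idealized temperature of two blended streams of the same fluid.

    cold_fraction is the share of total flow that went through the radiator
    (0.0 = all bypass, 1.0 = all radiator). This is a simplification for
    illustration, not a model of the real pump module.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    return cold_fraction * cold_temp + (1.0 - cold_fraction) * warm_temp


# Hypothetical temperatures in degrees Celsius, chosen only for illustration.
for fraction in (0.0, 0.5, 1.0):
    print(f"cold fraction {fraction:.1f} -> {mixed_temperature(-40.0, 10.0, fraction):.1f} C")
```

In this simplified picture, a valve stuck too far toward the radiator line pushes the blend toward the cold end, which is the condition the team worried about.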

If the flow control valve malfunctions, the temperature of the ammonia returning to the heat exchangers can run too cold. “The thing we worry about more than anything else in our system is sending ammonia that is too cold to the heat exchangers,” said Anthony Vareha, a thermal systems flight controller on the team. If it’s too cold, he explained, water in the heat exchanger might freeze when it interacts with the “too cold” ammonia. When water freezes, it expands and can rupture the delicate lines in the system.

The fail-safe was to shut the system down to prevent cold ammonia from entering the heat exchanger. This is exactly what happened the morning of December 11. “The system was just getting colder, and colder, and colder, and that’s when we knew there was something very strange going on,” said Vareha. The team had to find another way to heat the loop back up. If they couldn’t add heat to the system due to the broken flow control valve, they wondered if there was a way to reduce the amount of heat escaping through the radiators.

By Thursday night, mission controllers were exploring the possibility of leveraging two shutoff valves in the system upstream of the flow control valve to regulate the temperature. With one shutoff valve in each line, they act like tiny doors regulating the flow of cold and warm ammonia as it cycles through the pump module. Typically, these valves remain open. When mission controllers closed the shutoff valve in the line that sends warm ammonia to the radiators, heat funneled back into the system. A little too much heat.

The thermal team needed a way to operate this shutoff valve in an intermediate position, which it wasn’t designed to do.

Attempt #1: The Manual Approach

It takes about 13 seconds for the shutoff valve to move from the open position to the closed position. The thermal team had access to the box that powered it. Their plan: command the valve to close and then kill the power halfway through.

One way of doing this is to send time-tagged commands from mission control to the station, telling the valve to power on and then power off at a precise moment. But the timing wasn’t tight enough. Six seconds was too little (too cold). Seven seconds was too much (too warm). The ISS onboard command time-tagging system could only send commands in whole seconds and they needed to be able to command at the half-second level. So they tried another approach.
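The arithmetic behind the timing problem can be sketched as follows. The sketch assumes the valve moves at a constant rate across its roughly 13-second travel, which the team's account does not state. The function name is illustrative only.

```python
TRAVEL_TIME_S = 13.0  # approximate open-to-closed travel time quoted above


def fraction_closed(power_on_s: float, travel_time_s: float = TRAVEL_TIME_S) -> float:
    """Estimate how far the valve has closed (0.0 = open, 1.0 = closed).

    Assumes constant-rate travel, an illustrative simplification.
    """
    if power_on_s < 0:
        raise ValueError("power_on_s must be non-negative")
    return min(power_on_s / travel_time_s, 1.0)


for seconds in (6.0, 6.25, 6.5, 7.0):
    print(f"{seconds:.2f} s -> {fraction_closed(seconds):.1%} closed")
# 6.00 s -> 46.2% closed
# 6.25 s -> 48.1% closed
# 6.50 s -> 50.0% closed
# 7.00 s -> 53.8% closed
```

Under that assumption, a whole-second step moves the valve by nearly 8 percent of its travel, which is consistent with six seconds being too cold and seven too warm.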

When the ISS onboard software receives ground commands, it processes them regardless of the time they are sent. “If you send two commands from the ground that are 6.5 seconds apart, then they will execute 6.5 seconds apart,” explained Vareha. “You just have to do the timing yourself.”

The mechanics of this manual task were initially fuzzy until Emily Nelson, an ISS flight director who led the thermal team, had an idea. With the holidays approaching, Nelson had been participating in a number of church handbell choir practices, which involved the frequent use of her conductor’s smartphone metronome app to keep time. She quickly downloaded the app onto her phone and set it to beat every half-second. “OK, guys, here,” Nelson said, placing her phone on the console. “Now, at beat zero and beat thirteen you’re going to send the commands.”

Which brings us back to Mark Smith at the thermal systems console in mission control. While his timing was quite good, it wasn’t good enough. The loop was still running too cold. Instead of 6.5 seconds between commands, the team needed timing on the order of 6.25 seconds. “When we set the metronome to beat even faster, things just got ugly,” said Vareha. “Counting out thirteen half-seconds is a lot easier than counting out 25 quarter-seconds. We went back to the drawing board.”
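The beat counts in Vareha's comparison follow directly from dividing the target interval by the metronome's beat period. A small sketch, with hypothetical function names, makes the relationship explicit:

```python
def beats_for_interval(interval_s: float, beat_period_s: float) -> int:
    """Beats to count between the first and second command."""
    beats = interval_s / beat_period_s
    if abs(beats - round(beats)) > 1e-9:
        raise ValueError("interval is not a whole number of beats")
    return round(beats)


def beats_per_minute(beat_period_s: float) -> float:
    """Metronome tempo corresponding to a given beat period."""
    return 60.0 / beat_period_s


print(beats_for_interval(6.5, 0.5), beats_per_minute(0.5))     # 13 120.0
print(beats_for_interval(6.25, 0.25), beats_per_minute(0.25))  # 25 240.0
```

A 6.25-second interval is not a whole number of half-second beats, so the metronome had to run at twice the tempo, with nearly twice as many beats for the operator to count.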

Attempt #2: “The Quasnytron”

Command and data handling officer Todd Quasny got wind of the problem the thermal team was trying to solve. After the manual attempt of the so-called “metronome hack,” he wondered if he could use his knowledge of the ground-based command server to automate the solution.

“I spent the next twelve hours of my life digging through reference material and all of our technical specs,” said Quasny. He created a script that sent the shutoff valve “on” and “off” commands with a time delta specific to the millisecond. Dubbed the “Quasnytron” by the thermal team, the program did exactly what they wanted—almost.

The trouble with sending such finely tuned commands from the ground to the station was a phenomenon known as temporal latency. “There was a lot of slop in the timing,” explained Vareha. For the program to work, it didn’t matter when the commands were sent to the ISS, but it did matter how far apart the commands were executed. The ISS moves at 17,500 mph and commands are routed through different paths (i.e., satellites) to reach it. While the Quasnytron could send the commands with the right delta, they weren’t received that way: one command might get there a little too early and the other might get there a little too late. The “slop” accounted for the valve timing being off by 0.2 to 0.3 seconds. “It was like having a scalpel duct taped to a 30-foot PVC pipe,” explained Vareha. “The scalpel is precise, but it was tied to the command server, which is not as precise.”
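Why a precisely timed pair of commands can still arrive with the wrong spacing can be shown with a toy simulation. It assumes each command experiences an independent delay drawn uniformly from a fixed range. The delay bounds below are hypothetical, chosen only so that the worst-case error lands near the 0.2 to 0.3 seconds the team observed.

```python
import random


def executed_delta(sent_delta_s: float, min_latency_s: float,
                   max_latency_s: float, rng: random.Random) -> float:
    """Spacing between two commands on arrival, given independent path delays."""
    first_delay = rng.uniform(min_latency_s, max_latency_s)
    second_delay = rng.uniform(min_latency_s, max_latency_s)
    return sent_delta_s + (second_delay - first_delay)


rng = random.Random(2013)  # fixed seed so the run is repeatable
# Hypothetical latency range of 0.50 to 0.75 s per command.
samples = [executed_delta(6.25, 0.50, 0.75, rng) for _ in range(1000)]
print(f"min {min(samples):.3f} s, max {max(samples):.3f} s")
```

The sent spacing is exact, but the arrival spacing inherits the difference between the two delays. In this model, only a constant delay leaves the spacing untouched.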

The team realized that if commands from the ground weren’t going to work, then they’d just have to send them directly from the station’s onboard computers.

Attempt #3: The Software Patch

On Sunday morning at 8:30 a.m., Steve Joiner, an ISS software engineer, got a phone call. “Is there any way to do a software patch?” asked the voice on the other end. Technically, yes, but his team needed to look into it more.

Updating software on the ISS is usually not a rushed affair. It involves careful planning, testing, more testing, some more testing, and then reviews for weeks and months in advance. As most software engineers know, executing a program with a missing comma or parenthesis in a line of code can result in a bad day.

But this was an unusual circumstance. Short of a spacewalk and changes to the flight hardware, a software patch was their only shot at operating the shutoff valve the way they needed to. They got straight to work, following all of the standard procedures at a feverish pace, and delivered the patch within 48 hours.

Joiner and his team gave mission controllers the ability to command the shutoff valve down to an accurate and consistent resolution of 100 milliseconds. They could also move the valve in both directions. If they saw the temperatures dropping, they could close the valve by powering it on for 100 milliseconds, and heat the system back up. If it was running too warm, they could power the valve for 100 milliseconds in the opposite direction to cool the system down.
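The control approach the patch enabled can be sketched as a simple pulse-and-check rule. This is not the flight software. The temperature band, function names, and the idea of tracking position in milliseconds of travel are assumptions made for illustration.

```python
from typing import Optional

PULSE_MS = 100      # pulse length provided by the patch
TRAVEL_MS = 13_000  # approximate full travel, from the 13-second figure


def choose_pulse(temp: float, low: float, high: float) -> Optional[str]:
    """Pick a pulse direction from the loop temperature (hypothetical band)."""
    if temp < low:
        return "close"  # too cold: close a little to keep more heat in the loop
    if temp > high:
        return "open"   # too warm: open a little to send more heat to the radiators
    return None         # within the band: leave the valve alone


def apply_pulse(position_ms: int, direction: Optional[str]) -> int:
    """Track valve position in milliseconds of travel from fully open."""
    if direction == "close":
        position_ms += PULSE_MS
    elif direction == "open":
        position_ms -= PULSE_MS
    return max(0, min(TRAVEL_MS, position_ms))


position = 6_200
position = apply_pulse(position, choose_pulse(temp=1.0, low=2.0, high=4.0))
print(position)  # 6300
```

If the valve travels at a constant rate, each 100-millisecond pulse corresponds to less than 1 percent of full travel, far finer than the whole-second steps available through time-tagged commands.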

“Needless to say, it was a very busy couple of days, but by Tuesday at 6:00 p.m. we had the patch working on orbit, giving the thermal team the fine-tuned control they needed,” said Joiner. “We worked pretty much around the clock to produce, test, and deliver this software patch. It was very rewarding to see it work on orbit and provide the thermal team with the control they needed.”

The onboard software patch worked for several orbits that evening, just prior to the ISS program’s announcement that they would pursue a series of spacewalks to replace the pump module before the Christmas holiday.

The Aftermath and Why This Matters

While the metronome hack’s use was short-lived, it lives on in the lessons it taught everyone involved. Situations like these are not necessarily a “day in the life,” but they are what mission controllers train for: What do I have available to me to fix this problem that doesn’t seem obvious?

Having grown up in the ISS program during its assembly, Nelson recalled how things changed constantly. “Each mission would bring an entirely new piece of the space station, and in some cases it would so radically change the system that you basically had a brand new vehicle when the shuttle undocked,” explained Nelson. “Inevitably, on almost every one of those missions whatever new piece of hardware or truss or module you were installing and activating behaved in some way that was unpredictable, so you’d have to go and figure out what to do about that.”

Nelson described the recent failure as “far more interesting” than the previous pump failure in 2010 because this time the pump was still healthy. Only the valve was broken. “We’re still considering this a valid spare,” explained Nelson. “We now know that we can control the loop with a degraded pump module. Just because it failed doesn’t mean we can’t use it any more.”

“It was really cool to be a part of trying to basically hot-wire the External Thermal Control System on the ISS to work in ways that it wasn’t supposed to,” said Vareha.

Joiner echoed this sentiment, emphasizing the importance of designing key systems to be modified to work in unexpected ways. “As we prepare for deep-space exploration missions, we won’t have the luxury of sending hardware replacements for faulty valves. However, something that still does exist is the ability to change the software through radio waves. We know hardware will fail. It is important to design redundancy into all of our key hardware and be able to adjust in the event of a failure using software. The space station provides an excellent opportunity in many ways for us to prepare for these situations on deep-space missions,” Joiner said.

Most of the heroics for big contingency events like this happen on the spacewalk side of mission control, explained Vareha. This was a situation where a lot of the troubleshooting was intensive on the systems side. “You know, those of us who sit in mission control and press buttons,” joked Vareha. “We really pulled together and came up with a way to operate the system that was never as the founders intended.”

It puts the “human” in “human spaceflight,” he added. “People designed the space station, people develop methods to run it safely and the ways to fix it when it breaks,” Vareha said. “It’s a real testament to the fact that even with all of our technology and our engineering documentation, it all comes down to people coming up with ideas and working on a team together. It’s the coolest job in the world.”
