The popular press was recently abuzz with sad news from the planet Mars: Opportunity, the little rover that could, could do no more. It took an astonishing 15 years for it to give up the ghost, and it took a planet-wide dust storm that blotted out the sun and plunged the rover into apocalyptically dark and cold conditions to finally kill the machine. It lived nearly 60 times longer than its 90-sol design life, producing mountains of data that will take another 15 years or more to fully digest.

Entire careers were unexpectedly built around Opportunity – officially but bloodlessly dubbed “Mars Exploration Rover-B”, or MER-B – as it stubbornly extended its mission and overcame obstacles both figurative and literal. But “Oppy” is far from the only long-duration success that NASA can boast about. Now that Opportunity has sent its last data, it seems only fitting to celebrate the achievement with a look at exactly how machines and missions can survive and thrive so long in the harshest possible conditions.

Fail Early, Then Stop Failing

Failure is always an option, and recognizing that fact is one of the prices of doing business in space. The early days of space exploration were punctuated with multiple catastrophes, mostly within the first few minutes of launch. That just reflects the difficulty of the endeavor; taming tons of volatile propellants and getting everything in the right place at the right time is a challenging business. Mistakes were made, and many missions were lost.

But failures, especially high-profile and expensive ones, teach valuable lessons, and NASA is really good at figuring out what went wrong when things go awry. NASA has entire labs dedicated to failure analysis, covering everything from structural and materials failures to electrical issues and software bugs. They take failure analysis very seriously, to the point of writing their own software, the Root Cause Analysis Tool (RCAT), to track and evaluate undesired outcomes.

Learning from their mistakes has increased the success rate of missions steadily over the years. Losses of missions due to launch issues are few and far between now compared to the early days. NASA still suffers failures once payloads are in transit or on station, of course. For example, Galileo suffered a serious failure while deploying its high-gain antenna that almost ended its mission to study the Jupiter system. Failure analysis led NASA engineers to conclude that leaving an umbrella-style antenna stowed for four and a half years and not relubricating the system prior to launch is not a good idea.

Failure analysis doesn’t just look at problems with hardware; NASA is very serious about finding issues with their processes too. When communication with the Mars Climate Orbiter was lost as the spacecraft entered orbit, it joined a long list of missions that the Red Planet has rebuffed. NASA discovered that the root cause was ground-based software that reported thruster impulse in US customary units (pound-force seconds) where the spacecraft’s navigation software expected SI units (newton-seconds), an error that quietly skewed the trajectory over months of small correction burns. It also found that warnings from two separate navigators that the spacecraft was not where it should be were ignored because they had not been reported according to policy.
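The failure mode is easy to reproduce in miniature. Here is a hypothetical Python sketch — the function names and figures are mine, not NASA's — of how an unlabeled unit slips silently across a software interface:

```python
# Hypothetical illustration of a Mars Climate Orbiter-style unit mismatch:
# one side reports thruster impulse in pound-force seconds (lbf*s), the
# other side assumes it is receiving newton-seconds (N*s).

LBF_TO_N = 4.44822  # newtons per pound-force

def impulse_lbf_s(thrust_lbf: float, burn_seconds: float) -> float:
    """Ground software: impulse in US customary units (lbf*s)."""
    return thrust_lbf * burn_seconds

def impulse_n_s(thrust_lbf: float, burn_seconds: float) -> float:
    """What the navigation side actually needed: the same impulse in SI."""
    return thrust_lbf * LBF_TO_N * burn_seconds

raw = impulse_lbf_s(1.0, 10.0)  # 10.0 -- silently in lbf*s
si = impulse_n_s(1.0, 10.0)     # ~44.48 N*s

# Feeding `raw` where `si` is expected understates every burn by ~4.45x,
# and the error compounds over many small trajectory corrections.
print(si / raw)  # ~4.45
```

Both numbers are just floats, so nothing in the code flags the mismatch — which is exactly why the fix was procedural (interface specifications and reporting policy) as much as technical.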

Engineered to Last

Space exploration is an expensive business, mainly because of the cost of getting useful amounts of hardware out of the deep gravity well we all call home. But the spacecraft themselves are pretty pricey, thanks in part to the engineering that goes into them. When something is intended to operate for decades while traversing millions of miles of the most hostile conditions imaginable, close enough won’t cut it.

To make sure that space probes, planetary explorers, and even the ground-based systems that support them do not fail, or at least to maximize the time until they do, NASA has developed a massive body of very specific and very stringent workmanship standards. As Gerrit Coetzee pointed out a few years back, the workmanship standards documents are themselves works of great beauty. They cover every conceivable kind of electromechanical assembly, showing the “NASA way” of doing it correctly. How to solder correctly, when to crimp instead, how to prevent PCB damage, how to prevent electrostatic discharge damage, and even how to properly tension wire ties are all covered. For my money, though, the pièce de résistance is the section on lacing wiring harnesses. Pure engineering beauty.

Aesthetics aside, NASA’s workmanship standards and the general engineering principles behind them are a huge factor in spacecraft lasting long past their “best by” dates. The amazing success of Opportunity was only the latest in a long line of long-haul engineering wins for NASA, thanks in large part to principles like building in redundancy at every level of design. That saved the rover’s bacon a number of times, including in 2014, when “amnesia events” in the vehicle’s non-volatile memory led to several system resets. Controllers were able to reconfigure the rover to use only its RAM and continue the mission for another full year.
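That RAM-only reconfiguration is an instance of a general degrade-gracefully pattern: prefer the primary resource, but keep running on a fallback rather than failing outright. Here is a loose Python sketch of the idea — the class, names, and sample data are invented for illustration, not Opportunity's actual flight software:

```python
# Hypothetical degrade-gracefully pattern: prefer non-volatile storage,
# but fall back to a RAM buffer when the flash bank misbehaves, so the
# mission keeps producing data instead of halting.

class FlashError(Exception):
    pass

class TelemetryStore:
    def __init__(self):
        self.flash_healthy = True
        self.ram_buffer = []  # volatile fallback, lost on power-down

    def write_flash(self, record):
        if not self.flash_healthy:
            raise FlashError("amnesia event")
        # (real non-volatile write elided)

    def record(self, sample):
        try:
            self.write_flash(sample)
        except FlashError:
            # Degraded mode: hold science in RAM until the next downlink.
            self.ram_buffer.append(sample)

store = TelemetryStore()
store.flash_healthy = False  # simulate a 2014-style amnesia event
store.record({"sol": 3900, "pancam": "frame_001"})
print(len(store.ram_buffer))  # 1 -- data survived the flash failure
```

The cost of the fallback is real — anything in RAM is lost if the vehicle powers down before downlink — which is why controllers had to reshape daily operations around it, not just flip a switch.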

Sticking with seemingly outdated technology is another way NASA gets so much life out of its machines. We’ve covered a few examples of this before, like the use of orbital photo labs for lunar reconnaissance, or the 8-track tape decks used on Voyager and Galileo. Both were tried and true and offered reliability far beyond what could have been achieved with other means.

The computers that NASA chooses to fly into space are also decidedly behind the times compared to what is commercially available when the vehicle is built. Galileo, for instance, flew to Jupiter with six RCA COSMAC 1802 8-bit microprocessors, built on sapphire substrates for radiation resistance. Even New Horizons, launched in 2006 and fresh from its visit to Ultima Thule, was equipped with a radiation-hardened version of the MIPS R3000 CPU, a RISC chip that first hit the market in 1988. Old, slow, and working beats fancy, fast, and buggy any day of the week.

Moving the Goalposts

There’s another aspect to the success of long-term NASA missions, and this one is more social engineering than physical engineering. NASA designs its missions very carefully, in terms of what science gets done, when it gets done, and how resources on a spacecraft are allocated. For Opportunity, NASA got a lot of mileage out of the oft-repeated “it was only supposed to last for 90 days” figure. I won’t quibble with that, but it’s a little unfair to NASA. The vehicle was obviously engineered to last much, much longer than 90 sols (Martian days), and if that dust storm hadn’t been as deep and as long as it was, the rover would probably still be running today. Rather, NASA planned for all the science to be completed within 90 sols, while hoping the rover would last longer. Every sol past the scheduled end of the science program was gravy.
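The sol-versus-day arithmetic is worth making concrete. A sol — one Martian day — runs 24 hours, 39 minutes, 35 seconds, so the numbers work out roughly like this (a back-of-the-envelope sketch, not mission data):

```python
# Back-of-the-envelope sol arithmetic. One sol = 24 h 39 m 35 s.
SOL_SECONDS = 24 * 3600 + 39 * 60 + 35  # 88,775 s per Martian day
EARTH_DAY_SECONDS = 86_400

def sols_to_earth_days(sols: float) -> float:
    """Convert a duration in sols to Earth days."""
    return sols * SOL_SECONDS / EARTH_DAY_SECONDS

# The 90-sol design life in Earth time:
design_life_days = sols_to_earth_days(90)  # ~92.5 Earth days

# Roughly 15 Earth years of operation, expressed in sols:
mission_sols = 15 * 365.25 * EARTH_DAY_SECONDS / SOL_SECONDS

print(round(design_life_days, 1))  # ~92.5
print(round(mission_sols / 90))    # ~59 times the design life
```

Which is where the "nearly 60 times its design life" figure comes from: 15 Earth years is a bit over 5,300 sols, against a planned 90.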

This mission extension is something that NASA very much plans for – sending millions of taxpayer dollars out into space without a plan to maximize the return on investment doesn’t work. The Voyager program is a perfect example of this. Technically, the mission of Voyager 1 was over when it flew by Saturn, and Voyager 2’s primary science was completed after its encounter with Neptune. But with the spacecraft still in good shape and with minimal budget needed to continue communicating with them, NASA began the Voyager Interstellar Mission (VIM) that continues to this day, gathering data from interstellar space.

For my part, as impressive as Opportunity’s accomplishments were, and as sad as I felt when that dust storm set in and we stopped hearing from the rover, the real benchmark of space engineering is the Voyager twins. Their RTG power systems should provide enough juice to keep the VIM going for another five years or so. That will be a truly sad moment for me, when the mission I’ve followed since its launch in 1977 is finally over. But I’ll take solace in the idea that perhaps someday an alien civilization will find these exquisite machines and see just what kind of engineering their makers were capable of.