A year ago, I joined an IoT startup with a mountain of legacy Python code running on a Raspberry Pi. Over the course of several years, this code had grown organically, as most codebases do. Its original purpose was to monitor an MQTT command channel, decode and execute commands on a Z-Wave network via a serial port, and report back success or failure via another command channel. In addition, the code was intended to report on the overall health of the system and the state of devices connected to the Z-Wave network. Somewhere along the way, the software metastasized to control status LEDs, drive a cellular modem over a separate serial port, and communicate with the onboard release management software.

One of the senior software engineers who interviewed me desperately wanted to rewrite the whole thing, and I was leaving a C++ shop that was comfortably juddering along on the momentum of its prior successes. Spoiler: he wanted to switch the existing Python code over to Rust, and the prospect was so exciting that I jumped ship from my old job and moved cities.

But before we get around to talking about why we picked Rust or why that’s exciting, why did we decide to rewrite at all? The conventional wisdom says never to rewrite software when you have something that already works, even partially. But if we followed that logic, we would still be running the Apollo Guidance Computer software on future space missions (hey, the architecture is ones’ complement, but the last bug report was in 1969, so it must be solid). Why rewrite anything, ever?

In short, it boils down to business needs.

The existing architecture could not scale to other technologies or change direction without massive amounts of effort. The business had just been forced to change to MQTT from another IaaS provider because of licensing cost concerns, and the cutover took nearly a year. With new devices released every year (e.g. BLE, Wi-Fi, Z-Wave, Zigbee, arbitrary REST APIs), the business wanted to be able to change IoT stacks quickly to adapt to new technology.

There were technical debt items nobody understood or was prepared to resolve. (Did I mention that none of the original programmers were still around to fix bugs or answer questions?) Fixing obvious issues in one place often broke the program in completely unrelated parts of the code.

The program had unit tests in places, but there were no coding standards. Someone’s “very clever” generator-expression state machine drove the serial framing protocol, and when it broke, it took weeks to figure out why.

There was dead code everywhere, but we couldn’t prove it was really dead code.

Holy cow, the bugs. Did I mention the bugs?

There were opportunities to replace the error-prone first-party Python Z-Wave handler code with a vendor-supplied reference implementation written in C. It would have been more effort to hack the existing Python implementation around the C library than to just rewrite the thing.

We wanted to run more customers on cheaper hardware. Improving that ratio directly drives higher profit margins for the business.

From those business needs, we can start to pick apart the actual requirements that the chosen language had to meet for our particular project:

It needs to talk C and run against C libraries.

There are timing requirements (because of the serial communications).

We need to be able to run it on a potato (because of cost).

It has to be able to run both the serial communications and a bunch of command/telemetry at the same time, without bugs.

String manipulation should be easy, because the commands and responses are all JSON.

The software must work correctly and deterministically, even though we are not all genius programmers.

It needs to be secure.

As it happened, Rust fit the bill for all of these needs.
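To make the "serial communications and command/telemetry at the same time" requirement a little more concrete, here is a minimal sketch of how that pattern can look in Rust using only the standard library. Everything in it is hypothetical and invented for illustration: the `Command` and `Report` types, the `execute` and `process` functions, and the rule that node 0 always fails are not taken from our codebase, and a plain function stands in for the real serial port. The point is only that the worker thread owns its end of each channel, so the compiler rejects accidental shared mutable state between the two sides.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical command set; the real commands are not shown in this post.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    SwitchOn(u8),
    SwitchOff(u8),
}

/// Hypothetical success/failure report sent back to the command side.
#[derive(Debug, Clone, PartialEq)]
enum Report {
    Success(u8),
    Failure(u8),
}

/// Stand-in for talking to the serial port. As an assumption for this
/// sketch, node 0 is treated as invalid and always fails.
fn execute(command: Command) -> Report {
    match command {
        Command::SwitchOn(0) | Command::SwitchOff(0) => Report::Failure(0),
        Command::SwitchOn(node) | Command::SwitchOff(node) => Report::Success(node),
    }
}

/// Runs the "serial" worker on its own thread and feeds it commands
/// over a channel, collecting one report per command.
fn process(commands: Vec<Command>) -> Vec<Report> {
    let (command_tx, command_rx) = mpsc::channel::<Command>();
    let (report_tx, report_rx) = mpsc::channel::<Report>();

    // The worker takes ownership of its channel ends via `move`.
    let worker = thread::spawn(move || {
        for command in command_rx {
            if report_tx.send(execute(command)).is_err() {
                break; // The receiving side went away; stop working.
            }
        }
    });

    for command in commands {
        command_tx
            .send(command)
            .expect("serial worker stopped unexpectedly");
    }
    // Dropping the sender closes the channel and ends the worker's loop.
    drop(command_tx);

    // Iteration ends once the worker exits and drops `report_tx`.
    let reports: Vec<Report> = report_rx.iter().collect();
    worker.join().expect("serial worker panicked");
    reports
}

fn main() {
    let reports = process(vec![Command::SwitchOn(5), Command::SwitchOff(0)]);
    println!("{:?}", reports);
}
```

A real system would keep the worker alive and stream reports back as they arrive instead of collecting them at the end, but the ownership story stays the same.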