Introduction

The Rust programming language isn’t exactly new by any means, but it’s finally been around long enough that I figured I should stop being lazy and actually try to do something concrete with it. These days, I do a lot of work with Python, but Python programs are not particularly fast, and it’s often very difficult to identify problems in them via static analysis, which means that unless you have a very thorough test suite, your users find your silly typos for you.

Perhaps the most exciting feature of Rust is the distinct lack of a general null value (or nil or None, depending on your language). There are a lot of excuses for using null in other languages: it’s the way it’s always been done, having a value wrapped in an option type looks “ugly”, “it’s only a problem if you’re a bad programmer”, etc. A lot of languages have an option type now, but it isn’t heavily used by their standard libraries, so you’re still prone to surprise nulls coming from code you didn’t write.
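To make that concrete, here is a minimal sketch of what "no null" looks like in practice. The functions `first_long_word` and `describe` are made up for illustration; the point is only that a possibly-missing value has the type `Option`, and the compiler won't let you use it without handling the `None` case.

```rust
/// Hypothetical helper: returns the first word longer than `min_len`, if any.
/// The "maybe nothing" case is part of the return type, not a hidden null.
fn first_long_word(text: &str, min_len: usize) -> Option<&str> {
    text.split_whitespace().find(|w| w.len() > min_len)
}

/// The caller has to deal with both cases before it can use the value.
fn describe(text: &str) -> String {
    match first_long_word(text, 5) {
        Some(word) => format!("found {word}"),
        None => String::from("nothing found"),
    }
}

fn main() {
    println!("{}", describe("a quick example"));
    println!("{}", describe("no big words"));
}
```

Leaving out the `None` arm in `describe` is a compile error rather than a surprise at run time.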

Additionally, I wanted a language without a garbage collector. Garbage collectors make it a lot more complicated (and inefficient) to share code between language platforms, since you have to start up two large runtimes, each with its own GC that will run and halt your program every now and then. To this end, Rust’s borrow checker was promising. I’ve written a bit of C++ code, and you end up using std::shared_ptr (though at the time we used boost) a lot, but this doesn’t work if you’re using another library and all they send you is a Foo*. In a complicated code base, you’ll end up sooner or later with a circular reference: A references B, and B references A. Once you find the problem, you can solve it with std::weak_ptr (A references B, and B weakly references A). However, most of the time this is done, there is an implicit assumption that B should not outlive A. If you mess that up, you get null, and a segfault. References in Rust give you a way to explicitly design this so that the compiler ensures that B will not outlive A.
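As a rough sketch of that last point (the `Parent` and `Child` types here are invented for the example, standing in for A and B), a struct that holds a reference carries a lifetime parameter, and the borrow checker rejects any program in which the borrowed value goes away first.

```rust
/// Stands in for "A": the owner that must live longest.
struct Parent {
    name: String,
}

/// Stands in for "B": the lifetime 'a ties each Child to the Parent it borrows.
struct Child<'a> {
    parent: &'a Parent,
    label: String,
}

impl<'a> Child<'a> {
    fn describe(&self) -> String {
        format!("{} (child of {})", self.label, self.parent.name)
    }
}

fn main() {
    let a = Parent { name: String::from("A") };
    let b = Child { parent: &a, label: String::from("B") };
    println!("{}", b.describe());

    // Uncommenting the next two lines is a compile error: `a` cannot be
    // moved (and dropped) while `b` still holds a reference to it.
    // drop(a);
    // println!("{}", b.describe());
}
```

There is no weak pointer to check and no null to dereference; the "B must not outlive A" rule is stated in the types and enforced at compile time.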

I’m going to claim that Rust is a similar language to Python. It’s very much not, but they do feel somewhat spiritually connected, perhaps something like Timon and Pumbaa from The Lion King. It’s not terribly far-fetched to imagine that much of PEP 20 was written about Rust. Rust is very explicit. It’s terse, but not so much so that readability suffers. It strongly supports the idea of errors never being silent, and avoids ambiguity. It has a fairly similar mechanism for importing modules and aliasing names. As it turns out, I’m not the first person to notice this either [AR2017].
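A small sketch of two of those points, with a made-up `parse_width` function: `use ... as ...` reads a lot like Python's `import ... as ...`, and a fallible operation returns a `Result` that the caller has to look at, so a failure can't quietly pass by.

```rust
use std::collections::HashMap as Map; // aliasing, much like `import x as y`
use std::num::ParseIntError;

/// Hypothetical helper: parsing can fail, and the signature says so.
fn parse_width(raw: &str) -> Result<usize, ParseIntError> {
    raw.trim().parse::<usize>()
}

fn main() {
    let mut widths: Map<&str, usize> = Map::new();
    for raw in ["80", "wide"] {
        // Both outcomes are handled explicitly; nothing is silently dropped.
        match parse_width(raw) {
            Ok(width) => {
                widths.insert(raw, width);
            }
            Err(err) => eprintln!("could not parse {raw:?}: {err}"),
        }
    }
    println!("{widths:?}");
}
```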

A while ago, I wrote a Python library called August [AP2018] to convert HTML into text, and recently I ported it to Rust [AR2018]. It was essentially an HTML renderer, though much closer in feature parity to Lynx than Firefox. It was useful for converting HTML emails into plain text. It also made a good candidate for code to rewrite in Rust for several reasons:

It had few dependencies (just Beautiful Soup and standard library things)

It had a simple public interface (just one function that took a string and a number and returned a string)

It had a reasonable set of integration tests, so you could say with some degree of confidence that once the tests pass, the rewrite was complete.

It was a pretty good example of non-trivial Python code in the wild. It had a small class hierarchy and a lot of Python-isms like dictionaries of class & function objects, generators, and other iterators.

Additionally, I didn’t go to any particular effort to optimize the Rust code. I hadn’t tried that hard to optimize the original Python code either, and I wanted to get a clear sense of what a rewrite of this form might look like in the real world.