There are a few things to note here:

The code bears clear signs that it’s ported from C++. There’s limited use of iterators (other than over ranges) and no use of enums or pattern matching. Look at lines 144 and 145. This is rather inefficient since Rust will bounds check every time we index into the array, and we do that a lot during execution. C#’s JIT compiler might optimize this away once it realizes that this is a hot path and that the indexes are never out of bounds.

We could go “unsafe” and use get_unchecked to save a few seconds (I’ve tried it). It would still be sound in this example, since we know we never access out of bounds, but I wanted to keep the code safe. There is a much smarter way to do this with a small change, as we’ll see below.
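To make the three options concrete, here is a minimal sketch of indexed access, iterator access and the unchecked variant. The function names are made up for illustration and none of this is taken from the ray tracer itself.

```rust
/// Sums a slice by indexing. Each `data[i]` carries a bounds check
/// unless the optimizer can prove that `i < data.len()`.
fn sum_indexed(data: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..data.len() {
        total += data[i];
    }
    total
}

/// Same result via an iterator: there is no index, so there is
/// nothing to bounds check.
fn sum_iter(data: &[i32]) -> i32 {
    data.iter().sum()
}

/// The `unsafe` variant. It skips the check entirely and is only
/// sound because the loop range keeps `i` below `data.len()`.
fn sum_unchecked(data: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..data.len() {
        // SAFETY: `i` comes from `0..data.len()`, so it is in bounds.
        total += unsafe { *data.get_unchecked(i) };
    }
    total
}
```

In a loop this simple the compiler can often remove the check on its own. The cost tends to show up when the index is computed from other data, which is the situation in our hot path.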

Optimizing part 1. Use iterators.

Let’s address the lowest hanging fruit first: the slow indexing into the global array of characters.

If you look at the original code you’ll see that we always use blocks of 4 numbers that represent our letters in the final image. Here they’re encoded as characters first, then collected as their u8 representations into a Vec<i32> (we collect directly to i32 to avoid casting them later on). By grouping these in tuples of i32 in our lazy_static block we allow our code to use one of Rust’s native constructs: iterators.
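As a rough sketch of that grouping step, assuming the data arrives as a byte string: `ENCODED` below is a made-up stand-in for the real letter data, the function names are hypothetical, and the lazy_static wrapper is left out to keep the example dependency-free.

```rust
/// Hypothetical stand-in for the encoded letter data.
const ENCODED: &[u8] = b"5O5_5W9W";

/// Groups the bytes into tuples of four `i32`s. The conversion to
/// `i32` happens once here, so the hot loop never has to cast.
fn letter_blocks(encoded: &[u8]) -> Vec<(i32, i32, i32, i32)> {
    encoded
        .chunks_exact(4) // any trailing bytes that don't fill a block are skipped
        .map(|c| {
            (
                i32::from(c[0]),
                i32::from(c[1]),
                i32::from(c[2]),
                i32::from(c[3]),
            )
        })
        .collect()
}

/// Example consumer: walks whole blocks with an iterator instead of
/// indexing `j`, `j + 1`, `j + 2`, `j + 3` into a flat array.
fn sum_of_first_fields(blocks: &[(i32, i32, i32, i32)]) -> i32 {
    blocks.iter().map(|&(first, _, _, _)| first).sum()
}
```

The indexing inside `letter_blocks` runs once at setup, so it doesn’t matter for performance. What matters is that the per-pixel code can destructure a tuple rather than index four times.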

This small change almost halved the execution time giving us the following result:

Time and memory:

time: 26.46s, mem: 5.12Mb

Good start for a simple change.

Optimizing part 2. Buffer the results if you can.

We can do better, with the tradeoff of using a little bit more memory. If you look at the outer loop and visualize what it does, you’ll notice that we go from the lower right corner up to the upper left corner.

What if we calculate all the pixel positions first and then iterate through them?

Our code in main will change like this:
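In rough outline, and with placeholder names rather than the tracer’s real ones (`pixel_positions`, `shade` and `render` are all made up for this sketch), the two-phase structure looks something like this:

```rust
/// Phase one: build every (x, y) position up front, walking from the
/// lower right corner to the upper left.
fn pixel_positions(width: u32, height: u32) -> Vec<(u32, u32)> {
    let mut positions = Vec::with_capacity((width * height) as usize);
    for y in (0..height).rev() {
        for x in (0..width).rev() {
            positions.push((x, y));
        }
    }
    positions
}

/// Placeholder for the per-pixel work. A real tracer would sample
/// rays here and return the resulting color.
fn shade(x: u32, y: u32) -> [u8; 3] {
    [(x % 256) as u8, (y % 256) as u8, 0]
}

/// Phase two: a single iterator chain over the buffered positions
/// replaces the nested loops.
fn render(width: u32, height: u32) -> Vec<u8> {
    pixel_positions(width, height)
        .iter()
        .flat_map(|&(x, y)| shade(x, y))
        .collect()
}
```

The buffer of positions is where the extra memory goes. In exchange, all the work now hangs off one iterator over independent items.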

This change had a small but noticeable impact on our timings. The results are below, but more importantly this sets up all we need to use a secret weapon in our arsenal, which I’ll explain next.

The results from this change give us:

time: 23.52s, mem: 11.96Mb

Our time went down a little bit, but our memory usage more than doubled (roughly 2.3 times the previous figure).

Optimizing part 3.

The secret weapon you ask? Have you guessed it? It’s called Rayon:

Rayon is an amazing library to have in your Rust toolbelt. Why? Well, you know how people say parallelizing code can be difficult? That’s not true if you can use Rayon.

Rayon is a parallelization library for Rust, and it gives you superpowers if you already have a piece of code that uses an iterator to produce a result. This was our secret master plan when we changed our outer loop: it lets us divide up the work that needs to be done.

You might think this is a big change, but all we actually need is this small change in our main function:

Have you spotted it? I left a hint in the code for you. The changes we made were that we pulled rayon::prelude into scope and replaced iter() with par_iter() and just like that our code now uses all the cores on our computer to calculate the result. So how did this impact our results?
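To show what “dividing up the work” means underneath, here is a dependency-free sketch that uses std::thread::scope in place of Rayon. It is not Rayon’s API or how Rayon is implemented (Rayon schedules work with work stealing rather than fixed chunks), and `shade` and `render_parallel` are hypothetical names. It only illustrates why an iterator over independent items is so easy to spread across cores.

```rust
use std::thread;

/// Placeholder for the expensive per-pixel computation.
fn shade(x: u32, y: u32) -> u32 {
    x * 31 + y * 17
}

/// Splits `positions` into roughly one chunk per worker, processes the
/// chunks on separate threads and returns the results in input order.
fn render_parallel(positions: &[(u32, u32)], workers: usize) -> Vec<u32> {
    let workers = workers.max(1);
    // Ceiling division so every position lands in some chunk. Never
    // zero, because `chunks(0)` would panic.
    let chunk_len = ((positions.len() + workers - 1) / workers).max(1);

    thread::scope(|scope| {
        // Spawn all workers first and join afterwards, so the chunks
        // actually run concurrently.
        let handles: Vec<_> = positions
            .chunks(chunk_len)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|&(x, y)| shade(x, y))
                        .collect::<Vec<u32>>()
                })
            })
            .collect();

        // Joining in spawn order keeps the output in the original order.
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}
```

With Rayon, all of this bookkeeping collapses into swapping iter() for par_iter() on the existing chain, which is why the change in main is so small.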

The result from this last change gives us:

time: 5.50s, mem: 111.50Mb

What happened there?

Well, we’re firing on all cylinders here. All our CPU cores are used to calculate the result, but there is a tradeoff in increased memory usage, and a rather huge one too.

I stopped here as I promised to keep this pretty simple, but if you have any obvious suggestions for more simple changes that make our code more Rusty and performant, please submit a PR to the repo, and I’ll try to update the article if I have time.

In this repository you’ll find the optimized version of the code — our last step in this post.

Conclusion.

We haven’t really touched the algorithm or made any major changes. Our line count stayed pretty much the same: we started at 329 LOC and ended up at 335.

Porting C# or C++ code directly to Rust, without thinking about how you would have written it if you had started in Rust, will most likely not expose you to a lot of what makes Rust what it is. The functional aspects of Rust might seem foreign at first, but used wisely they enable some optimizations that would be much harder to do without them (like our use of Rayon in this example).

Now, the code above is not an example of good Rust code by any means. It’s still the original C++ code sprinkled with some Rust constructs, but the goal here was to find a fun way to point out some interesting things to look out for, and the impact they can have, when coming to Rust from a language like C#, Java or C++.

We’re finished for now. I hope you had an interesting read, and that if you’re new to Rust and porting code over from other languages, you’ve picked up some pointers on where to take advantage of Rust to speed up your own code even more.

Bonus: Fixing the huge increase in memory usage.

This will expose you to iterators even more, but I’ll try to explain in the comments what’s happening so bear with me. We’re still in the same part of our main function as we left off:

Thanks to jabagawee for this PR

This is the first time we actually change the method we use to calculate our pixel positions, and there are other ways to accomplish nearly the same thing without these changes. If you’re interested you can have a look at this gist. However, this kept the readability better and the changes were pretty minor. Our change has a significant effect on our stats though:

Our last and final results are now this:

time: 5.14s, mem: 7.27Mb

So our memory usage is back to normal with the added bonus of even a slightly faster execution.
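One general way to avoid holding every pixel position in memory is to generate the positions lazily. The sketch below is not the code from the PR, and `lazy_positions` and `checksum` are names invented for this example. It only shows the iterator technique in isolation.

```rust
/// Yields (x, y) positions on demand, lower right to upper left,
/// without ever allocating a Vec of positions.
fn lazy_positions(width: u32, height: u32) -> impl Iterator<Item = (u32, u32)> {
    (0..height)
        .rev()
        // `move` copies `width` and `y` into the closures so the
        // returned iterator doesn't borrow from this function's stack.
        .flat_map(move |y| (0..width).rev().map(move |x| (x, y)))
}

/// Example consumer: each position is produced, used and dropped
/// before the next one is created.
fn checksum(width: u32, height: u32) -> u64 {
    lazy_positions(width, height)
        .map(|(x, y)| u64::from(x) + u64::from(y))
        .sum()
}
```

Nothing is computed until the consumer asks for the next item, so the memory needed for positions stays constant regardless of image size.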

Updates:

2019-03-07: Changed header picture and merged a PR that removed the backwards “D”s inside P and R in our rendering result. No effect on any measurements but the code now gives the exact same result as the C# and C++ versions.

2019-03-08: Added bonus section thanks to a PR that showed how our memory usage could get back to normal with a minor change.