Introduction

If you have ever written a program of any scale, you have probably run into bugs: tiny mistakes made while writing that derail the execution of your program. The more complex your program gets, the higher the chance of bugs slipping in!

To fix and prevent bugs, programmers have a number of methods at their disposal. One of them is static typing, which checks part of a program's correctness before it is run. This technique is part of the design of the programming language and prevents simple mistakes such as using a string as an integer, or comparing unrelated objects like a Car and a Book.

My personal stance is that a programming language and its implementation should strive to catch as many of the programmer's mistakes as possible, thus allowing them to build better and more secure software. Although static typing makes a language more complex and harder to learn, it provides a safety net for the programmer which I believe is well worth the effort.

The Rust Language implements such a static typing system and offers new methods of catching errors, which in other languages result in a crash at runtime. In this article, I’d like to explain some of these methods.

Null Result

It is not uncommon for a subroutine to not be able to return a result because it has hit some kind of edge case. Think for example of looking up the index of an element in an array that does not contain said item, popping an element from an empty stack or accessing an element by its index on an array that is too small. How these are handled differs wildly across languages. The most common way of dealing with this seems to be either throwing an exception or returning null (or -1).

If you forget to check for -1 or null, or forget to catch the exception, your program will blow up. Rust, however, has a way of making sure at compile time that these cases are handled.

The Option Type

Rust’s standard library uses the Option type for functions that must handle such edge cases. It’s a so-called enum type (like a tagged union, if you are familiar with C) that has two possible variants. This is its declaration:

    enum Option<T> {
        None,
        Some(T),
    }

For example the Vec::pop method, that pops the last element from a vector like a stack, will return Some with the element when there was at least one element in the vector, or None if the vector was empty.

Now, getting the value out of an Option requires a pattern match, such as a match or if let construct. We can’t just assume a value has been returned and use it like in languages that return a pointer. And that’s good! The programmer is forced to think about how to handle the case where None is returned. Where a similar piece of code in another language causes a runtime error, Rust will prevent the program from being compiled:

    let mut numbers = vec![21];
    let maybe_number = numbers.pop(); // Option<i32>
    println!("{}", maybe_number * 2); // Compilation error!

Which fails with error[E0369]: binary operation * cannot be applied to type std::option::Option<{integer}> .

A pattern match, here in the form of if let, must be used to tell whether there is a result:

    let mut numbers = vec![21];
    let maybe_number = numbers.pop();
    if let Some(my_number) = maybe_number {
        println!("{}", my_number * 2); // Works now!
    }
    // We can also add an else block to handle the None case
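When both cases matter, a full match spells them out side by side. The following is a minimal sketch; double_last is a hypothetical helper I made up for illustration, not something from the standard library.

```rust
/// Pops the last number and doubles it, or returns None for an empty vector.
/// `double_last` is a hypothetical helper used only for illustration.
fn double_last(numbers: &mut Vec<i32>) -> Option<i32> {
    match numbers.pop() {
        // A value was present: bind it to `n` and use it.
        Some(n) => Some(n * 2),
        // The vector was empty: leaving this arm out is a compile error.
        None => None,
    }
}
```

Removing the None arm makes the compiler reject the match as non-exhaustive, which is exactly the safety net described above.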

The Result Type

Similarly to the Option type, there is also the Result type. Result is like Option, but instead of Some and None, it can be either Ok, containing the result of the function, or Err, which contains an error if something went wrong. This way of handling errors can best be compared to that of the Go language, where errors are also returned as values. The critical difference is that in Go, unlike in Rust, both the result and a (nullable) error are returned. This means that it is possible to forget to check whether an error has been returned and use the result anyway, resulting in a runtime error.

Rust does not have this trap because, like Option, the Result enum must first be checked for its content. It is therefore not possible to use the result as if everything went fine while an error has occurred.
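To make this concrete, here is a small sketch built on str::parse, which returns a Result. The helpers parse_and_double and describe are hypothetical names chosen for this example.

```rust
use std::num::ParseIntError;

/// Parses a decimal string and doubles it.
/// `parse_and_double` is a hypothetical helper used only for illustration.
fn parse_and_double(input: &str) -> Result<i32, ParseIntError> {
    match input.trim().parse::<i32>() {
        Ok(n) => Ok(n * 2),
        Err(e) => Err(e),
    }
}

/// Turns both outcomes into a message; neither variant can be skipped.
fn describe(input: &str) -> String {
    match parse_and_double(input) {
        Ok(n) => format!("doubled: {}", n),
        Err(e) => format!("could not parse {:?}: {}", input, e),
    }
}
```

The number inside Ok is only reachable through the match, so there is no way to use it by accident when parsing failed.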

Mandatory Initialization

Most other languages allow programmers to separate the declaration and initialization of variables. In practice, this means programmers sometimes forget to initialize such variables, e.g. in one branch of a conditional. While some languages have a compiler that will abort compilation (Java) or issue a warning (C, C++), these do not prevent the programmer from silencing the compiler by initializing variables to null or some zero value, causing a crash at runtime or, worse, letting the program do the wrong thing!

On uninitialized variable bindings, Rust will refuse to compile and thus prevent runtime errors:

    let a: &str;
    println!("{}", a.len()); // Compilation error!
    a = "Hello";

Which fails with: error[E0381]: use of possibly uninitialized variable: *a .

Of course, the null initialization trick is still possible, but it now requires the use of the Option type which, as explained earlier, needs explicit handling for the None case.
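As a rough sketch of what that looks like, the hypothetical function below starts a binding out as None and only fills it in on some paths. The name greeting_len and its parameter are assumptions made for this example.

```rust
/// Picks a greeting only when `formal` is known and returns its length.
/// `greeting_len` is a hypothetical helper used only for illustration.
fn greeting_len(formal: Option<bool>) -> Option<usize> {
    // The binding starts out as an explicit "no value" instead of null.
    let mut greeting: Option<&str> = None;
    if let Some(is_formal) = formal {
        greeting = Some(if is_formal { "Good day" } else { "Hello" });
    }
    // The None case has to be spelled out before len() can be called.
    match greeting {
        Some(text) => Some(text.len()),
        None => None,
    }
}
```

The "not set" state is still there, but it is visible in the type and cannot be used as if it were a string.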

If you can, initialize your bindings as soon as you declare them:

    let a = "hello";
    println!("{}", a.len());

Traits

Traits (similar to interfaces in other languages) can be described as abstract definitions for a type on which certain operations may be performed. For example, the Rust standard library defines an fmt::Display trait for types that can represent themselves as a string. Traits in a statically typed language like Rust can be used as constraints for generic functions and types.

Consider the following function that writes a slice of integers to a file using their string representation:

    fn write_list(out: &mut fs::File, numbers: &[i32]) -> io::Result<()> {
        for num in numbers {
            writeln!(out, "{}", num)?;
        }
        Ok(())
    }

Simple, right? But what if we need to do the same with slices of unsigned integers? Strings? Floats? Custom types? We’d need to create a separate function for each of those types!

But Rust can do that for us, using a type parameter:

    fn write_list<W, T>(mut out: W, things: &[T]) -> io::Result<()>
    where
        W: io::Write,
        T: fmt::Display,
    {
        for thing in things {
            writeln!(out, "{}", thing)?;
        }
        Ok(())
    }

Now, this function accepts a slice of any type that implements the Display trait.

I’ve also replaced the &mut fs::File with a type parameter, W, that must implement the io::Write trait. This makes it easier to write a unit test because instead of using the fs API and temp files, we can just use a vector of bytes, which implements io::Write. Note that the W parameter is owned rather than a reference (&mut W) because io::Write is also implemented for all mutable references to types that implement io::Write!
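Here is a sketch of how such a test could look. It repeats the generic write_list so the snippet stands on its own; render_to_string is a hypothetical helper that writes into a Vec<u8> instead of a file.

```rust
use std::fmt;
use std::io;

fn write_list<W, T>(mut out: W, things: &[T]) -> io::Result<()>
where
    W: io::Write,
    T: fmt::Display,
{
    for thing in things {
        writeln!(out, "{}", thing)?;
    }
    Ok(())
}

/// Renders a list into memory instead of a file.
/// `render_to_string` is a hypothetical helper used only for illustration.
fn render_to_string<T: fmt::Display>(things: &[T]) -> io::Result<String> {
    let mut buf: Vec<u8> = Vec::new();
    // `&mut buf` works because io::Write is implemented for &mut Vec<u8> too.
    write_list(&mut buf, things)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```

The same write_list serves integers, floats and strings, and no temp file is involved when checking its output.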

References

If you have programmed in C or C++ before, you might know references as pointers: handles to read and/or write a piece of memory without being responsible for its allocation. References in Rust differ from pointers in C, however, because they can not be null.

Before we get to why references are needed and work the way they do in Rust, I’d like to explain the concept of ownership first.

Ownership

Like in C, the data of a Rust program lives on either the heap or the stack and is not taken care of by a garbage collector. The major difference between C and Rust is that Rust enforces a style of memory management in which one subroutine, the owner, is responsible for allocating a piece of memory and freeing it when it’s no longer needed.

This style of memory management helps prevent forgetting to free memory, freeing it twice, or using memory after it has been freed, mitigating nasty security bugs. It is built into the Rust language and takes the place of a garbage collector as a compile-time safe way of managing memory.
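A minimal sketch of ownership in action follows. The function names consume and ownership_demo are made up for this example.

```rust
/// Takes ownership of `data`. `consume` is a hypothetical helper.
fn consume(data: Vec<i32>) -> usize {
    data.len()
} // `data` goes out of scope here and its heap memory is released.

/// Moves a vector into another function and reports what it returned.
fn ownership_demo() -> usize {
    let numbers = vec![1, 2, 3]; // `numbers` owns the heap allocation.
    let count = consume(numbers); // Ownership moves into `consume`.
    // println!("{:?}", numbers); // Would not compile: use of moved value.
    count
}
```

There is no explicit free anywhere: the memory is released exactly once, when its current owner goes out of scope.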

Safe References

Functions in Rust operate on data via references. This is because a function that has to read or write something would otherwise have to own all the data it uses.

In Rust, there are two types of references:

Immutable references, &T, that allow read-only access to an object

Mutable references, &mut T, that also allow the referred object to be modified (mutated)

References must adhere to a set of rules:

1. There can be multiple immutable references to the same object at a time
2. There can be only one mutable reference to an object at a time
3. Mutable and immutable references to a single object can not simultaneously exist
4. A mutable reference can only be taken from an owned object that is also mutable
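The rules can be sketched in a few lines. This example assumes a current compiler, on which a reference stops counting as alive after its last use; borrow_rules_demo is a hypothetical name.

```rust
/// Walks through the reference rules on a small vector.
/// `borrow_rules_demo` is a hypothetical function used only for illustration.
fn borrow_rules_demo() -> Vec<i32> {
    let mut scores = vec![1, 2, 3];

    // Any number of immutable references may exist side by side.
    let first = &scores[0];
    let last = &scores[2];
    let sum = *first + *last; // Last use of `first` and `last`.

    // `scores` is declared `mut` and no other reference is alive anymore,
    // so a single mutable reference can be taken.
    let writer = &mut scores;
    writer.push(sum);
    // Using `first` again down here would not compile: an immutable and a
    // mutable reference to `scores` would then exist at the same time.

    scores
}
```

Calling borrow_rules_demo() returns the vector with the sum of its first and last element appended.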

These rules help prevent bugs like concurrent modifications, which occur when e.g. an array is modified while it is being read:

Concurrent Modifications

    ArrayList<String> list = new ArrayList<String>();
    list.add("hello");
    list.add("world");
    for (String item : list) {
        if (item.equals("hello")) {
            list.add(", "); // causes a ConcurrentModificationException!
        }
    }

In Rust, a piece of code similar to the above looks like this:

    let mut list = vec!["hello", "world"];
    for item in list.iter() {
        if *item == "hello" {
            list.push(", "); // Compilation error!
        }
    }

Both snippets contain the same mistake: an array is modified while it is being read. However in the top piece, the mistake is encountered at runtime while Rust is able to find it when compiling and throws error[E0502]: cannot borrow list as mutable because it is also borrowed as immutable . Here’s how Rust is able to find the mistake:

list.iter() creates an iterator for all the items in the vector. It does not take ownership of the vector, so it keeps an immutable reference ( &Vec ) to the vector to be able to access its contents. list.push() modifies the state of the vector and thus must acquire a mutable reference ( &mut Vec ) to it. But because the iterator already keeps an immutable reference while looping and push() must have a mutable reference, rule number 3 is violated causing compilation to be aborted.
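One way to get a version that compiles is to separate reading from writing. The sketch below is just one possible approach, and insert_separators is a hypothetical name.

```rust
/// Appends one separator per "hello" without mutating during iteration.
/// `insert_separators` is a hypothetical function used only for illustration.
fn insert_separators() -> Vec<&'static str> {
    let mut list = vec!["hello", "world"];

    // First pass: only read, and remember how many items to add.
    let additions = list.iter().filter(|item| **item == "hello").count();

    // The iterator and its immutable borrow are gone, so mutation is allowed.
    for _ in 0..additions {
        list.push(", ");
    }
    list
}
```

Because the immutable borrow ends before the first push, none of the reference rules are violated.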

Mixing up the order of operations

Mixing two or more operations that modify the same object can also cause bugs. For example, take this hypothetical piece of code that sends a file to some output over HTTP.

    struct HttpHeaderWriter<W: io::Write> {
        out: W,
        // ...
    }

    impl<W: io::Write> HttpHeaderWriter<W> {
        fn finish(self) -> io::Result<()> {
            // ...
        }
        // new(), content_type(), content_length() ...
    }

    fn send_file<W: io::Write>(mut out: W, mut file: fs::File) -> io::Result<()> {
        let mut header = HttpHeaderWriter::new(&mut out)?;
        header.content_type("application/octet-stream")?;
        header.content_length(file.metadata()?.len())?;
        io::copy(&mut file, &mut out)?;
        header.finish()?;
        Ok(())
    }

Which fails with: error[E0499]: cannot borrow out as mutable more than once at a time .

Here, HttpHeaderWriter is supposedly a simple wrapper around a writer to make writing the HTTP headers easier. It is given a mutable reference to out. After all headers are set, finish() is used to finish writing the header, after which the content can be sent.

The mistake in this snippet is that header.finish() is called after the body has been copied using io::copy() . Rust is able to detect this error because two mutable references to out are needed simultaneously, breaking rule number 2. One is held by the HttpHeaderWriter and the other is used in the call to io::copy() .

The fix is simple: we need to make sure that the lifetime of header has ended before we attempt to copy the body. We can do that by limiting the lifetime of header by wrapping it in a block. Like so:

    {
        let mut header = HttpHeaderWriter::new(&mut out)?;
        // ..
        header.finish()?;
    }
    io::copy(&mut file, &mut out)?;

This ensures that it can no longer be used and also helps against accidentally sending more headers after the body has started.
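For readers who want to try this out, here is a self-contained sketch. The body of HttpHeaderWriter is entirely my assumption (a fixed status line and a single header), and send_bytes is a hypothetical stand-in for send_file that takes the body as a byte slice so no file is needed.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the HttpHeaderWriter described above.
struct HttpHeaderWriter<W: Write> {
    out: W,
}

impl<W: Write> HttpHeaderWriter<W> {
    /// Writes an assumed status line and wraps the output.
    fn new(mut out: W) -> io::Result<Self> {
        out.write_all(b"HTTP/1.1 200 OK\r\n")?;
        Ok(HttpHeaderWriter { out })
    }

    fn content_length(&mut self, len: u64) -> io::Result<()> {
        write!(self.out, "Content-Length: {}\r\n", len)
    }

    /// Consumes the writer, which also ends its borrow of the output.
    fn finish(mut self) -> io::Result<()> {
        self.out.write_all(b"\r\n")
    }
}

/// Sends the headers first and the body second.
fn send_bytes<W: Write>(mut out: W, body: &[u8]) -> io::Result<()> {
    {
        let mut header = HttpHeaderWriter::new(&mut out)?;
        header.content_length(body.len() as u64)?;
        header.finish()?;
    } // The mutable borrow of `out` held by `header` ends here.
    out.write_all(body)
}
```

Moving the out.write_all(body) call inside the block, before header.finish(), brings back the E0499 error.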

Lifetimes

If you are a C programmer you will care about this awesome feature. In C, you can do this:

    char *nth_char(char *str, int n) {
        if (n >= strlen(str)) {
            return 0;
        }
        return &str[n];
    }

This function returns a pointer to the character at index n in the string str if n does not exceed the bounds of the string. However, when using the returned pointer, there is no way to determine whether the character it points to is still allocated. It is therefore possible to introduce a use-after-free bug.

Rust does not allow this and features a system to determine at compile-time whether referred objects outlive their references. This is what the Rust equivalent of the above C code looks like:

    fn nth_byte<'a>(st: &'a mut [u8], n: usize) -> Option<&'a mut u8> {
        if n >= st.len() {
            None
        } else {
            Some(&mut st[n])
        }
    }

Note: Getting a pointer to a character in a string in Rust is not possible, so I’ve used a byte slice instead of a string. Also, we could elide this whole function and just replace it with st.get_mut(n) .

Returning a reference requires specifying a lifetime so Rust knows how long the referred object will stay allocated. Rust will try to infer this from other references used as arguments, but I’ve added the lifetime so you can see how it works. 'a is the lifetime of the string st and is also used as lifetime of the returned reference requiring it to not outlive the string it is referring to.
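The sketch below shows the function in use. capitalize_first is a hypothetical caller, and the commented-out dangling function illustrates the kind of code the lifetime check rejects.

```rust
fn nth_byte<'a>(st: &'a mut [u8], n: usize) -> Option<&'a mut u8> {
    if n >= st.len() {
        None
    } else {
        Some(&mut st[n])
    }
}

/// Uppercases the first byte of a buffer through the returned reference.
/// `capitalize_first` is a hypothetical function used only for illustration.
fn capitalize_first() -> Vec<u8> {
    let mut text = b"rust".to_vec();
    if let Some(byte) = nth_byte(&mut text, 0) {
        // The reference is used while `text` is still alive.
        *byte = byte.to_ascii_uppercase();
    }
    text
}

// Rejected by the compiler: the returned reference would outlive `local`.
// fn dangling() -> Option<&'static mut u8> {
//     let mut local = vec![1u8];
//     nth_byte(&mut local, 0)
// }
```

The reference handed out by nth_byte can never be kept around longer than the buffer it points into.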

Buh-bye dangling pointers!

Concurrency

Writing code that runs in multiple threads introduces a whole new set of bugs that could occur. The official Rust website boldly claims “threads without data races”. So how does it work?

Rust has two special traits that, when implemented by a type, are allowed special privileges by the compiler:

Types implementing the marker::Send trait can be moved across thread boundaries.

marker::Sync means that references to a type implement Send. It implies that multiple threads can access an object without the need for an explicit synchronization mechanism, like a mutex.

The following code snippet shows an example where these traits prevent a very subtle bug from being compiled:

    use std::rc::Rc;
    use std::thread;

    fn main() {
        let value = Rc::new(1);
        let shared_value = value.clone();
        let handle = thread::spawn(move || {
            println!("{}", shared_value);
        });
        handle.join().unwrap();
    }

Which will fail with: error[E0277]: the trait bound std::rc::Rc<i32>: std::marker::Send is not satisfied.

Rc is Rust’s reference counting container, which implements neither Send nor Sync. The reason for this is that it is not safe to pass instances of Rc across threads because the internal counter storing the number of instances does not use atomic operations.

The snippet could be fixed by using the atomic reference counter, Arc , instead. Of course, using atomic operations for reference counting adds a small overhead, which is why Rc also exists.
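A sketch of the fixed version could look like this. I’ve wrapped it in a hypothetical function, read_from_thread, that returns the value seen by the spawned thread instead of printing it.

```rust
use std::sync::Arc;
use std::thread;

/// Shares a value with another thread through an atomic reference counter.
/// `read_from_thread` is a hypothetical function used only for illustration.
fn read_from_thread() -> i32 {
    let value = Arc::new(1);
    // Cloning an Arc bumps the atomic counter; the data is not copied.
    let shared_value = Arc::clone(&value);
    let handle = thread::spawn(move || *shared_value);
    handle.join().expect("worker thread panicked")
}
```

The only change compared to the failing snippet is the container type: Arc implements Send for an i32 payload, so the closure may cross the thread boundary.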

sync::Mutex

If a type does not implement Sync (or you share it via an Arc) and you want one or more threads to modify it, you should use sync::Mutex. Mutexes in Rust differ from those in other languages (that I’ve used: C++, Go, Java) because they wrap around the data to be protected like a container, instead of being a loose construct beside the data. The advantage of this is that you are forced to lock the mutex when you want to access the contained data, which means you can never forget to do so and introduce a race condition.
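As a minimal sketch, here is a counter shared by several threads. The function count_in_parallel and its workers parameter are hypothetical names chosen for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Lets `workers` threads each increment a shared counter once.
/// `count_in_parallel` is a hypothetical function used only for illustration.
fn count_in_parallel(workers: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // The number is only reachable through the guard from lock().
                let mut guard = counter.lock().expect("mutex poisoned");
                *guard += 1;
            }) // The guard is dropped here, which unlocks the mutex.
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker thread panicked");
    }
    let total = *counter.lock().expect("mutex poisoned");
    total
}
```

There is no way to reach the usize without going through lock(), so an unsynchronized increment simply can not be written.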

Tips

Lastly, I’d like to offer some tips on how to effectively use the safety features that Rust has to offer. Because even though Rust provides a myriad of ways to prevent errors as explained in this article, there are still ways to screw up.

Don’t use unsafe code

Except of course when interfacing with C libraries.

Programs that are written in 100% Rust have no need for unsafe code. It might be tempting to get rid of that one branch and squeeze that tiny extra bit of performance out of your program, but ask yourself first, is it really worth it?

Values must always mean something

As explained above in “Mandatory Initialization”, Rust will abort compilation when a variable binding that has no value is used. It’s tempting to silence the compiler by just initializing it to the zero value of whatever type you’re using.

For example:

    let mut username = "";
    if foo {
        username = "polyfloyd";
    }

But does that default value make sense in the context of your program? Would someone else reading your code misinterpret the value?

Do this instead:

    let username = if foo {
        "polyfloyd"
    } else {
        // The else clause is required for expressions that yield a value.
        // Does your default value still make sense to use here?
        // Or should you rather abort with an error?
    };

This style prompts you to think about what values to use. It even helps you remember if you forget the else clause and does not require the variable binding to be mutable. Win!

Don’t unwrap

Rust’s method of handling errors is to return the error as a value via a Result. It can be tempting to just unwrap() , get the value if any, and carry on instead of using a match pattern.

Whatever your opinion is on the practice of returning errors, this is the standard in Rust. A runtime panic indicates a bug in the program.
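As a sketch of the alternative, the ? operator hands the error back to the caller instead of panicking. The function sum_csv is a hypothetical example.

```rust
use std::num::ParseIntError;

/// Sums comma-separated integers, passing the first parse error to the caller.
/// `sum_csv` is a hypothetical helper used only for illustration.
fn sum_csv(line: &str) -> Result<i32, ParseIntError> {
    let mut total = 0;
    for field in line.split(',') {
        // `?` returns early with the Err instead of panicking like unwrap().
        total += field.trim().parse::<i32>()?;
    }
    Ok(total)
}
```

With unwrap() in place of ?, malformed input would crash the program; here the caller gets to decide what a bad field means.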

Conclusion and Beyond Rust

Rust’s implementation of static typing will prevent many mistakes that would otherwise cause runtime errors in other languages. From my personal experience, a program that compiles correctly works as intended.

The ability of Rust to detect faults in a program and its ease of use will only improve in the future. At the time of writing, further enhancements of the type system are on their way.

I’d like to finish with a language which introduces a new interesting way to reduce the risk of bugs by means of type checking:

Perl6

Although it’s a dynamic language, Perl6 offers a feature that allows programmers to create custom derivations of types which can only be successfully initialized when some condition is met. This allows for very fine-grained control over what a subroutine can accept as input. You could for example introduce an EvenInt, a NonEmptyList or even a String that must match some regular expression!

You can read more about Perl6 Subsets here.