Adding enumerations

In the previous post, we discussed an example function that took an enumerated type as an argument. Here’s the cleaned-up bindings.rs code:

pub const FOO_ANIMAL_UNDEFINED: u8 = 0;

pub const FOO_ANIMAL_WALRUS: u8 = 1;

pub const FOO_ANIMAL_DROP_BEAR: u8 = 2;

extern "C" {

/// Argument should be one of FOO_ANIMAL_XXX

pub fn feed(animal: u8);

}

Wait a sec. The function signature says it takes a u8, but the documentation says the parameter should only be one of FOO_ANIMAL_XXX (let’s assume for the sake of our own sanity that it can safely handle the Undefined case). If we let our safe code pass an arbitrary u8 as input, that’s not only confusing but also potentially dangerous. Sounds like our safe wrapper should take an Animal enum and convert it.

We could write this enumeration by hand. But let’s use the enum_primitive crate to give us some extra flexibility. (I’ve omitted the rustdoc strings for brevity, though you should include them in real code):

use enum_primitive::*;

enum_from_primitive! {

#[derive(Debug, Copy, Clone, PartialEq)]

#[repr(u8)]

pub enum Animal {

Undefined = FOO_ANIMAL_UNDEFINED,

Walrus = FOO_ANIMAL_WALRUS,

DropBear = FOO_ANIMAL_DROP_BEAR,

}

}

Because we tagged the enum with #[repr(u8)], a simple cast is sufficient to convert an Animal into a u8. Now we can write our safe wrapper like so:

pub fn feed(animal: Animal) {

unsafe { libfoo_sys::feed(animal as u8) }

}

The enum_primitive crate also gives us some helpful boilerplate functions for converting a u8 to an Option<Animal>. This might be necessary if a function returns a u8 value that should actually be treated as an enumerated type. There’s a catch: the conversion from a numeric type can fail if the number supplied doesn’t match an enumeration value. It’s up to you whether your code unwraps immediately and panics, replaces the None with a default value (an “Unknown” variant can be used if it’s there, or added if it’s not), or simply returns the Option and lets the caller deal with it.
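To make those three options concrete, here is a minimal hand-written sketch that uses only the standard library. The names from_raw and from_raw_or_undefined are hypothetical (they are not the functions enum_primitive generates), and the constants are repeated only so the block stands alone.

```rust
// Stand-ins for the constants exported by bindings.rs.
pub const FOO_ANIMAL_UNDEFINED: u8 = 0;
pub const FOO_ANIMAL_WALRUS: u8 = 1;
pub const FOO_ANIMAL_DROP_BEAR: u8 = 2;

#[derive(Debug, Copy, Clone, PartialEq)]
#[repr(u8)]
pub enum Animal {
    Undefined = FOO_ANIMAL_UNDEFINED,
    Walrus = FOO_ANIMAL_WALRUS,
    DropBear = FOO_ANIMAL_DROP_BEAR,
}

impl Animal {
    /// Hypothetical hand-written conversion: returns None for any value
    /// that does not match a known variant, and leaves the decision
    /// about what to do next to the caller.
    pub fn from_raw(raw: u8) -> Option<Animal> {
        match raw {
            FOO_ANIMAL_UNDEFINED => Some(Animal::Undefined),
            FOO_ANIMAL_WALRUS => Some(Animal::Walrus),
            FOO_ANIMAL_DROP_BEAR => Some(Animal::DropBear),
            _ => None,
        }
    }

    /// Hypothetical variant of the above that substitutes a default
    /// instead of surfacing the failure.
    pub fn from_raw_or_undefined(raw: u8) -> Animal {
        Animal::from_raw(raw).unwrap_or(Animal::Undefined)
    }
}
```

Calling from_raw(raw).unwrap() gives the panic-immediately behavior; which of the three is right depends on how much you trust the C side to stay within the documented range.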

Initializers that return pointers

Rather than pull examples out of thin air, for the next few examples I’m going to use a rather well-known and super gnarly library as an example: OpenSSL. (Please don’t implement bindings yourself for OpenSSL; someone has already done it, and better. This is just a hopefully familiar example.)

Before you can encrypt or decrypt data with OpenSSL, you first need to call a function that allocates and initializes some context. In our example this initialization is performed by a call to SSL_CTX_new. Each function that does any work takes a pointer to this context. When we’re done using this context, things need to be cleaned up and the context data needs to be destroyed using SSL_CTX_free.

We’re going to create a struct to wrap this context’s lifetime. We’ll add a function called new that does the initialization for us and returns this struct. All of the C library functions that would require the context pointer will be wrapped as Rust functions taking &self and implemented on the struct. Finally, when our struct falls out of scope we want Rust to automatically clean up for us. Hopefully this should be a familiar software pattern: it’s RAII.

Our example might look something like this:

use failure::{bail, Error};

use openssl_sys as ffi;

pub struct OpenSSL {

// This pointer must never be allowed to leave the struct

ctx: *mut ffi::SSL_CTX,

}

impl OpenSSL {

pub fn new() -> Result<Self, Error> {

let method = unsafe { ffi::TLS_method() };

// Manually handle null pointer returns

if method.is_null() {

bail!("TLS_method() failed");

}

let ctx = unsafe { ffi::SSL_CTX_new(method) };

// Manually handle null pointer returns here

if ctx.is_null() {

bail!("SSL_CTX_new() failed");

}

Ok(OpenSSL { ctx })

}

}

I’m namespacing the C library calls behind ffi to make it a bit more clear what we’re importing versus what we’re defining in the wrapper. I’m also cheating a bit and using bail from the failure crate — in real code you’d want to define an error type and use it. And yes, it looks a bit gross because we don’t have the niceties of unwrapping Option types from our returns. We have to manually check everything.

Remember: wrapping unsafe functions implies you’re doing the hard work of validating null pointers and checking for errors. This is exactly the sort of thing your wrapper MUST handle correctly. A panic early on is far better than silently passing around null or invalid pointers. We also can’t allow the ctx pointer to be copied out of the struct, because we can only guarantee it will be valid while our struct still exists.
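As a rough illustration of what defining your own error type might look like, here is a standard-library-only sketch. InitError and non_null_or are hypothetical names invented for this example; they are not part of openssl_sys or the failure crate.

```rust
use std::error::Error;
use std::fmt;
use std::ptr::NonNull;

/// Hypothetical error type for the initialization path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InitError {
    /// The method lookup returned a null pointer.
    NoMethod,
    /// The context allocation returned a null pointer.
    NoContext,
}

impl fmt::Display for InitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            InitError::NoMethod => write!(f, "TLS_method() failed"),
            InitError::NoContext => write!(f, "SSL_CTX_new() failed"),
        }
    }
}

impl Error for InitError {}

/// Hypothetical helper: reject a null pointer once, at the FFI boundary,
/// and hand back a NonNull so later code cannot forget the check.
pub fn non_null_or<T>(ptr: *mut T, err: InitError) -> Result<NonNull<T>, InitError> {
    NonNull::new(ptr).ok_or(err)
}
```

With something like this in place, each manual check in new could shrink to a single line ending in the ? operator, assuming the constructor returns Result<Self, InitError>.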

Destructors via impl Drop

The other end is cleanup. Destructors in Rust are handled via the Drop trait. We can implement Drop for our struct so that Rust properly destroys the handle for us:

impl Drop for OpenSSL {

fn drop(&mut self) {

unsafe { ffi::SSL_CTX_free(self.ctx) }

}

}

Rust also prevents drop from being called directly or being invoked twice, so you don’t have to play tricks like manually nulling out ctx after freeing it. Also, unlike C++, Rust never makes implicit copies of the struct, so the destructor won’t get called behind your back on some temporary copy.
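To watch the allocate-on-new, free-on-drop pairing in action without linking OpenSSL, here is a small sketch against a mock library written in Rust. MockCtx, mock_ctx_new, mock_ctx_free, Handle and live_contexts are all made up for this example; the counter exists only so the cleanup is observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Mock "C library", written in Rust so the sketch runs on its own.
pub struct MockCtx {
    _payload: u32,
}

// Number of contexts the mock library currently has allocated.
static LIVE_CONTEXTS: AtomicUsize = AtomicUsize::new(0);

fn mock_ctx_new() -> *mut MockCtx {
    LIVE_CONTEXTS.fetch_add(1, Ordering::SeqCst);
    Box::into_raw(Box::new(MockCtx { _payload: 0 }))
}

/// Safety: `ctx` must come from `mock_ctx_new` and must be freed only once.
unsafe fn mock_ctx_free(ctx: *mut MockCtx) {
    drop(unsafe { Box::from_raw(ctx) });
    LIVE_CONTEXTS.fetch_sub(1, Ordering::SeqCst);
}

/// Safe wrapper that owns the context pointer for its whole lifetime.
pub struct Handle {
    ctx: *mut MockCtx,
}

impl Handle {
    pub fn new() -> Option<Self> {
        let ctx = mock_ctx_new();
        // Same manual null check as in the OpenSSL example.
        if ctx.is_null() {
            None
        } else {
            Some(Handle { ctx })
        }
    }
}

impl Drop for Handle {
    fn drop(&mut self) {
        // Runs exactly once, when the Handle goes out of scope or is moved
        // into std::mem::drop.
        unsafe { mock_ctx_free(self.ctx) }
    }
}

/// Reports how many mock contexts are still alive.
pub fn live_contexts() -> usize {
    LIVE_CONTEXTS.load(Ordering::SeqCst)
}
```

Creating a Handle inside an inner scope and checking live_contexts() before and after the closing brace should show the count return to where it started.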

Send and Sync

So now you have a struct that contains a pointer element. But by default Rust will put some restrictions on how your struct can be used in a threaded context. Why does the language do this, and why does this matter?

By default, Rust assumes raw pointers cannot be moved between threads (!Send) and cannot be shared among threads (!Sync). And because your struct contains a raw pointer, transitively it’s neither Send nor Sync. This conservative assumption helps keep external C code from stomping all over those lovely thread safety guarantees that Rust gives us.

If your object isn’t Send, you’re very restricted in what you can do with it in a threaded program: there’s no way to even wrap it in a Mutex and pass references between threads. But maybe external documentation or careful inspection of the source code indicates that the returned context pointer is safe to move between threads. It may also indicate whether functions using this context pointer are safe to use in threaded contexts, i.e. whether the functions themselves are thread-safe. Rust isn’t able to make these determinations for you, because it can’t see what your library does with these pointers.

If you can make the assertion that every single use of your (internally private!) pointer obeys the rules for Send or Sync described below, you can flat out tell Rust so. Correctly making this kind of assertion is difficult if not dangerous, and to instill in you the appropriate amount of fear Rust requires you to use the unsafe keyword.

unsafe impl Send for MyStruct {}

unsafe impl Sync for MyStruct {}

Assuming you don’t allow outside access to the pointer somehow (via accessor methods or by marking the struct member pub), you are probably safe to do the following, provided you can make these assertions:

You can mark your struct Send if the C code dereferencing the pointer never uses thread-local storage or thread-local locking. This happens to be true for many libraries.

You can mark your struct Sync if all C code able to dereference the pointer always dereferences in a thread-safe manner, i.e. consistent with safe Rust. Most libraries that obey this rule will tell you so in the documentation, and they internally guard every library call with a mutex.
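Here is a small sketch of what the Send assertion buys you, using a made-up Counter type whose raw pointer stands in for a C context. Everything in it is hypothetical, and the unsafe impl is only sound because this toy "library" is a plain heap allocation with no thread-local state. Note that the type is marked Send but not Sync; wrapping it in a Mutex is enough to use it from several threads.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical wrapper; the raw pointer stands in for a C context handle.
pub struct Counter {
    ptr: *mut u64,
}

impl Counter {
    pub fn new() -> Self {
        Counter { ptr: Box::into_raw(Box::new(0)) }
    }

    pub fn bump(&mut self) {
        // Sound because ptr is private and valid until drop.
        unsafe { *self.ptr += 1 }
    }

    pub fn get(&self) -> u64 {
        unsafe { *self.ptr }
    }
}

impl Drop for Counter {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.ptr)) }
    }
}

// ASSUMPTION being asserted: nothing behind the pointer depends on which
// thread touches it. Without this line, the thread::spawn below is rejected.
unsafe impl Send for Counter {}

/// Bumps one shared counter once from each of `threads` worker threads.
pub fn bump_from_threads(threads: usize) -> u64 {
    let shared = Arc::new(Mutex::new(Counter::new()));
    let workers: Vec<_> = (0..threads)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                shared.lock().unwrap().bump();
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    let total = shared.lock().unwrap().get();
    total
}
```

The Mutex provides the synchronization here, so the wrapper never has to claim that the underlying code is safe to call concurrently.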

Functions that return pointers

So let’s assume we’ve got our struct set up with new and drop implementations. We’re happily churning through the list of functions that take this context pointer, and for each one we want to expose we’re implementing a safe version against our struct that takes &self . Then we run into something like this (fictional for simplicity, but not far off):

// Always returns valid data, never fails

SSL_CIPHER *SSL_CTX_get_cipher(const SSL_CTX *ctx);

We obviously don’t want to return raw pointers from our wrapper; that’s not very ergonomic. The whole point of this is to make sure library users don’t have to use unsafe.

After reading the documentation we discover that SSL_CIPHER is a struct, and the pointer returned is valid as long as our SSL_CTX isn’t freed. Hey, that kinda sounds like a lifetime bound. So our first approach might look like this:

pub fn get_cipher(&self) -> &ffi::SSL_CIPHER {

unsafe {

let cipher = ffi::SSL_CTX_get_cipher(self.ctx);

// Dereference the pointer, then turn it into a reference.

// Remember: derefing a pointer is unsafe!

&*cipher

}

}

Dereferencing a raw pointer and then immediately taking a reference to the result creates what’s called an unbounded lifetime. This isn’t what we want, so we immediately constrain the lifetime via the return type. We don’t explicitly specify a lifetime, but recall the lifetime elision rules from the Rust documentation: the lifetime of the return value, in this case, will be constrained by default to be the same as the lifetime of &self. That’s a sane bound, so this implementation looks safe.
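To spell out what elision does here, the following sketch shows the elided signature next to its explicit equivalent, using made-up Context and RawCipher types in place of the OpenSSL ones. Both methods have the same meaning to the compiler.

```rust
/// Hypothetical stand-in for the C struct behind the returned pointer.
pub struct RawCipher {
    pub bits: u32,
}

/// Hypothetical stand-in for the context wrapper.
pub struct Context {
    cipher: *mut RawCipher,
}

impl Context {
    pub fn new(bits: u32) -> Self {
        Context { cipher: Box::into_raw(Box::new(RawCipher { bits })) }
    }

    /// Elided form: the returned reference borrows from &self.
    pub fn cipher(&self) -> &RawCipher {
        // &* on a raw pointer is unbounded on its own; the return type
        // is what pins it to the lifetime of &self.
        unsafe { &*self.cipher }
    }

    /// The same signature with the lifetime written out.
    pub fn cipher_explicit<'a>(&'a self) -> &'a RawCipher {
        unsafe { &*self.cipher }
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.cipher)) }
    }
}

// The borrow checker rejects use-after-free attempts such as:
//
//     let ctx = Context::new(128);
//     let cipher = ctx.cipher();
//     drop(ctx);                    // rejected: ctx is still borrowed
//     println!("{}", cipher.bits);
```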

But we can go further. That SSL_CIPHER is typically used as a context pointer with its own associated functions. As is, having our safe code return a reference to a C struct isn’t ergonomic at all. What we want to return is a Rust struct with its own associated behavior matching the C library. But we also should retain the lifetime association: “This cipher object is only valid as long as the OpenSSL object you got it from is still alive.”

So let’s assume we go through the work of creating a Cipher struct to wrap that pointer, and we want to tell Rust that the struct has some sort of lifetime that depends on our OpenSSL object:

pub fn get_cipher<'a>(&'a self) -> Cipher<'a> {

unsafe {

let cipher = ffi::SSL_CTX_get_cipher(self.ctx);

Cipher::from(self, cipher)

}

}

// Something is missing here...

pub struct Cipher<'a> {

cipher: *const ffi::SSL_CIPHER,

}

impl<'a> Cipher<'a> {

fn from(_: &'a OpenSSL, cipher: *const ffi::SSL_CIPHER) -> Cipher<'a> {

Cipher { cipher }

}

}

Unfortunately this won’t compile, because Rust says “hey, you declared a lifetime associated with your struct, but it’s not used anywhere!” So we need to declare somehow that yes, the internals depend on a reference we can’t immediately see.

use std::marker::PhantomData;

pub struct Cipher<'a> {

cipher: *const ffi::SSL_CIPHER,

phantom: PhantomData<&'a OpenSSL>,

}

impl<'a> Cipher<'a> {

fn from(_: &'a OpenSSL, cipher: *const ffi::SSL_CIPHER) -> Cipher<'a> {

Cipher { cipher, phantom: PhantomData }

}

}

You can think of this as saying to the compiler, “Treat this struct as if it contains a reference to an OpenSSL, with lifetime 'a”. Where does that lifetime come from? We provide it when we call from with a reference to self.

PhantomData doesn’t actually take up any space, and it disappears in compiled code. But it allows the compiler to reason about lifetime correctness. Now our wrapper users can’t accidentally hold onto a Cipher after freeing its parent OpenSSL.
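Here is a self-contained sketch of the same pattern with made-up Context and RawCipher types standing in for the OpenSSL ones, so it compiles without any external crate. It also includes a small check of the zero-size claim; cipher_is_pointer_sized is a hypothetical helper added only for that purpose.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

/// Hypothetical stand-in for the C struct.
pub struct RawCipher {
    bits: u32,
}

/// Hypothetical stand-in for the parent context wrapper.
pub struct Context {
    cipher: *mut RawCipher,
}

/// Child wrapper: holds a raw pointer but is treated by the compiler
/// as if it borrowed the parent Context for 'a.
pub struct Cipher<'a> {
    raw: *const RawCipher,
    phantom: PhantomData<&'a Context>,
}

impl Context {
    pub fn new(bits: u32) -> Self {
        Context { cipher: Box::into_raw(Box::new(RawCipher { bits })) }
    }

    /// The elided lifetime ties the returned Cipher to &self.
    pub fn get_cipher(&self) -> Cipher<'_> {
        Cipher { raw: self.cipher, phantom: PhantomData }
    }
}

impl Drop for Context {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.cipher)) }
    }
}

impl<'a> Cipher<'a> {
    pub fn bits(&self) -> u32 {
        // Sound because 'a guarantees the parent Context is still alive.
        unsafe { (*self.raw).bits }
    }
}

/// The marker adds no bytes: the wrapper is the size of one pointer.
pub fn cipher_is_pointer_sized() -> bool {
    size_of::<Cipher<'static>>() == size_of::<*const RawCipher>()
}
```

Dropping the Context while a Cipher obtained from it is still in use is a compile error, which is exactly the guarantee the raw pointer alone could not give.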

Functions that may return errors

Consider the following C function:

int foo_get_widget(const foo_ctx_t*, widget_struct*);

We’re expected to pass in a pointer, which the function will fill in. If this function returns 0 everything is fine, and we can trust the output was populated correctly. Otherwise, we need to return an error.

It’s far more ergonomic to return an owned struct with the data rather than demand that Rust callers create a mutable struct and pass a mut reference (though you can provide both if it makes sense to do so).

In the following examples I’m assuming the custom error type is defined elsewhere, and allows conversion from the appropriate types.

use std::mem::MaybeUninit;

pub fn get_widget(&self) -> Result<widget_struct, GetError> {

let mut widget = MaybeUninit::uninit();

unsafe {

match foo_get_widget(self.context, widget.as_mut_ptr()) {

0 => Ok(widget.assume_init()),

x => Err(GetError::from(x)),

}

}

}

Ed: thanks to reddit /u/Cocalus for pointing out that mem::uninitialized() is deprecated. Hopefully I got the fix right!
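For anyone who wants to experiment with this pattern, here is a runnable sketch that swaps the real library for a mock. Widget, GetError and mock_get_widget are all invented for illustration; the mock follows the same contract as foo_get_widget (returns 0 and fills the output on success, returns nonzero and leaves it untouched on failure).

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

/// Hypothetical stand-in for widget_struct.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Widget {
    pub id: u32,
    pub weight: u32,
}

/// Hypothetical error type that just carries the C status code.
#[derive(Debug, PartialEq)]
pub struct GetError(pub c_int);

/// Mock of the C function: fills `out` and returns 0 for non-negative
/// keys, returns -1 without writing anything otherwise.
unsafe extern "C" fn mock_get_widget(key: c_int, out: *mut Widget) -> c_int {
    if key < 0 {
        return -1;
    }
    unsafe { out.write(Widget { id: key as u32, weight: 10 }) };
    0
}

pub fn get_widget(key: c_int) -> Result<Widget, GetError> {
    let mut widget = MaybeUninit::<Widget>::uninit();
    let status = unsafe { mock_get_widget(key, widget.as_mut_ptr()) };
    match status {
        // Only on success do we claim the memory is initialized.
        0 => Ok(unsafe { widget.assume_init() }),
        code => Err(GetError(code)),
    }
}
```

The important detail is that assume_init is reached only on the success branch; calling it after a failed call would be undefined behavior if the C side left the struct unwritten.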