Rust natively supports linking against C libraries and calling their functions directly. Of course, any function imported this way can only be called inside an unsafe block (because Rust can’t guarantee its invariants or correctness), but that’s an inconvenience we can punt on until later.

The Rust Nomicon will tell you that you can import functions or other global symbols by declaring them in extern blocks, as long as the names and signatures line up exactly. This is technically correct but not all that helpful. Typing in function declarations by hand is completely stupid bonkers, and makes no sense when we have a perfectly good set of header files with the declarations in them. Instead, we’re going to use a tool to generate the Rust signatures from our library’s C header files. Then we’re going to run some test code to verify it’s working correctly, tweak things until it looks right, and finally bake the whole thing into a Rust crate. Let’s begin.
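For reference, here is roughly what the hand-written approach looks like. This is a minimal sketch that declares two C standard library functions, abs and strlen, instead of anything from libfoo, so it runs without our library; the wrapper names c_abs and c_strlen are illustrative. It assumes size_t maps to usize, which holds on mainstream targets. (Under the Rust 2024 edition the block must be written as unsafe extern "C".)

```rust
use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Hand-written declarations for two C standard library functions, which
// Rust's std already links against. Nothing checks these against the real
// C prototypes: if a signature is wrong, behavior is undefined.
extern "C" {
    fn abs(x: c_int) -> c_int;
    fn strlen(s: *const c_char) -> usize; // assumes size_t == usize
}

/// Thin wrapper around C's `abs`.
fn c_abs(x: i32) -> i32 {
    // unsafe: the compiler cannot verify the foreign declaration above.
    unsafe { abs(x) }
}

/// Thin wrapper around C's `strlen`, which needs a NUL-terminated string.
fn c_strlen(s: &str) -> usize {
    let owned = CString::new(s).expect("string contains an interior NUL byte");
    unsafe { strlen(owned.as_ptr()) }
}
```

Multiply those few lines by every function, struct, and constant in a real header and the case for generating them becomes obvious.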

Bindgen

The most commonly used tool to generate Rust signatures from C headers is bindgen. Our goal is to create a bindings.rs file representing the library’s public API (its public functions, structs, enumerations, etc.). We will configure our crate to include that file. Once the crate builds, we can import it into any project to invoke our C library’s functions.

What you’ll need:

- A functioning cargo setup. I assume if you’re compiling Rust code at all that you have this.
- A working C compiler and pkg-config for dependency resolution.
- Header file(s) corresponding to the library functions you want to use. If you have the source code that’s great; this example assumes you are building the library from source. Otherwise you’ll need the path to the static or dynamic library you’re linking to, if it’s not in your system path.
- An amount of patience corresponding to the size of the library’s API.

Installing the command-line bindgen tool is as simple as:

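```shell
cargo install bindgen-cli
```

Recent bindgen releases ship the command-line tool in the separate bindgen-cli crate; the bindgen crate itself is now library-only, so older instructions that say cargo install bindgen no longer give you the binary.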

On my Debian laptop I also needed to install clang manually with apt (bindgen relies on libclang to parse headers), though your mileage may vary.

Setting up your crate

Our new library crate will contain the dirty business of building and exporting the native C library’s unsafe functions. Again, leave any safe wrappers for another crate; this not only speeds up compilation, but it also makes it possible for ̶m̶a̶s̶o̶c̶h̶i̶s̶t̶s̶ other crate authors to minimally import and use just the raw C bindings. The standard Rust naming convention for FFI crates is lib<XXXX>-sys.

We’re going to create a build.rs file that uses the cc crate to compile the C library and tell cargo how to link it. Let’s put the library’s C source code in a subdirectory called src and its header files in a subdirectory called include. Next, let’s make sure our Cargo.toml is set up:

```toml
[package]
name = "libfoo-sys"
version = "0.1.0"
links = "foo"
build = "build.rs"
edition = "2018"

[dependencies]
libc = "0.2"

[build-dependencies]
cc = { version = "1.0", features = ["parallel"] }
pkg-config = "0.3"
```

Next we’ll populate the build.rs file. The following is going to look a bit weird: we are writing a Rust program whose standard output is a series of instructions (lines beginning with cargo:) that cargo reads and acts on when building our crate.

If you’re linking against an already-compiled library guaranteed to be in the system path, your build.rs might be as simple as this:

```rust
fn main() {
    // Tell cargo to link against libfoo found on the system library path.
    println!("cargo:rustc-link-lib=foo");
}
```
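If the library is already compiled but not on the system path, the build script can also tell cargo where to look with a cargo:rustc-link-search line. The sketch below assumes a hypothetical FOO_LIB_DIR environment variable that a user sets to point at a prebuilt libfoo; the directive logic is factored into a function so it can be tested, and a real build.rs would just have fn main() call run().

```rust
use std::env;

// Hypothetical variable a user could set to point at a prebuilt libfoo.
const LIB_DIR_VAR: &str = "FOO_LIB_DIR";

/// Builds the `cargo:` lines a build script prints to stdout.
fn link_directives(lib_dir: Option<&str>) -> Vec<String> {
    // Re-run the build script whenever the variable changes.
    let mut lines = vec![format!("cargo:rerun-if-env-changed={}", LIB_DIR_VAR)];
    if let Some(dir) = lib_dir {
        // Add a native library search path for the linker.
        lines.push(format!("cargo:rustc-link-search=native={}", dir));
    }
    // Link against libfoo; writing "static=foo" would request the static archive.
    lines.push("cargo:rustc-link-lib=foo".to_string());
    lines
}

/// What build.rs's `main` would call: read the variable and print each line.
fn run() {
    let dir = env::var(LIB_DIR_VAR).ok();
    for line in link_directives(dir.as_deref()) {
        println!("{}", line);
    }
}
```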

Most of the time, though, you’ll want to at least use some sort of package configuration to ensure the library is actually installed and the linker can find it. In many cases, your library is small enough to be built as a static library by cargo itself. The pkg-config crate helps with library and dependency configuration, and cc handles the dirty work of building C code from within cargo. Both crates run their configuration and build steps and then print the lines that cargo needs. In our example the C source uses zlib, so we use pkg-config to find a suitable version and link against it. The sample code below also shows how to add compiler flags and preprocessor definitions.

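```rust
fn main() {
    // Find zlib >= 1.2 via its pkg-config name ("zlib") and emit its link flags.
    pkg_config::Config::new()
        .atleast_version("1.2")
        .probe("zlib")
        .unwrap();

    // C sources to compile into a static libfoo.
    let src = [
        "src/file1.c",
        "src/otherfile.c",
    ];

    let mut builder = cc::Build::new();
    let build = builder
        .files(src.iter())
        .include("include")
        .flag("-Wno-unused-parameter")
        .define("USE_ZLIB", None);
    // Compile and archive the sources, then tell cargo to link libfoo.
    build.compile("foo");
}
```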

Finally, you will need a src/lib.rs file to actually compile our bindings. Here we disable warnings about C naming conventions that don’t match Rust’s, and then pull in our generated file with the include! macro:

```rust
#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]

use libc::*;

include!("./bindings.rs");
```

Generating the bindings

While the bindgen user guide steers you toward generating the bindings on the fly within build.rs, in practice you will need to edit the generated output before releasing it in a crate. Generating one or more files from the command line and committing the output to your repository gives you the most control.

The initial attempt at generation might look something like this:

```shell
bindgen include/foo_api.h -o src/bindings.rs
```

For a real header with more than a few API calls, this is unfortunately going to generate way more definitions than we want or need. The command line that generated part of the bindings.rs for our project at Dwelo wound up being something closer to this:

```shell
bindgen include/foo_api.h -o src/bindings.rs --whitelist-function '^foo_.*' --whitelist-var '^FOO_.*' -- -DUSE_ZLIB
```

Newer bindgen releases spell these options --allowlist-function and --allowlist-var.

Convincing the generator to give you only what’s necessary and not barf on undefined symbols is a trial and error process. Consider doing generation in stages and concatenating the results.

It’s powerful, but not perfect

When you pass a header to bindgen, it parses it with libclang (running the preprocessor along the way) and then greedily converts every symbol definition it can see. You will need to make adjustments on the command line and refactor the resulting output.

Original Makefile/CMake extras

After the -- on the bindgen command line, you can add whatever flags you’d normally pass to a compiler when building against the library. Sometimes these will be extra include paths, and sometimes they will be preprocessor defines, which are necessary when headers have #ifdef-guarded definitions. For our vendor library, failing to define OS_LINUX hides a bunch of symbols we need. (What, did you think legacy code is going to use standard compiler defines like __linux__ instead of making things up? Sorry, comedy hour is down the hall and up the stairs.) If your generated output is mysteriously missing functions, check your defines.

Headers that include standard headers

Bindgen is very aggressive about generating definitions for every available symbol in the preprocessor output, including transitive, system-specific dependencies that you do not want. This means if your header includes stddef.h or time.h (or includes another header that does), you will wind up with a bunch of extra crap in the generated output. It’s even worse with C++ headers, where pulling in anything from std drags in a huge amount of standard library machinery that bindgen will also try to translate.

Your crate should only expose what’s in the library API, not whatever happened to be in the system headers or standard library on the machine where you did your generation. This one is a pain, particularly if your library’s functions and constants don’t follow any kind of naming convention. The main way around this is whitelist (now called allowlist) regexes and lots of trial and error; newer bindgen releases also accept --allowlist-file, which keeps only items declared in headers whose path matches a regex.

Preprocessor #defines

```c
#define FOO_ANIMAL_UNDEFINED 0
#define FOO_ANIMAL_WALRUS 1
#define FOO_ANIMAL_DROP_BEAR 2

/* Argument should be one of FOO_ANIMAL_XXX */
void feed(uint8_t animal);
```

This looks contrived, but it’s an obfuscated version of a pattern that’s pervasive throughout our vendored C library.

In C this works fine: when you include the header in your source, you can use something like FOO_ANIMAL_WALRUS directly wherever a function calls for an animal. The C compiler implicitly converts the int literal 1 to uint8_t and the code works. Of course the original author should have created an enum typedef for clarity and used that, but they didn’t, and that’s still legal C code we have to deal with.

```rust
pub const FOO_ANIMAL_UNDEFINED: u32 = 0;
pub const FOO_ANIMAL_WALRUS: u32 = 1;
pub const FOO_ANIMAL_DROP_BEAR: u32 = 2;

extern "C" {
    pub fn feed(animal: u8);
}
```

Although bindgen is clever enough to recognize these symbols as constants, there are still a few issues. The first is that bindgen has to guess the type of each FOO_ANIMAL_XXX. It guessed u32 in this case, which not only doesn’t match our function parameter but is also technically wrong: an unsuffixed C integer literal like 1 has type int, which is signed. This leads to the other issue: Rust has no implicit integer conversions, so we must explicitly cast FOO_ANIMAL_WALRUS to u8 when calling feed. Not very ergonomic, is it? To fix this, we need to change the types of the generated consts to match the function definition. We’ll fix the enumeration issue later in the safe wrapper.
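After hand-editing, the constants simply carry the parameter’s type. The sketch below shows the retyped consts; so that it builds without libfoo, it replaces the real extern "C" declaration of feed with a Rust stand-in that records its argument, and feed_walrus is an illustrative caller, neither of which exists in the real bindings. If you generate bindings from build.rs instead, bindgen’s ParseCallbacks::int_macro hook can choose the integer type per macro rather than editing by hand.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hand-edited bindgen output: the #defines retyped from u32 to u8 so they
// match feed()'s uint8_t parameter.
pub const FOO_ANIMAL_UNDEFINED: u8 = 0;
pub const FOO_ANIMAL_WALRUS: u8 = 1;
pub const FOO_ANIMAL_DROP_BEAR: u8 = 2;

// Stand-in for the C function so this sketch builds without libfoo. In the
// real crate this is `extern "C" { pub fn feed(animal: u8); }`.
static LAST_FED: AtomicU8 = AtomicU8::new(FOO_ANIMAL_UNDEFINED);

pub unsafe fn feed(animal: u8) {
    LAST_FED.store(animal, Ordering::SeqCst);
}

pub fn feed_walrus() {
    // With the generated u32 constant this needed `FOO_ANIMAL_WALRUS as u8`.
    unsafe { feed(FOO_ANIMAL_WALRUS) }
}
```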

Some structs should just be opaque

Our vendored library passes a pointer to a context object to nearly every function other than initialization. (Let’s call it foo_ctx_t for now.) This is a widely used pattern and perfectly reasonable. But because of an implementation flaw, our header file fully defines foo_ctx_t instead of just forward-declaring it. This unfortunately leaks the internals of foo_ctx_t, and that leak transitively forces us to know and define a bunch of other dependent types we don’t care about.

Rust doesn’t really allow separate declaration and definition for structs. Unlike C, we can’t just declare foo_ctx_t in Rust without providing a definition for it, and the Rust compiler has to recognize the name foo_ctx_t in order to use a pointer to it as a function argument. But there are workarounds that avoid defining it completely. Neither is perfect, but as of this writing there are two alternatives that are at least functional in practice.

We can replace the struct definition with an enumeration type that has no variants, which conveniently gives you a compile error if you accidentally try to construct it. This makes type purists upset because we’re technically lying to the compiler: an empty enum is uninhabited, and the Nomicon warns that handling a reference such as &foo_ctx_t to one can trigger undefined behavior. As long as you only ever use it behind raw pointers, it does work:

```rust
pub enum foo_ctx_t {}
```

Or we can replace its innards with a private zero-sized array field. This is what bindgen does by default for types it only sees forward-declared, and it’s fine as long as you don’t rely on mem::size_of:

```rust
#[repr(C)]
pub struct foo_ctx_t {
    _unused: [u8; 0],
}
```
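The Rustonomicon suggests a slightly stronger variant of the zero-sized approach: add a PhantomData marker so the opaque type is also not Send, Sync, or Unpin, since we know nothing about how the C library treats the context across threads or whether it may move. A minimal sketch applying that pattern to our foo_ctx_t:

```rust
use std::marker::{PhantomData, PhantomPinned};

// Opaque stand-in for the C context type. The private zero-sized array means
// it cannot be constructed outside this module; the raw-pointer marker opts
// out of Send and Sync, and PhantomPinned opts out of Unpin.
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct foo_ctx_t {
    _data: [u8; 0],
    _marker: PhantomData<(*mut u8, PhantomPinned)>,
}

// Only ever handled through raw pointers, e.g. in a (hypothetical) declaration
// such as: extern "C" { fn foo_init() -> *mut foo_ctx_t; }
```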

Const-correctness

Bindgen converts C const pointers into Rust *const T and undecorated C pointers into *mut T. If the original code was const-correct, this works out just fine. If not, it can cause headaches later on when trying to create safe wrappers. Fix the library, if possible.

The example below can easily be used inside a Rust unsafe block with a normal (immutable) reference to time_t and a mutable reference to tm:

```rust
// Generated from <time.h>
extern "C" {
    pub fn gmtime_r(_t: *const time_t, _tp: *mut tm) -> *mut tm;
}
```

You don’t technically have to modify the C library to change a pointer to *const in an extern Rust declaration. In fact, the symbol table for C libraries doesn’t even record a parameter list, so the linker has no way to confirm your function parameters are correct at all (this is not the case for C++ symbols, whose mangled names encode parameter types, thankfully). If you do modify the Rust pointer types, you are responsible for verifying that the library actually upholds the invariants of const pointers, i.e. that it never writes through them.
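To make both points concrete, here is a sketch that declares gmtime_r by hand, as bindgen would, and calls it with plain references, which Rust coerces to *const time_t and *mut tm. Because it skips the libc crate, it spells out time_t and struct tm itself, and that layout is an assumption: a 64-bit Linux (glibc or musl) or macOS target, where time_t is a 64-bit integer and struct tm ends with tm_gmtoff and tm_zone. Nothing at link time would catch a mistake in these declarations. The utc helper is an illustrative name.

```rust
use std::os::raw::{c_char, c_int, c_long};

// ASSUMPTION: 64-bit Linux (glibc/musl) or macOS layout. Other targets differ,
// and a mismatch here is undefined behavior that no tool will flag.
#[allow(non_camel_case_types)]
pub type time_t = i64;

#[repr(C)]
#[allow(non_camel_case_types)]
pub struct tm {
    pub tm_sec: c_int,
    pub tm_min: c_int,
    pub tm_hour: c_int,
    pub tm_mday: c_int,
    pub tm_mon: c_int,
    pub tm_year: c_int, // years since 1900
    pub tm_wday: c_int, // days since Sunday
    pub tm_yday: c_int,
    pub tm_isdst: c_int,
    pub tm_gmtoff: c_long,
    pub tm_zone: *const c_char,
}

extern "C" {
    pub fn gmtime_r(_t: *const time_t, _tp: *mut tm) -> *mut tm;
}

/// Converts seconds since the Unix epoch to broken-down UTC time.
pub fn utc(t: time_t) -> Option<tm> {
    // All-zero bytes are a valid tm: integers are 0 and tm_zone is null.
    let mut out: tm = unsafe { std::mem::zeroed() };
    // &t coerces to *const time_t and &mut out to *mut tm.
    let ret = unsafe { gmtime_r(&t, &mut out) };
    if ret.is_null() { None } else { Some(out) }
}
```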

Sharp edges