Rust FFI: Sending strings to the outside world

The Foreign Function Interface (FFI) is the most important feature of Rust for letting it live peacefully with the rest of the world: C/C++, Ruby, Python, Node.js, C#, and so on.

The official FFI documentation has improved a lot, but it still doesn't satisfy people who want to dig deeper into FFI.

Another, better source for learning about FFI is The Rust FFI Omnibus. It is a collection of examples that show how to use FFI with many languages.

I was quite confused while reading about how to work with strings in these resources. So I decided to write a post just about this topic, focusing on sending strings to other languages over FFI.

Sending out a String

First, we need to understand Rust's String. Let's build a simple function that returns a String:

#[no_mangle]
pub extern "C" fn string_from_rust() -> String {
    "Hello World".to_string()
}

And the Node.js code to read it:

const ffi = require('ffi');

// The path should be 'rstring/target/debug/librstring.so' on Linux
let lib = ffi.Library('rstring/target/debug/librstring.dylib', {
  string_from_rust: ['string', []]
});

let result = lib.string_from_rust();
console.log(result);

Run this code and what you will see is:

$ node ffi.js
[1] 63179 segmentation fault node ffi.js

Crashed! But why?

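The reason is that String is a Rust-specific type. Under the hood it is a struct made of a pointer to heap-allocated bytes, a length, and a capacity, and its layout is not part of any C ABI. Node.js (through node-ffi) expects the function to return a C char *. The two sides therefore disagree about what is being returned, and the process crashes. Recent versions of rustc even warn that String is not FFI-safe when it appears in an extern fn signature.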
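One quick way to see the mismatch is to compare sizes. The sketch below relies on the standard library's description of a String as three parts: a pointer, a length, and a capacity. A C char * is a single pointer. The helper name string_vs_char_ptr_size is made up for illustration. The exact layout of String is an implementation detail, not a stable ABI.

```rust
use std::mem::size_of;

/// Hypothetical helper for illustration: returns the size in bytes of a
/// Rust String and of a single raw pointer (which is what a C `char *` is).
/// A String holds a pointer, a length, and a capacity, so it is wider
/// than one pointer. Its field order is NOT a stable ABI.
pub fn string_vs_char_ptr_size() -> (usize, usize) {
    (size_of::<String>(), size_of::<*const u8>())
}
```

On a typical 64-bit target this returns (24, 8). A caller that expects one pointer back has no way to make sense of a String.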

String must be returned as a pointer

Rust-specific types such as String or Vec cannot cross the FFI boundary directly. Instead, we send out a pointer to the memory block that holds their data.

Let's slightly modify our Rust code:

#[no_mangle]
pub extern "C" fn string_from_rust() -> *const u8 {
    "Hello World".as_ptr()
}

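*const u8 is a raw pointer to bytes (u8 values), and it is what str::as_ptr() returns. A raw pointer is just an address, so it crosses the boundary the same way a C pointer does.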

Run the Node.js code again, and this is what you get:

$ node ffi.js
Hello WorldHello World

Ehh, guys, we got good news and bad news here...

The good news is we can see the String now. The bad news is it doesn't look right.

The NUL-terminated strings

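In Rust, a string is not NUL-terminated: it stores its length separately instead of ending with \0. C strings rely on that trailing \0 to find the end, and so does node-ffi's 'string' type, which reads a C char *. Without the terminator, Node.js doesn't know where our text ends. It keeps reading whatever bytes happen to follow "Hello World" in memory until it hits a 0 byte. That is undefined behavior, and it is how we ended up with extra text.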
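To make the role of the terminator concrete, here is a rough Rust sketch of what the reading side does with a char *. It is essentially C's strlen. The name c_strlen is hypothetical; real Rust code would use std::ffi::CStr::from_ptr instead.

```rust
use std::os::raw::c_char;

/// Hypothetical sketch of how a C-style reader finds the end of a string:
/// it walks forward one byte at a time until it sees a 0 byte.
///
/// Safety: `p` must point to readable memory that contains a 0 byte.
/// Without a terminator, this reads past the end of the data, which is
/// undefined behavior.
pub unsafe fn c_strlen(p: *const c_char) -> usize {
    let mut len = 0;
    loop {
        // Nothing but the NUL byte marks the end of the string.
        let byte = unsafe { *p.add(len) };
        if byte == 0 {
            return len;
        }
        len += 1;
    }
}
```

Given "Hello World\0", this returns 11. Given "Hello World" with no terminator, it has no safe place to stop.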

Oh, and speaking of NUL-terminated strings, a guy broke the whole Rust ecosystem on Windows last April with his NUL-terminated string generating crate.

Solution? Just add a \0 at the end of our string.

#[no_mangle]
pub extern "C" fn string_from_rust() -> *const u8 {
    "Hello World\0".as_ptr()
}

Run it again, and it looks good now:

$ node ffi.js
Hello World

OK. But wait, do I have to add the \0 character by hand every time I work with strings? Not really. Rust also provides a C-compatible string type, std::ffi::CString, which appends the terminator for you.

You can easily create a CString from a string slice:

CString::new("Hello World").unwrap()
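CString::new returns a Result rather than a CString directly, which is why the example calls unwrap(). It copies the bytes and appends the \0 for you. It fails if the input already contains a 0 byte, because C would treat that byte as the end of the string. A small sketch, using a hypothetical helper named make_c_string:

```rust
use std::ffi::CString;

/// Hypothetical helper: builds a CString, or returns None if `s` contains
/// an interior NUL byte (CString::new reports that case as a NulError).
pub fn make_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}
```

In a real FFI function, you may prefer to handle the error case instead of calling unwrap().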

Let's see how we can use CString to send a string to Node.js:

use std::ffi::CString;
use std::os::raw::c_char;

#[no_mangle]
pub extern "C" fn string_from_rust() -> *const c_char {
    let s = CString::new("Hello World").unwrap();
    s.as_ptr()
}

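First, we create a CString from a string slice, then we return a pointer to its data, just like before. Note that the return type is now *const c_char (C's char type, from std::os::raw), because that is what CString::as_ptr() returns.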

$ node ffi.js

Oops! Nothing displayed. Why?

std::mem::forget it to keep it

Rust is smart; in this case, too smart. When a variable goes out of scope, Rust automatically drops the value it owns and frees its memory.

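Take a closer look at our string_from_rust() function. We create a CString, return a pointer to the memory block holding its data, and then leave the function. At that point s goes out of scope, so Rust does its job and drops s, freeing the buffer our pointer points to. The earlier "Hello World\0".as_ptr() version didn't have this problem. A string literal is a &'static str stored in the program's binary, and it is never freed.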

pub extern "C" fn string_from_rust() -> *const c_char {
    let s = CString::new("Hello World").unwrap(); // <---.
    s.as_ptr()                                    //     | the scope of s
}                                                 // <---'

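On the Node.js side, we received a pointer into s's buffer, which had already been freed. Reading freed memory is undefined behavior. In this run it happened to print nothing, but it could just as well print garbage or crash.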

So how do we tell Rust not to free up the memory of our string?

We use std::mem::forget ! The usage is simple:

#[no_mangle]
pub extern "C" fn string_from_rust() -> *const c_char {
    let s = CString::new("Hello World").unwrap();
    let p = s.as_ptr();
    std::mem::forget(s);
    p
}

First, we store the pointer to s's data in a variable, p.

Then we pass s to std::mem::forget. It takes ownership of s without running its destructor, so Rust never frees the buffer.
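If the difference between dropping and forgetting feels abstract, here is a self-contained sketch that counts destructor runs. Tracked is a hypothetical stand-in for CString, and drops_when is an illustrative helper, not part of any library.

```rust
use std::cell::Cell;
use std::mem;

thread_local! {
    // Counts how many times Tracked's destructor has run on this thread.
    static DROPS: Cell<u32> = Cell::new(0);
}

/// Hypothetical stand-in for CString: a type whose destructor we can observe.
pub struct Tracked;

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.with(|d| d.set(d.get() + 1));
    }
}

/// Creates a Tracked value and either lets it go out of scope normally or
/// passes it to mem::forget. Returns how many destructors ran.
pub fn drops_when(forget_it: bool) -> u32 {
    DROPS.with(|d| d.set(0));
    {
        let t = Tracked;
        if forget_it {
            // Ownership moves into forget; Tracked::drop never runs.
            mem::forget(t);
        }
        // Otherwise `t` is dropped here, at the end of its scope.
    }
    DROPS.with(|d| d.get())
}
```

drops_when(false) returns 1 and drops_when(true) returns 0. That is exactly the difference between the crashing and the working versions of string_from_rust().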

The string has now leaked, and Node.js is able to read its value:

$ node ffi.js
Hello World
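The standard library also offers CString::into_raw. It does the as_ptr() plus mem::forget dance in a single call and returns a *mut c_char. Either way, the string stays leaked until someone frees it, and the documented way to do that is to pass the pointer back to CString::from_raw. Below is a sketch that assumes a hypothetical export named free_rust_string, which the Node.js side would call once it is done with the string:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

/// Returns a heap-allocated, NUL-terminated string. Ownership moves to the
/// caller, who must pass the pointer to `free_rust_string` exactly once.
#[no_mangle]
pub extern "C" fn string_from_rust() -> *mut c_char {
    let s = CString::new("Hello World").unwrap();
    // Like as_ptr() followed by mem::forget(s), but in one call.
    s.into_raw()
}

/// Hypothetical companion export that hands the memory back to Rust.
///
/// Safety: `ptr` must come from `string_from_rust` and must not be used
/// or freed again after this call.
#[no_mangle]
pub unsafe extern "C" fn free_rust_string(ptr: *mut c_char) {
    if ptr.is_null() {
        return;
    }
    // Rebuild the CString so it is dropped (and its buffer freed).
    drop(unsafe { CString::from_raw(ptr) });
}
```

On the Node.js side, this means keeping the raw pointer instead of letting node-ffi convert the return value straight into a JavaScript string. That way it can be passed back to free_rust_string afterwards.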

Sending out a Vector of String

Sometimes sending out a single String is not enough, and you need to send a bunch of Strings.

From the previous section, we know that each string has to be sent as a NUL-terminated string: either a literal ending in \0 or a CString.

A vector (Vec) is a resizable array, and it is another Rust-specific type, so we also need to return it as a pointer. What we end up with is a pointer to an array of pointers, each pointing to a NUL-terminated string. This is the same shape as a C array of strings (char **).
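To picture that shape, here is a sketch of how a consumer walks such an array, written in Rust for consistency. It roughly mirrors what ref-array does for us on the Node.js side, but it is not ref-array's actual implementation. The name read_string_array is hypothetical. The length has to be passed in separately, because a bare pointer does not carry it.

```rust
use std::ffi::CStr;
use std::os::raw::c_char;

/// Hypothetical reader for a C-style array of strings (`char **`).
///
/// Safety: `arr` must point to at least `len` valid pointers, each pointing
/// to a NUL-terminated string. The array has no length of its own, so the
/// caller must supply `len`.
pub unsafe fn read_string_array(arr: *const *const c_char, len: usize) -> Vec<String> {
    (0..len)
        .map(|i| {
            // Follow the outer pointer to the i-th string pointer...
            let p = unsafe { *arr.add(i) };
            // ...then read bytes up to that string's NUL terminator.
            let c = unsafe { CStr::from_ptr(p) };
            c.to_string_lossy().into_owned()
        })
        .collect()
}
```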

#[no_mangle]
pub extern "C" fn string_array() -> *const *const u8 {
    let v = vec![
        "Hello\0".as_ptr(),
        "World\0".as_ptr()
    ];
    v.as_ptr()
}

On the Node.js side, we use the ref-array package from npm to turn the returned Buffer into an array.

const ffi = require('ffi');
const array = require('ref-array');

const StringArray = array('string');

let lib = ffi.Library('rstring/target/debug/librstring.so', {
  string_array: [StringArray, []]
});

let b = lib.string_array();
b.length = 2;
console.log(b);

We define a new data type in Node.js called StringArray, using ref-array, to convert the Buffer data into an array of strings.

const StringArray = array('string');

The returned pointer carries no length information, so ref-array cannot know how many elements the array has. We need to set the length ourselves to make it readable.

Like this:

$ node ffi.js
[ '��\u0002\u0002', '8+���~', buffer: <Buffer > ]

Otherwise, you will just get the Buffer without knowing its content.

$ node ffi.js
[ buffer: <Buffer> ]

Oh wait! What? Why the weird strings?

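Remember std::mem::forget? We have the same issue here. Rust deallocates the vector v when it exits the string_array() function, so we need to forget it. The string literals themselves are 'static and stay valid; what gets freed is the vector's buffer of pointers.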

#[no_mangle]
pub extern "C" fn string_array() -> *const *const u8 {
    let v = vec![
        "Hello\0".as_ptr(),
        "World\0".as_ptr()
    ];
    let p = v.as_ptr();
    std::mem::forget(v);
    p
}

Now it's fine:

$ node ffi.js
[ 'Hello', 'World', buffer: <Buffer> ]

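Playing with std::mem::forget and leaking memory is undesirable, and we should not overuse it. Memory leaked this way is only reclaimed if the foreign side hands the pointer back to Rust so it can be freed.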
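A less leaky version hands ownership of both the strings and the array to the caller, together with a way to give them back. The sketch below is one possible design, not the original code. The exports string_array_owned and free_string_array are hypothetical. The length is reported through an out-parameter, so the Node.js side no longer has to hard-code it. Converting the Vec into a boxed slice first makes its length equal to its allocation size. That is what lets the free function rebuild it from just the pointer and the length.

```rust
use std::ffi::CString;
use std::os::raw::c_char;
use std::ptr;

/// Hypothetical export: returns an array of owned C strings and writes the
/// number of elements to `out_len`. Free everything with `free_string_array`.
///
/// Safety: `out_len` must be null or point to writable memory for a usize.
#[no_mangle]
pub unsafe extern "C" fn string_array_owned(out_len: *mut usize) -> *mut *mut c_char {
    let words = ["Hello", "World"];
    // Each CString becomes a raw pointer that we now manage by hand.
    let boxed: Box<[*mut c_char]> = words
        .iter()
        .map(|w| CString::new(*w).unwrap().into_raw())
        .collect::<Vec<_>>()
        .into_boxed_slice(); // drops spare capacity: len == allocation size
    if !out_len.is_null() {
        unsafe { *out_len = boxed.len() };
    }
    // Leak the boxed slice; the fat pointer is cast to a thin element pointer.
    Box::into_raw(boxed) as *mut *mut c_char
}

/// Hypothetical export that frees an array returned by `string_array_owned`.
///
/// Safety: `arr` and `len` must come from one `string_array_owned` call
/// and must not be used again afterwards.
#[no_mangle]
pub unsafe extern "C" fn free_string_array(arr: *mut *mut c_char, len: usize) {
    if arr.is_null() {
        return;
    }
    // Rebuild the Box<[*mut c_char]> from the thin pointer and its length.
    let slice: Box<[*mut c_char]> =
        unsafe { Box::from_raw(ptr::slice_from_raw_parts_mut(arr, len)) };
    for &p in slice.iter() {
        // Give each string back to Rust so its buffer is freed.
        drop(unsafe { CString::from_raw(p) });
    }
    // `slice` is dropped here, freeing the array of pointers itself.
}
```

On the Node.js side, this would mean passing in a pointer for the length and calling free_string_array once the strings have been copied into JavaScript.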

Many people suggest that in production we should not do all of this by hand. They say it's a better idea to use existing projects such as Neon from Dave Herman, the head of Mozilla Research. I totally agree. He lost a lot of hair over this, so we don't have to lose ours (jk).

I hope that reading this post would be as helpful for you as writing it was for me. Any feedback would be greatly appreciated.

Please feel free to leave a comment on my HackerNews and Reddit post.

Hey, thanks to the HN and Reddit folks, there has been a lot of feedback. I have updated some parts of this post, and I will keep updating it. Thank you so much!