Feature Name: arbitrary_bytes_safe_trait

Start Date: (fill me in with today’s date, YYYY-MM-DD)

RFC PR: (leave this empty)

Rust Issue: (leave this empty)

Summary

Introduce an unsafe marker trait that guarantees that any byte sequence of size_of::<T>() bytes is a valid instance of T, and a derive for that trait.

Motivation

When deserializing input from untrusted sources (IPC, disk, network, etc.), it is sometimes desirable to interpret a sequence of bytes as a particular type. In general, this is unsafe, but for a large subset of types, it is safe. Currently, code that wishes to do this needs to use unsafe. It would be a big ergonomics win if the users and authors of deserialization APIs could deserialize such types without needing to reason about memory safety themselves.

Guide-level Explanation

Introduce a marker trait, pub unsafe trait ArbitraryBytesSafe {} (name to be bikeshedded), that indicates that any sequence of size_of::<T>() bytes is a valid instance of T.

Provide four (safe!) associated functions with default impls on ArbitraryBytesSafe:

fn transmute_ref<T>(t: &T) -> &Self

fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self

fn transmute_slice_ref<T>(slc: &[T]) -> &Self

fn transmute_slice_mut<T: ArbitraryBytesSafe>(slc: &mut [T]) -> &mut Self

The first two functions ensure at compile time that size_of::<T>() == size_of::<Self>(). The last two functions check at runtime that slc.len() * size_of::<T>() == size_of::<Self>().
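The two immutable functions could be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the LayoutCheck helper and the u32_as_bytes function are hypothetical names, and the alignment check is an extra condition that the text above does not mention but that a sound cast needs. The sketch also assumes the source type has no padding bytes, which it does not enforce.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of, size_of_val};

// Hypothetical helper: evaluating `OK` fails the build (after
// monomorphization) when the two layouts are incompatible.
struct LayoutCheck<Src, Dst>(PhantomData<(Src, Dst)>);

impl<Src, Dst> LayoutCheck<Src, Dst> {
    const OK: () = assert!(
        size_of::<Src>() == size_of::<Dst>() && align_of::<Src>() >= align_of::<Dst>()
    );
}

/// Stand-in for the proposed trait; only the two `&` functions are shown.
pub unsafe trait ArbitraryBytesSafe: Sized {
    fn transmute_ref<T>(t: &T) -> &Self {
        // Compile-time size check (plus the assumed alignment check).
        let () = LayoutCheck::<T, Self>::OK;
        // SAFETY: sizes match, `t` is sufficiently aligned, and every byte
        // pattern is a valid `Self`. Assumes `T` has no padding bytes.
        unsafe { &*(t as *const T as *const Self) }
    }

    fn transmute_slice_ref<T>(slc: &[T]) -> &Self {
        // Runtime check: slc.len() * size_of::<T>() == size_of::<Self>().
        assert_eq!(size_of_val(slc), size_of::<Self>());
        assert!(align_of::<T>() >= align_of::<Self>());
        // SAFETY: same argument as above, with the size checked at runtime.
        unsafe { &*(slc.as_ptr() as *const Self) }
    }
}

unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}

/// Hypothetical usage: view a u32 as its four native-endian bytes.
pub fn u32_as_bytes(x: &u32) -> &[u8; 4] {
    <[u8; 4]>::transmute_ref(x)
}
```

With this shape, a call such as u32::transmute_ref(&0u8) would be rejected when the program is built, because the sizes differ.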

Provide a custom derive for ArbitraryBytesSafe.

Reference-level Explanation

TODO: Explain how transmute_XXX is implemented.

Note that, when obtaining an immutable reference, T (the source type) doesn’t need to be ArbitraryBytesSafe because it isn’t mutated. When obtaining a mutable reference, it does need to be ArbitraryBytesSafe because there’s no guarantee that valid instances of Self correspond to the bits of valid instances of T without this trait bound.
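To make the mutable case concrete, here is a minimal sketch of transmute_mut with the extra bound on T. The LayoutCheck helper and overwrite_with_ones are hypothetical names used only for illustration, and the alignment condition is an assumption added for soundness rather than something stated above.

```rust
use std::marker::PhantomData;
use std::mem::{align_of, size_of};

// Hypothetical helper: a failed assertion here is a build error.
struct LayoutCheck<Src, Dst>(PhantomData<(Src, Dst)>);

impl<Src, Dst> LayoutCheck<Src, Dst> {
    const OK: () = assert!(
        size_of::<Src>() == size_of::<Dst>() && align_of::<Src>() >= align_of::<Dst>()
    );
}

/// Stand-in for the proposed trait; only `transmute_mut` is shown.
pub unsafe trait ArbitraryBytesSafe: Sized {
    fn transmute_mut<T: ArbitraryBytesSafe>(t: &mut T) -> &mut Self {
        let () = LayoutCheck::<T, Self>::OK;
        // SAFETY: layouts are compatible, and because both `T` and `Self`
        // accept every byte pattern, nothing written through the returned
        // reference can leave `*t` holding an invalid value.
        unsafe { &mut *(t as *mut T as *mut Self) }
    }
}

unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u32 {}
unsafe impl<T: ArbitraryBytesSafe, const N: usize> ArbitraryBytesSafe for [T; N] {}

/// Hypothetical usage: mutate a u32 through a byte-array view.
pub fn overwrite_with_ones(x: &mut u32) {
    let bytes = <[u8; 4]>::transmute_mut(x);
    *bytes = [0xFF; 4];
}

// By contrast, `u8::transmute_mut(&mut some_bool)` does not compile:
// bool is not ArbitraryBytesSafe, so safe code cannot store 2 in a bool.
```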

The custom derive uses the following rules to determine whether a type is ArbitraryBytesSafe:

The primitives uXXX and iXXX are safe, including usize and isize.

Arrays are safe if their element types are safe.

Structs and tuple structs are safe if all of their fields are safe and if it is guaranteed that there is no internal padding. This is true if repr(packed) is used or if repr(C) is used and the alignment of all fields is met without needing any padding (e.g., if a u32 follows two u16s). For tuple structs, this is also true if repr(transparent) is used. Anonymous tuples are never safe because there is no way to specify a repr on them.

Enums are safe if they are C-like, have either repr(C) or repr(i*) / repr(u*), and if every possible discriminant value corresponds to a variant.
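The struct rule can be illustrated with a small padding check of the kind a derive might perform. This is a sketch, not the proposed derive: the type names and the no_padding helper are hypothetical, and the sizes noted in comments assume a target where u32 has 4-byte alignment.

```rust
use std::mem::size_of;

// repr(C), and the u32 follows two u16s: no padding is needed.
#[allow(dead_code)]
#[repr(C)]
pub struct Header {
    tag: u16,
    flags: u16,
    len: u32,
}

// repr(C), but padding is inserted after `tag` so that `len` is aligned.
#[allow(dead_code)]
#[repr(C)]
pub struct Padded {
    tag: u8,
    len: u32,
}

// Same fields as `Padded`, but repr(packed) removes the padding.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct Packed {
    tag: u8,
    len: u32,
}

// C-like with repr(u8), but only 2 of the 256 possible discriminant
// values name a variant, so this enum would not qualify.
#[allow(dead_code)]
#[repr(u8)]
pub enum Bit {
    Zero,
    One,
}

/// Hypothetical check: true when the fields account for every byte.
pub const fn no_padding(struct_size: usize, field_sizes: &[usize]) -> bool {
    let mut sum = 0;
    let mut i = 0;
    while i < field_sizes.len() {
        sum += field_sizes[i];
        i += 1;
    }
    sum == struct_size
}

pub const HEADER_OK: bool = no_padding(
    size_of::<Header>(),
    &[size_of::<u16>(), size_of::<u16>(), size_of::<u32>()],
);
pub const PADDED_OK: bool =
    no_padding(size_of::<Padded>(), &[size_of::<u8>(), size_of::<u32>()]);
pub const PACKED_OK: bool =
    no_padding(size_of::<Packed>(), &[size_of::<u8>(), size_of::<u32>()]);
```

Under these assumptions, Header and Packed pass the check and Padded does not.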

Rationale and Alternatives

This API allows for clean, safe serialization APIs (that don’t require unsafe impls internally) such as:

For reading from a stream, fn read<T: ArbitraryBytesSafe>(&mut self) -> T

For zero-copy parsing of input, fn parse<'a>(input: &'a [u8]) -> Result<Parsed<'a>, ParseErr>
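As a rough sketch of the stream-reading case, the function below fills a value of any ArbitraryBytesSafe type from a std::io::Read source. The name read_value, the free-function form, and the io::Result return type are assumptions made for this illustration; the signature listed above returns T directly. The unsafe code lives inside the library function, so callers need none.

```rust
use std::io::{self, Read};
use std::mem::{size_of, MaybeUninit};
use std::slice;

/// Stand-in for the proposed marker trait.
pub unsafe trait ArbitraryBytesSafe: Sized {}

unsafe impl ArbitraryBytesSafe for u8 {}
unsafe impl ArbitraryBytesSafe for u32 {}

/// Hypothetical helper: read exactly size_of::<T>() bytes into a new T.
pub fn read_value<T: ArbitraryBytesSafe, R: Read>(src: &mut R) -> io::Result<T> {
    // Zeroed storage, so every byte is initialized before we hand out a
    // byte slice over it.
    let mut value = MaybeUninit::<T>::zeroed();
    // SAFETY: the pointer is valid for size_of::<T>() initialized bytes and
    // is not used through any other path while `bytes` is live.
    let bytes =
        unsafe { slice::from_raw_parts_mut(value.as_mut_ptr() as *mut u8, size_of::<T>()) };
    src.read_exact(bytes)?;
    // SAFETY: T is ArbitraryBytesSafe, so whatever bytes were read form a
    // valid T.
    Ok(unsafe { value.assume_init() })
}
```

A caller could then write something like read_value::<u32, _>(&mut socket) without any unsafe block of its own.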

It also has uses beyond deserializing untrusted input such as:

Generating random instances of things for testing

Debugging/inspecting arbitrary regions of program memory

Alternatives:

Do not have ArbitraryBytesSafe::transmute_XXX, and simply allow unsafe code to use ArbitraryBytesSafe as a marker trait that provides it the guarantees it needs to do unsafe things such as create an uninitialized T and then copy bytes into it manually. Downside: you lose the guarantee that the sizes match.

Instead of a custom derive, use a normal macro that is placed around the type definition and can decide whether or not to emit unsafe impl ArbitraryBytesSafe for <type> {}. This is less ergonomic, but also easier to implement as a first pass.

Prior Art

Thanks to @dtolnay for pointing out the Pod trait, which is similar to ArbitraryBytesSafe, although it performs more checks at runtime (in particular, size and alignment checking). It also provides a large number of utility functions available to Pod types.

The recent FromBits / IntoBits pre-RFC discusses a different but related case: conversions between types in which, while arbitrary bit patterns may not be safe, all of the valid bit patterns of one type correspond to valid bit patterns of the other type, and so the conversion is nonetheless safe.

Unresolved Questions