Here are my notes on adding fallible allocation to Rust’s collection types, including several interviews with stakeholders or people with experience in the problem space.

This document doesn’t propose a concrete solution, but instead lays out the options and my notes on them. I have a pretty strong bias going in, and it was reinforced by the interviews.

I’ll be prepping an RFC this week based on this feedback and the discussion here.

Background

Methods like push on Vec don’t provide any way for the caller to handle OOM, and this is problematic for several of our users. Notably, gecko devs will be forking our collections to add this functionality for Firefox 57 (November 2017). As such, we should fast-track a design so that, at the very least, their fork matches what std eventually does.

Note: I will usually be using push as a stand-in for "all allocating APIs" for simplicity here.

Today we always abort (call Alloc::oom())
- Unless computing the allocation size overflows isize::MAX (yes, isize), then it panics
- Both of these are in the implementation of Vec, and not done by the allocator

Moving to unwinding on OOM is problematic
- unsafe code may be exception-unsafe on the basis of current strategy
- almost everyone tells me that C++'s version of this (bad_alloc) is bad and disliked
- libunwind allocates, making panic on OOM sketchy at best
- Alloc::oom says it can panic, so local or custom global allocators might unwind…?
  - Seems sketchy… should we update docs to forbid this?
  - Was this discussed in the RFCs? pnkfelix and sfackler seem to think it was only intended for local allocators
  - global allocators aren’t currently distinguished by the Alloc trait
  - this + catch_unwind is a “simple” way to get fallible allocation out of the existing APIs
  - still unstable APIs anyway, so won’t help gecko devs for 57
- unwinding isn’t always supported by our stakeholders (no unwinding in gecko)



For all considered proposals, I’ll be using a strawman error type, which I’ve only thought about a little:

enum AllocFailure { Exhausted, CapacityOverflow }

Note that the allocator API defines:

pub enum AllocErr { Exhausted { request: Layout }, Unsupported { details: &'static str }, }

I don’t think Unsupported should be exposed (it will instead be folded into Exhausted).

I don’t think Layout should be exposed (it’s an implementation detail).

Most consumers will probably just do:

foo.try_push(x)?; // evaporate details

// or
if foo.try_push(x).is_err() {
    /* do something */
}

But some might be interested in reproducing Vec’s behaviour (including Vec itself):

match foo.try_push(x) {
    Ok(()) => {}
    Err(Exhausted) => Allocator::oom(),
    Err(CapacityOverflow) => panic!("capacity overflow"),
}

Felix notes Exhausted having requestedBytes: usize might be useful for debugging crashes – was it “real” oom or did we try to allocate something massive?
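To make the strawman a bit more concrete, here is a minimal sketch of how try_push and the “reproduce Vec’s behaviour” match could fit together. The TryPush trait, check_capacity, and push_like_vec are hypothetical names invented for illustration; none of them are in std. The sketch is built on std’s Vec::try_reserve, which did not exist when these notes were written. Because the stable TryReserveError does not say why it failed, the isize::MAX limit is checked by hand and every other failure is assumed to be Exhausted.

```rust
use std::mem::size_of;

/// Strawman error type from the notes above.
#[derive(Debug, PartialEq, Eq)]
pub enum AllocFailure {
    Exhausted,
    CapacityOverflow,
}

/// Hypothetical extension trait (not part of std).
pub trait TryPush<T> {
    /// On failure, hands the element back together with the reason.
    fn try_push(&mut self, elem: T) -> Result<(), (T, AllocFailure)>;
}

/// Hypothetical helper: rejects capacities whose byte size exceeds isize::MAX.
pub fn check_capacity<T>(cap: usize) -> Result<(), AllocFailure> {
    match cap.checked_mul(size_of::<T>()) {
        Some(bytes) if bytes <= isize::MAX as usize => Ok(()),
        _ => Err(AllocFailure::CapacityOverflow),
    }
}

impl<T> TryPush<T> for Vec<T> {
    fn try_push(&mut self, elem: T) -> Result<(), (T, AllocFailure)> {
        if self.len() == self.capacity() {
            // The vector is full, so pushing would have to grow the buffer.
            let needed = match self.len().checked_add(1) {
                Some(n) => n,
                None => return Err((elem, AllocFailure::CapacityOverflow)),
            };
            if let Err(reason) = check_capacity::<T>(needed) {
                return Err((elem, reason));
            }
            // Assumption: any remaining failure is the allocator giving up.
            if self.try_reserve(1).is_err() {
                return Err((elem, AllocFailure::Exhausted));
            }
        }
        // Spare capacity is guaranteed here, so this push cannot allocate.
        self.push(elem);
        Ok(())
    }
}

/// Rebuilds today's infallible behaviour on top of try_push:
/// abort on exhaustion, panic on capacity overflow.
pub fn push_like_vec<T>(v: &mut Vec<T>, elem: T) {
    match v.try_push(elem) {
        Ok(()) => {}
        Err((_, AllocFailure::Exhausted)) => std::process::abort(),
        Err((_, AllocFailure::CapacityOverflow)) => panic!("capacity overflow"),
    }
}
```

Returning the element in the error mirrors the Result<(), (T, AllocFailure)> signature used by the contenders below, so a failed push does not silently drop the value.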

Major contenders

Types to distinguish fallibility
- FallibleVec<T>, replaces push(T) with push(T) -> Result<(), (T, AllocFailure)>
  - doesn’t support generic use of Vec/FallibleVec
  - hard to do mixed usage of fallible and non-fallible
    - or at least, outside allocating code, fallibility loses relevance
- Vec<T, F: Fallibility=Infallible>, makes push(T) -> F::Result<(), T>
  - requires generic associated types (stable late 2018, optimistically)
  - probably requires type defaults to be improved?
  - works with generics, but makes all of our signatures in rustdoc hellish
    - maybe needs “rustdoc lies and applies defaults” feature
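As a rough illustration of the second type-based option, the sketch below spells out the Fallibility parameter using generic associated types, which have since been stabilized (Rust 1.65). FVec is a hypothetical wrapper standing in for Vec, the ok/err constructor functions are my own addition to make the trait usable, and aborting in the infallible case is an assumption about what Infallible would do.

```rust
use std::marker::PhantomData;

/// Chooses what an allocating method returns.
pub trait Fallibility {
    type Result<T, E>;
    fn ok<T, E>(value: T) -> Self::Result<T, E>;
    fn err<T, E>(error: E) -> Self::Result<T, E>;
}

pub struct Infallible;
pub struct Fallible;

impl Fallibility for Infallible {
    // Infallible methods return the bare value, as Vec does today.
    type Result<T, E> = T;
    fn ok<T, E>(value: T) -> Self::Result<T, E> {
        value
    }
    fn err<T, E>(_error: E) -> Self::Result<T, E> {
        // Assumption: the infallible flavour aborts on failure.
        std::process::abort()
    }
}

impl Fallibility for Fallible {
    type Result<T, E> = Result<T, E>;
    fn ok<T, E>(value: T) -> Self::Result<T, E> {
        Ok(value)
    }
    fn err<T, E>(error: E) -> Self::Result<T, E> {
        Err(error)
    }
}

/// Hypothetical stand-in for Vec<T, F: Fallibility = Infallible>.
pub struct FVec<T, F: Fallibility = Infallible> {
    inner: Vec<T>,
    _mode: PhantomData<F>,
}

impl<T, F: Fallibility> FVec<T, F> {
    pub fn new() -> Self {
        FVec { inner: Vec::new(), _mode: PhantomData }
    }

    /// Returns () for Infallible and Result<(), T> for Fallible.
    pub fn push(&mut self, elem: T) -> F::Result<(), T> {
        match self.inner.try_reserve(1) {
            Ok(()) => {
                self.inner.push(elem);
                F::ok::<(), T>(())
            }
            Err(_) => F::err::<(), T>(elem),
        }
    }

    pub fn len(&self) -> usize {
        self.inner.len()
    }
}
```

One thing the sketch makes visible: in code that is generic over F, the value returned by push has the opaque type F::Result<(), T>, so the caller cannot inspect it without further trait machinery.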

Methods to distinguish fallibility
- Make mirrors of all methods – try_push(T) -> Result<(), (T, AllocFailure)>
  - works fine, but people aren’t happy about lots of methods
- Only add try_reserve() -> Result<(), AllocFailure>
  - minimal impact
  - methods like extend/splice have unpredictable allocations
  - doesn’t work with portability lints (see below)
  - might be nice to have anyway?
- Add some methods, but ignore niche ones
  - Weird, going to make people mad
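For the “only add try_reserve” option, the calling pattern would look roughly like the sketch below: reserve fallibly up front, then use the ordinary infallible method, which can no longer allocate. try_extend_from_slice is a hypothetical helper name; Vec::try_reserve and TryReserveError are what std provides today, standing in for the strawman try_reserve() -> Result<(), AllocFailure>.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: fallibly reserve, then append without allocating.
/// `T: Copy` is used so that copying the elements cannot itself allocate.
pub fn try_extend_from_slice<T: Copy>(
    dst: &mut Vec<T>,
    src: &[T],
) -> Result<(), TryReserveError> {
    // The only point that can fail: make room for every element first.
    dst.try_reserve(src.len())?;
    // Capacity is now sufficient, so this call does not reallocate.
    dst.extend_from_slice(src);
    Ok(())
}
```

This works because the number of elements is known in advance. For methods like extend over an arbitrary iterator, the caller cannot know how much to reserve, which is the “unpredictable allocations” problem noted above.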

Middle ground: method to temporarily change type
- as_fallible(&'a mut self) -> FallibleVec<'a, T>
  - can do it for one method: vec.as_fallible().push(x)
  - or for a whole scope: let mut vec = vec.as_fallible()
  - doesn’t enable generic use, weak for library interop
  - can be built on method style
  - note: this is different from type-based b/c a lifetime is involved
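A minimal sketch of the middle-ground option, assuming as_fallible is provided through a hypothetical AsFallible extension trait and that the borrowed view only offers push. The error type is simplified to hand back just the element rather than (T, AllocFailure).

```rust
/// Borrowed view of a Vec whose methods are all fallible.
pub struct FallibleVec<'a, T> {
    inner: &'a mut Vec<T>,
}

/// Hypothetical extension trait providing as_fallible.
pub trait AsFallible<T> {
    fn as_fallible(&mut self) -> FallibleVec<'_, T>;
}

impl<T> AsFallible<T> for Vec<T> {
    fn as_fallible(&mut self) -> FallibleVec<'_, T> {
        FallibleVec { inner: self }
    }
}

impl<T> FallibleVec<'_, T> {
    /// Returns the element to the caller if no room could be reserved.
    pub fn push(&mut self, elem: T) -> Result<(), T> {
        if self.inner.try_reserve(1).is_err() {
            return Err(elem);
        }
        self.inner.push(elem);
        Ok(())
    }
}

pub fn demo() -> Result<Vec<u32>, u32> {
    let mut vec = Vec::new();

    // Fallible for a single call.
    vec.as_fallible().push(1)?;

    // Fallible for a whole scope.
    {
        let mut view = vec.as_fallible();
        view.push(2)?;
        view.push(3)?;
    }

    // Outside the scope, the ordinary infallible API is back.
    vec.push(4);
    Ok(vec)
}
```

The lifetime on FallibleVec is what separates this from the purely type-based options: the view borrows the vector mutably, so fallible and infallible use cannot overlap.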



Possible augmentation: negative portability lints

In some sense “don’t use infallible allocation” is the same kind of constraint that kernel code has for “don’t use floats”. The latter is intended to be handled by negative portability lints, so we can do that too.

portability lints were spec’d here: https://github.com/rust-lang/rfcs/blob/master/text/1868-portability-lint.md

But the negative version (removing portability assumptions) was left as future work.

Strawman syntax – add maybe as a cfg selector in analogy to ?Sized :

// In liballoc
impl<T> Vec<T> {
    // No need to mark push, implicitly #[cfg(std)]?
    fn push(elem: T) { ... }

    // Say try_push doesn't infallibly allocate -- forces verification of body
    #[cfg(maybe(infallible_allocation))]
    fn try_push(elem: T) -> Result<(), AllocFailure> { ... }
}

// In your code
#![cfg(maybe(infallible_allocation))]
/* a bunch of functions/types that shouldn't use infallible allocation */

// or (equivalent)
#[cfg(maybe(infallible_allocation))]
mod allocation_sensitive_task;

// or (more granular)
#[cfg(maybe(infallible_allocation))]
fn process_task() {
    /* will get warning if any function called isn't #[cfg(maybe(infallible_allocation))] */
}

Note this analysis is local, so if you call any undecorated function from a third-party library, you’ll get a warning. This is a bit annoying, but strictly correct insofar as long-term stability is concerned: they should publicly declare that they guarantee this. In this vein, adding a #[cfg(maybe)] to a public item isn’t a breaking change, but removing one is.

This will also require a ton of careful stdlib decorating (don’t want to promise things we shouldn’t).

Interviews

I interviewed several people with industry experience in this problem, only some of whom are stakeholders in Rust providing this API (noted below).

Interview with Ehsan (Gecko dev; doesn’t use Rust for it):

Gecko has fallible allocation in its standard collection types. Distinction can be done at the type level or method level – there are factions that disagree on the right approach, and the issue doesn’t appear to be settled?

Personally prefers methods

Almost all allocations in gecko are infallible; crashing is simple and maintainable (especially with multi-process!)

Will fallibly allocate for some key things to improve reliability. Notably when a website can create an allocation disproportionate in size to its network traffic (an image’s size is specified in a few bytes).

Doesn’t need to handle all fallible allocation in that region of code, or even on that buffer
- happy to crash if the going gets tough.
- In a quick search of gecko [^1], couldn’t find any actual mixed use
  - Except a sketchy pattern [^2]

Fallibility is a maintenance/safety hazard!
- Many untested branches.
- In a quick search of gecko, I found a few cases that are written in a confusing way



(last two points are why methods are preferred)

[^1]: https://searchfox.org/mozilla-central/search?q=%5B%5En%5Dfallible%5C)&case=false&regexp=true&path==

[^2]:

// Fallibly reserve space
if (!aOutput.SetCapacity(aSource.Length(), fallible)) {
  return false;
}
for (auto x : aSource) {
  // Says fallible, but this is actually infallible; otherwise this is UB on OOM
  *aOutput.AppendElement(fallible) = x;
}

In Rust this would probably just be output.try_extend(source)?, although FFI might make you write code like above?
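A sketch of what such a try_extend could look like as a free function, assuming it is written against std’s Vec::try_reserve. The names try_extend and copy_all are hypothetical. Unlike the C++ pattern above, every push is guarded, so an iterator that yields more items than its size hint promised still fails cleanly instead of relying on the earlier reservation.

```rust
use std::collections::TryReserveError;

/// Hypothetical free-function version of output.try_extend(source).
pub fn try_extend<T, I>(output: &mut Vec<T>, source: I) -> Result<(), TryReserveError>
where
    I: IntoIterator<Item = T>,
{
    let iter = source.into_iter();

    // Reserve for the minimum number of items the iterator promises.
    let (lower, _) = iter.size_hint();
    output.try_reserve(lower)?;

    for x in iter {
        // No-op while spare capacity remains; otherwise grows fallibly.
        output.try_reserve(1)?;
        output.push(x);
    }
    Ok(())
}

/// Usage: the whole copy reports failure through `?`.
pub fn copy_all(source: &[u32]) -> Result<Vec<u32>, TryReserveError> {
    let mut output = Vec::new();
    try_extend(&mut output, source.iter().copied())?;
    Ok(output)
}
```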

Interview with Whitequark (embedded dev; uses Rust for it):

Three lines of defense against the specter of allocation:

First: statically allocate; much harder to mess up.

Second: Crash on oom!
- Usually hard abort (need to know how to recover anyway), but sometimes unwind (some Cortex-M devs)
- unwinding isn’t commonly supported here, so unwinding won’t ever be a complete solution.

Third: actually handle oom.
- fail at a task/request granularity
- all allocations for task are in a pool, so that on failure we free the whole pool; avoid fragmentation
- all allocations in this region of code are handled fallibly, no mixing strategies
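The third line of defense might look roughly like the sketch below. It is only an illustration of the shape, not of any real embedded allocator: Pool, TaskFailed, and run_task are hypothetical names, the pool is a plain byte buffer with bump allocation, and a task is modelled as a list of requested sizes.

```rust
/// Hypothetical fixed-size pool that backs every allocation of one task.
pub struct Pool {
    buf: Vec<u8>,
    used: usize,
}

/// The task-level failure reported to the caller.
#[derive(Debug, PartialEq, Eq)]
pub struct TaskFailed;

impl Pool {
    /// Fallibly sets aside the task's whole memory budget up front.
    pub fn with_capacity(bytes: usize) -> Result<Pool, TaskFailed> {
        let mut buf: Vec<u8> = Vec::new();
        buf.try_reserve_exact(bytes).map_err(|_| TaskFailed)?;
        // Capacity is already reserved, so resize does not allocate.
        buf.resize(bytes, 0);
        Ok(Pool { buf, used: 0 })
    }

    /// Bump-allocates `len` bytes, or fails if the budget is exceeded.
    pub fn alloc(&mut self, len: usize) -> Result<&mut [u8], TaskFailed> {
        let end = self.used.checked_add(len).ok_or(TaskFailed)?;
        if end > self.buf.len() {
            return Err(TaskFailed);
        }
        let start = self.used;
        self.used = end;
        Ok(&mut self.buf[start..end])
    }
}

/// Runs one task; every allocation in here is handled fallibly.
/// On any failure the `?` returns early and dropping `pool`
/// frees everything the task allocated in one go.
pub fn run_task(pool_bytes: usize, requests: &[usize]) -> Result<usize, TaskFailed> {
    let mut pool = Pool::with_capacity(pool_bytes)?;
    for &len in requests {
        pool.alloc(len)?.fill(0xAB);
    }
    Ok(pool.used)
}
```

Failure is reported once, at task granularity, and nothing inside the task mixes fallible and infallible strategies.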



Likes try_push, but wants #[deny(infallible_allocation)]

If we do a typed approach, would prefer something generic for library compat.

Fallible allocation is a last resort, and devs are willing to put in work to use it properly.

Interview with Swgillespie (CLR dev, works on the GC; doesn’t use Rust for it):

Need collections for state in GC traces, e.g. stacks in graph traversal. If allocation fails, can try to shrink the stack and retry. OOMing while trying to GC is a bug.

Uses global allocator (new with std::nothrow)

Would use #[deny(infallible_allocations)]

No preference on typed vs untyped.

No need for being generic over fallibility (GCs are fairly concretely typed)

No concern with interop with third-parties

Lots of bugs from missing spots or failing to check results

Interview with nox (Servo dev; uses Rust for it):

Stylo needs it for Firefox 57, will be forking libstd collections until we provide these APIs.

Code like this which parses a list should be fallible: https://github.com/servo/servo/blob/de0ee6cebfcaad720cd3568b19d2992349c8825c/components/style_traits/values.rs#L251

Style sheet should just come out as “couldn’t parse”/“didn’t load” when this happens.

Prefers methods to integrate into existing code where desired

Moving to fallible allocation is likely to be incremental, as it’s a big job

Controls all the relevant libraries

Doesn’t care about generics

Would like #[deny(infallible_allocations)] , not super important though

Relevant Reading