[rust-dev] Abandoning segmented stacks in Rust

Greetings you delightful Rustic people, Sound the death knell for segmented stacks. The new runtime does not implement segmented stacks and it never will. This decision has been brewing for some time, and now I'd like to make it official and lay out the reasoning for posterity. To recap, segmented stacks are a strategy for growing stacks incrementally, starting from a single small stack segment, and appending further segments as needed in a linked list as the stack grows. This allows threads to be very compact so that many can be created concurrently. The previous Rust runtime did implement segmented stacks, and that experience makes us believe that the performance tradeoffs required are not compatible with Rust's goals as a high performance systems language. The performance problems associated with segmented stacks can be summerized as: switching stacks has a cost. Though that cost can be minimized through extensive tuning and micro-optimization (much of which we admittedly did not pursue), it will never be free, and we've concluded that the effort and complexity of continuing down that route is not justified. We've seen the problem manifest primarily in three areas: * "stack thrashing" - when out of stack a function call will force an allocation of a new segment, which is later freed upon return. This is expensive even when the new stack segmented is cached. The result is unpredictable and severe drops in performance whenever a stack boundary happens to fall inside a tight loop. ([Similar concerns][1] are pushing Go away from a segmented stack scheme as well - they call it the "hot split" problem). [1]: https://docs.google.com/document/d/1wAaf1rYoM4S4gtnPh0zOlGzWtrZFQ5suE8qr2sD8uWQ/pub * FFI - foreign code typically expects to have large stacks, so using the FFI often requires switching stacks. Avoiding this overhead would require an elaborate and inherently unsafe system of annotation (#8822), increasing a burden on the FFI interface. * LLVM libc optimizations and intrinsic fallbacks - LLVM will transform calls to some libc functions into intrinsics and some intrinsics into runtime function calls, for reasons of performance and platform compatibility. Obvious solutions to making these compatible with segmented stacks impose a high minimum stack requirement, partially defeating the point of a segmented stack. Instead of segmented stacks we're going to rely on the OS and MMU to help us map pages lazily. Although the details aren't clear yet, I expect that on 64-bit platforms the number of concurrent tasks will be comparable to using segmented stacks. On 32-bit platforms, with their limited address space, the situation will not be as good, but this is a calculated risk that we can live without the same amount of concurrency there. Regards, Brian