[rust-dev] Segmented stacks (was: IsRustSlimYet (IsRustFastYet v2))

On 7/4/13 12:58 PM, Daniel Micay wrote: > You can create many threads with fixed stacks, they just start off > using 4K instead of however much smaller our segmented stacks will be. > A scheduler will just be more expensive than a regular lightweight > task. > > The 15-100% performance hit from segmented stacks pushes Rust well out > of the systems language niche. I think it does have to change if Rust > plans on ever fitting in the niche that C, C++ and D do. I agree. The sole benefit of segmented stacks on modern OS's that lazily allocate pages is that, on 32-bit, you can avoid running out of address space with many tasks. This is counterbalanced by these disadvantages: 1. There is no way for the compiler or runtime to know ahead of time how much stack any given task will need, because this is based on dynamic control flow. 2. The consequence of overshooting (choosing a stack size that is too big) is that the benefit above is reduced. 3. The consequence of undershooting (choosing a stack size that is too small) is disastrous performance. In the limit, the performance degrades to something like what many Schemes and SML/NJ do, in that stack frames are malloc'd from the heap. Except that Scheme and SML/NJ have precise generational garbage collectors with bump allocators in the nursery, and we have C malloc(). Furthermore, stack segment allocation is the slow path in C malloc, because it's in a high storage class. So performance becomes abysmal in the slow path. Unlike systems like Erlang and Cilk, there is no way to relocate stack segments in Rust because of unmanaged interior pointers: Erlang could at least in theory correct its mistakes and keep stacks contiguous (although I don't know if the implementation does). So the best we can do is cache and hope for the best--but too much caching increases memory usage and decreases the benefits of stack segments! 4. The benefit above is significantly reduced when calling into C code, and all solutions to this either hurt the benefit more or significantly penalize the FFI. I think that segmented stacks just don't work. *Relocatable* stacks may work, but not in Rust. From what I have read, Walter Bright and Rob Pike agree. At this point I'd like to suggest just allowing the user to choose a stack size on a per-task basis, and failing if the stack size is exceeded. Basically `__morestack` would turn into `fail`. Brian has pointed out to me that, currently, running out of stack has to abort the whole process, because the DWARF unwinder doesn't consider `__morestack` a "may-throw" position and as a result the arguments would leak. There are a number of ways we could fix this, ranging from principled to hacky. But for now I think aborting on stack exhaustion wouldn't be the end of the world (although others may disagree). Patrick