Currently, GHC uses a round-robin scheduler for Haskell threads, with some heuristics for when threads should be bumped to the front of the queue. This patch set replaces this scheduler with 'stride scheduling', as described by Waldspurger and Weihl '95, which is an efficient, deterministic method for scheduling processes with differing priorities. Priorities are assigned by giving 'tickets' to threads; a thread with twice as many tickets as another will run twice as often. I’d like to replace the round-robin scheduler completely with this scheduler.

Here are nofib benchmarks comparing the old scheduler to the new:

-------------------------------------------------------------------------------- Program Size Allocs Runtime Elapsed TotalMem -------------------------------------------------------------------------------- Min -0.0% -52.2% -18.8% -18.6% -83.1% Max +1.0% +4.2% +4.9% +5.1% +7.7% Geometric Mean +0.1% -2.8% -0.9% -0.9% -2.9%

Here are some technical details:

Without any changes to ticket values, scheduling behavior should be functionally identical to round-robin. (By default, new threads, including the IO thread, get allocated the max nubmer of tickets possible.) This is not quite identical, since our heap does not have FIFO property (see below.)

The current patch-set uses a very simple (e.g. undergrad level) resizable-array backed heap to implement the priority queue; we can play some tricks to reduce the memory footprint of the priority queue (e.g. using the container_of macro to eliminate the extra keys store); and a more fancy data structure would make it easier for us to surgically remove entries or reweight them, but I wanted to keep overhead low. If anyone has a pet implementation of priority queues in C feel free to swap it in. Right now, this only affects the uses of promoteInRunQueue() in Messages.c; I still need to check if #3838 (closed) has regressed.

We get three new primops: setTickets# , getTickets# and modifyTickets# . We don't support creating threads with specific numbers of tickets (mostly because that would have added an annoyingly large set of extra primops); instead, you're expected to spawn a thread which gets max-ticket allocation, and then weight it down.

, and . We don't support creating threads with specific numbers of tickets (mostly because that would have added an annoyingly large set of extra primops); instead, you're expected to spawn a thread which gets max-ticket allocation, and then weight it down. _link is no longer used for linking TSOs in the run queue. I have tried my best to stamp out any code which operated on this assumption, but I may have missed some.

is no longer used for linking TSOs in the run queue. I have tried my best to stamp out any code which operated on this assumption, but I may have missed some. Modifying a TSO's tickets takes out the scheduler lock; the hope is that this operation is quick and rare enough that a global lock here is not too bad.

We are unfortunately stuck with some magic constants w.r.t. ticket values: 1 << 20 is the maximum number of tickets our implementation is hard-coded to support.

Sleeping and blocked tasks do not get any 'bonus' for being blocked.

In an ideal world, when a thread hits a black hole, it should temporarily give its tickets to the thread evaluating the black hole, so it will unblock more quickly. Unfortunately, implementing this is pretty complicated (the blackhole evaluating thread could die; or it could get stuck on a blackhole itself and need to gift its tickets; it shouldn't be able to give away the tickets it was gifted.) So this implementation leaves that out. Similar semantics for MVars are probably possible, but will require userland assistance too.

I haven't prepared a patch to base yet with a user-level API, but here is a proposed draft: