Now that we have finally gotten the patent filings for this area of the Mill in, I can explain a bit about how Mill fork() actually works. There will be more and better explanation at millcomputing.com as fast as we can get it up.

The Mill is MAS-in-SAS: multiple address spaces embedded in a single address space. There is a global shared single address space, so if you mmap() with MAP_SHARED the allocation is in that global address space and all processes that reference the allocation use the same bit pattern in the pointer to reach it. Note that the bit pattern is the same, but the rights to use that bit pattern will typically differ between processes; permissions are a whole other topic.

In addition, each turf (and in process-oriented systems each process corresponds to a turf) has a local address space that is located non-contiguously in the global shared address space. Pointers have a bit that says whether they refer to the global space or to the local space of the current turf/process. If you mmap() with MAP_PRIVATE (or without MAP_SHARED) then you get a local pointer back.

The fork() operation creates a logical copy of the local space of the caller, located somewhere else in the global space. I say “logical” because fork uses copy-on-write to optimize the common case in which execve is promptly called before any memory is modified; this is routine for non-Mill systems too.

Local pointers convert to global pointers by XORing high bits with the ID of the turf, and global pointers convert to local pointers also by XOR with the turf ID; XOR is like that 🙂
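To make the XOR trick concrete, here is a minimal sketch in C. The bit layout is purely an assumption for illustration (a local/global flag in bit 63 and a 23-bit turf ID field in bits 40 to 62), and the names LOCAL_FLAG, TURF_SHIFT, TURF_MASK, turf_key, local_to_global and global_to_local are hypothetical; the real Mill encoding is not given here and may well differ.

```c
#include <stdint.h>

/* ASSUMED layout, for illustration only: not the actual Mill encoding. */
#define LOCAL_FLAG (UINT64_C(1) << 63)                 /* set = local pointer */
#define TURF_SHIFT 40                                  /* turf ID field starts at bit 40 */
#define TURF_MASK  (UINT64_C(0x7FFFFF) << TURF_SHIFT)  /* bits 40..62 */

/* Place the turf ID into the high-bit field used for the XOR. */
static uint64_t turf_key(uint32_t turf_id)
{
    return ((uint64_t)turf_id << TURF_SHIFT) & TURF_MASK;
}

/* Local -> global: XOR the high bits with the turf ID, clear the local flag. */
uint64_t local_to_global(uint64_t local_ptr, uint32_t turf_id)
{
    return (local_ptr ^ turf_key(turf_id)) & ~LOCAL_FLAG;
}

/* Global -> local: the same XOR undoes itself, then set the local flag. */
uint64_t global_to_local(uint64_t global_ptr, uint32_t turf_id)
{
    return (global_ptr ^ turf_key(turf_id)) | LOCAL_FLAG;
}
```

Under these assumptions, one local bit pattern converted with two different turf IDs gives two different global addresses, and converting either one back with the turf ID that produced it recovers the original local pointer.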

After a fork(), all global pointers in the child refer to the same thing that they do in the parent, while all local pointers refer to the location in the child that corresponds to what the bitwise-identical pointer refers to in the parent.

Any unmodified cache lines of parent-local data can continue unchanged after the fork, but a child reference will get a cache miss (because the caches use global addresses and the child’s global address is different) and must load its copy of the data line; thereafter the two cache lines are independent and may be modified separately without confusion.

Any modified (dirty) parent-local lines in the cache must be written back to DRAM as part of the fork(), and the page tables used by the TLB must be set so that both parent and child use the same physical address for the data despite using different virtual addresses, with the page marked copy-on-write. This is the usual COW lazy copy.

The result of all of this is that fork() on a Mill requires a cache flush of dirty lines from the parent local space that is not required on (some) MAS machines, but has no other added overheads. The gain is that sharing between processes has byte granularity, not page granularity, and is vastly cheaper to use.