BLOG ON CAMLCITY.ORG: Ocamlnet

Ocaml and multicore programming - by Gerd Stolpmann, 2011-04-12

Netmulticore is my attempt at solving the multicore puzzle for Ocaml. It has now reached a development stage where I can run test programs, and I see real speedups. Although not everything is perfect yet, the API has stabilized somewhat, and it is close to being ready for broader testing. Expect a test release in the next few days: I hope to finish it before the Ocaml Meeting on Friday, so you guys have something to talk about. (Unfortunately, I cannot attend.)

The approach of Netmulticore is unusual, but AFAIK it is the only one that works without modifying the Ocaml runtime and/or compiler. Instead of using kernel threads and implicitly sharing memory, Netmulticore forks full-fledged processes as thread replacements, and allocates explicit shared memory that is accessible by all worker processes. Because this allocation is done directly at program startup, before the worker processes are forked, all processes see the shared memory block at the same address (which would not be the case if each process mapped the block individually). The shared block is then treated in a very special way, because shared memory has to cope with some difficulties that normally do not exist. All in all, this creates a setup where each process has its normal Ocaml heap (which is not shared with other processes) plus access to the shared block. This is not a bad situation, provided we can make access to the shared block as convenient as possible, and this is what Netmulticore is really about. If we can achieve a nice programming API for dealing with shared data, the multicore issue is solved, probably even in a more scalable way than in other runtimes where all threads share a single heap and constantly step on each other's feet.

Self-contained shared heaps

So what we want is to be able to store normal Ocaml values in the shared block, including all regular data representations such as tuples, variants, and records. We also want to allow modifications of values: immutable shared memory would not be much fun. Netmulticore solves both issues. It provides direct read access to Ocaml values stored in the shared block, and it allows modifications (but the programmer has to follow a special API).

The initially allocated shared block is broken down into smaller units called shared heaps. The heaps do not have a fixed size but can grow and shrink (just like the regular Ocaml heap). The heaps are the containers for the Ocaml values: when creating a heap, one can copy an initial Ocaml value into it, and by following the special rules for mutation it is possible to put more values into it (or remove values). The shared heaps are structured similarly to the normal Ocaml heap, and contain the value blocks densely packed one after the other. Free areas are managed with a free list. When a shared heap fills up, a specially implemented garbage collector tries to find unreachable values and reclaims their space (using a mark-and-sweep design). Each shared heap has a lock that synchronizes accesses to it and, unfortunately, limits the degree of concurrency. In particular, only one process can write to a heap at a time. If necessary, the programmer can work around this limitation by using several heaps: each heap has its own lock, so a careful application design can avoid lock contention.
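The heap layout described above (value blocks, a free list, and a mark-and-sweep collection when the heap fills up) can be illustrated with a toy model. This is a minimal sketch under simplifying assumptions, not Netmulticore's implementation: the heap has a fixed number of slots instead of growing and shrinking, blocks are OCaml records rather than raw memory, and there is no lock. All names are hypothetical.

```ocaml
(* Toy model of a self-contained shared heap (hypothetical, for illustration):
   a fixed number of block slots, a free list, and a mark-and-sweep collector. *)

(* A field is an immediate value or a pointer (slot index) into the same heap. *)
type field = Imm of int | Ptr of int

type block = { fields : field array; mutable marked : bool }

type heap = {
  slots : block option array;   (* None marks a free slot *)
  mutable free_list : int list; (* indices of the free slots *)
  mutable roots : int list;     (* slots the collector must treat as live *)
}

exception Heap_full

let create n =
  { slots = Array.make n None; free_list = List.init n (fun i -> i); roots = [] }

(* Mark phase: follow pointers from slot [i], setting the mark bit. *)
let rec mark h i =
  match h.slots.(i) with
  | Some b when not b.marked ->
      b.marked <- true;
      Array.iter (function Ptr j -> mark h j | Imm _ -> ()) b.fields
  | _ -> ()

(* Sweep phase: free unmarked blocks, clear the marks of surviving ones. *)
let sweep h =
  Array.iteri
    (fun i slot ->
      match slot with
      | Some b when b.marked -> b.marked <- false
      | Some _ ->
          h.slots.(i) <- None;
          h.free_list <- i :: h.free_list
      | None -> ())
    h.slots

let collect h =
  List.iter (mark h) h.roots;
  sweep h

(* Take a slot from the free list; if the heap is full, collect first. *)
let alloc h fields =
  if h.free_list = [] then collect h;
  match h.free_list with
  | [] -> raise Heap_full
  | i :: rest ->
      h.free_list <- rest;
      h.slots.(i) <- Some { fields; marked = false };
      i
```

For example, if slots 0 and 1 hold a rooted value and slot 2 holds unreachable garbage, the next `alloc` on the full heap runs `collect` and reuses slot 2.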

This sounds like a nice idea, but there are some traps. In the next two paragraphs I'll try to give an impression of the difficulties. The bottom line is that managing shared heaps requires a disciplined programmer; otherwise the memory gets corrupted. This is the downside of the Netmulticore approach: it is quite easy to crash the program by not following one of the programming rules. (This is, however, also true for "normal" multi-threaded programming.)

The first difficulty is that this design requires the shared heaps to be self-contained. This means that no pointer may ever exist that points from a shared heap to the normal process-local Ocaml heap, or from one shared heap to a different shared heap. The first kind of pointer would cause invalid memory accesses if a second process dereferenced it. The second kind of pointer confuses the garbage collector. What is still allowed, of course, are pointers from process-local memory into the shared heaps. (Fortunately, the garbage collector built into the Ocaml runtime does not follow such pointers.) However, one should be careful: the garbage collector that cleans the shared heap from time to time does not see such "external" pointers, and will not keep the referenced data alive. It is left to the programmer to do something about it.
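The self-containment rule can be stated as an invariant over a heap. The sketch below is a hypothetical toy in which every pointer records the heap it points into, and a guarded `store` refuses foreign pointers. Netmulticore itself, as described here, leaves this rule to the programmer rather than checking it, which is why violations end in crashes; the check exists only to make the rule concrete.

```ocaml
(* Hypothetical toy: a pointer records which heap it points into. *)
type ptr = { heap_id : int; slot : int }
type field = Imm of int | Ptr of ptr

type heap = { id : int; blocks : (int, field array) Hashtbl.t }

exception Foreign_pointer

let create id = { id; blocks = Hashtbl.create 16 }

(* True if every pointer in [fields] targets heap [h] itself. The real rule
   also forbids pointers into the process-local Ocaml heap, which this toy
   does not model. *)
let points_inside h fields =
  Array.for_all (function Imm _ -> true | Ptr p -> p.heap_id = h.id) fields

(* Invariant: no block stored in [h] points into a different heap. *)
let self_contained h =
  Hashtbl.fold (fun _ fields ok -> ok && points_inside h fields) h.blocks true

(* A guarded store that rejects foreign pointers before linking the block. *)
let store h slot fields =
  if not (points_inside h fields) then raise Foreign_pointer;
  Hashtbl.replace h.blocks slot fields
```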

The second difficulty is how to actually perform mutation in shared heaps. If you have ever written a memory manager, you probably know the problem: each allocation can trigger a GC run, and this can reclaim what you have just put into the heap but what the GC does not yet consider reachable. The solution is to keep a set of additional roots, i.e. pointers the GC must also consider although they are not in the memory region the GC manages. I omit the details here; the point is that mutation requires a special procedure so that such additional roots can be managed. This is a bit like declaring the arguments of C wrapper functions with the CAMLparam macros. A similar convention exists for Netmulticore, except that it has to be followed at the Ocaml level.
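The hazard of unrooted intermediate values, and the fix of registering additional roots, can be shown with another toy heap. This is a hypothetical sketch, not Netmulticore's API (the article omits the details of its mechanism): `alloc` runs the collector whenever the heap is full, and the made-up `with_root` plays the role of CAMLparam by keeping a slot alive while a mutation is in progress.

```ocaml
(* Hypothetical toy heap: slots hold blocks of fields; a field is an
   immediate int or a pointer (slot index) into the same heap. *)
type field = Imm of int | Ptr of int

type heap = {
  slots : field array option array; (* None marks a free slot *)
  mutable roots : int list;         (* permanent roots *)
  mutable extra : int list;         (* temporary roots during a mutation *)
}

exception Heap_full

let create n = { slots = Array.make n None; roots = []; extra = [] }

(* Mark-and-sweep: everything unreachable from roots @ extra is freed. *)
let collect h =
  let marked = Array.make (Array.length h.slots) false in
  let rec mark i =
    if not marked.(i) then begin
      marked.(i) <- true;
      match h.slots.(i) with
      | Some fs -> Array.iter (function Ptr j -> mark j | Imm _ -> ()) fs
      | None -> ()
    end
  in
  List.iter mark (h.roots @ h.extra);
  Array.iteri (fun i _ -> if not marked.(i) then h.slots.(i) <- None) h.slots

let find_free h =
  let rec go i =
    if i >= Array.length h.slots then None
    else if h.slots.(i) = None then Some i
    else go (i + 1)
  in
  go 0

(* Allocation may trigger a collection, as in a real allocator. *)
let alloc h fields =
  let i =
    match find_free h with
    | Some i -> i
    | None -> (
        collect h;
        match find_free h with Some i -> i | None -> raise Heap_full)
  in
  h.slots.(i) <- Some fields;
  i

(* Keep slot [i] alive while [f] runs, comparable to CAMLparam/CAMLlocal
   protecting values in C stubs. The previous root set is restored even
   if [f] raises. *)
let with_root h i f =
  let saved = h.extra in
  h.extra <- i :: saved;
  Fun.protect ~finally:(fun () -> h.extra <- saved) f
```

Without `with_root`, building a block that points to a freshly allocated but not yet linked block `a` can trigger a collection that frees `a`, leaving a dangling pointer; wrapping the second allocation in `with_root h a` keeps `a` alive until it is reachable.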

Higher-level data structures

Fortunately, the programmer does not need to remember all this low-level stuff most of the time, because there are a number of ready-to-use data containers. These are built on top of the raw shared heap structure, and are a lot safer to use:

Netmcore_array: Keeps data in an array, and provides synchronization for accessing array elements

Netmcore_matrix: A two-dimensional array

Netmcore_buffer: A shared string buffer where one can add strings at the end and remove data from the beginning

Netmcore_queue: A shared queue very much like the Queue module of the standard library, but again with additional synchronization

Netmcore_hashtbl: A shared hash table very much like the Hashtbl module of the standard library, but again with additional synchronization

Netmcore_ref: A single shared variable (like a shared "ref" variable)

In some sense, these modules are "ports" of the corresponding data structures provided by the standard library. During porting, in particular I had to change the way the data is mutated so that the programming rules mentioned above are followed. (This is the main reason why these ports exist: you cannot put a normal Hashtbl into shared memory, because its normal mutation breaks the rules for shared heaps. There is no such problem when read-only data structures are copied to shared heaps.)
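To make the porting point concrete, here is a hypothetical sketch of the key difference: a ported table copies keys and values into the heap before linking them, instead of storing references to the caller's own values as a plain Hashtbl does. It rests on simplifying assumptions: the "shared heap" is a growable byte arena, only the strings (not the bucket array) live in it, old entries are never reclaimed, and there is no locking. None of these names belong to Netmcore_hashtbl.

```ocaml
(* Hypothetical toy "shared heap": a growable byte arena. Strings are
   referred to by (offset, length) handles, which stay valid when the
   arena is reallocated. *)
type arena = { mutable data : Bytes.t; mutable used : int }

let arena_create n = { data = Bytes.create n; used = 0 }

(* Copy [s] into the arena and return its handle. *)
let arena_store a s =
  let len = String.length s in
  if a.used + len > Bytes.length a.data then begin
    let bigger = Bytes.create (max (2 * Bytes.length a.data) (a.used + len)) in
    Bytes.blit a.data 0 bigger 0 a.used;
    a.data <- bigger
  end;
  Bytes.blit_string s 0 a.data a.used len;
  let off = a.used in
  a.used <- a.used + len;
  (off, len)

let arena_load a (off, len) = Bytes.sub_string a.data off len

(* A string -> string table whose buckets hold only arena handles. *)
type table = {
  arena : arena;
  buckets : ((int * int) * (int * int)) list array;
}

let create n = { arena = arena_create 64; buckets = Array.make n [] }

let index t key = Hashtbl.hash key mod Array.length t.buckets

(* The "ported" mutation: copy key and value into the arena first, then
   link only the handles into the bucket. *)
let add t key value =
  let k = arena_store t.arena key in
  let v = arena_store t.arena value in
  let i = index t key in
  t.buckets.(i) <- (k, v) :: t.buckets.(i)

(* Return the most recently added value for [key], if any. *)
let find_opt t key =
  let rec go = function
    | [] -> None
    | (k, v) :: rest ->
        if arena_load t.arena k = key then Some (arena_load t.arena v) else go rest
  in
  go t.buckets.(index t key)
```

Because entries hold offsets rather than absolute addresses, growing the arena does not invalidate them. For real pointers, Netmulticore avoids the corresponding problem by mapping the shared block at the same address in all processes.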

Synchronization primitives

The data types above have, to some degree, built-in protection against uncontrolled parallel access. Sometimes, however, it is useful to have additional ways of managing synchronization:

Netmcore_sem: Semaphores

Netmcore_mutex: Mutexes (including normal and recursive ones)

Netmcore_condition: Condition variables

The condition variables have a somewhat unpleasant API, because it is the task of the caller to allocate a special block of memory for each process that can be suspended. In system-level implementations of condition variables, this block can be hidden from the user (it can be put into the thread control block). As we don't have access to a place that is process-local but nevertheless shared, the only solution I see right now is to delegate this obligation to the caller. But anyway, the important message is that Netmulticore provides condition variables, and that it is thus easy to signal (or "broadcast") suspended processes.

Message passing

Netmulticore also integrates nicely with Camlbox, the message-passing API that has existed since Ocamlnet-3. Camlboxes make it possible to send Ocaml values from a number of sender processes to a single receiver process. The implementation of Camlboxes also uses shared memory (and not sockets), and is very fast.

The code, please!

Everything described in this article already works at this point. Netmulticore is distributed as part of Ocamlnet, and right now it is only available in the svn repository.

You'll probably have to check out the whole tree to build it.

Skeptical readers are strongly encouraged to look at the examples, if only to see how nice code using Netmulticore looks. It is the first time you can use shared memory from Ocaml without having to deal with data marshalling issues (because this technique is not used at all). Actually, the code looks very much like multi-threaded programming, except that different primitives need to be used here and there.
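To make the condition-variable API described above a bit more concrete, here is a single-process model of its shape. It is a hypothetical sketch, not Netmcore_condition's actual interface: nothing blocks and nothing is shared between processes. It only shows the division of labor, where each process allocates its own wait entry and passes it to `wait`, while `signal` and `broadcast` mark entries as woken.

```ocaml
(* Hypothetical model of the API shape only: no real blocking and no
   cross-process sharing. In a real implementation each wait entry would
   live in shared memory and carry what is needed to suspend and resume
   its owning process. *)
type wait_entry = {
  owner : string;        (* the worker process this entry belongs to *)
  mutable woken : bool;  (* set by signal/broadcast *)
}

type condition = { waiting : wait_entry Queue.t }

let create_condition () = { waiting = Queue.create () }

(* Allocated by the caller, once per process that may be suspended. *)
let alloc_wait_entry owner = { owner; woken = false }

(* Register the caller's entry. A real [wait] would also release the
   associated mutex and suspend the process until woken. *)
let wait entry c =
  entry.woken <- false;
  Queue.push entry c.waiting

(* Wake the longest-waiting process, if any. *)
let signal c =
  match Queue.take_opt c.waiting with
  | Some e -> e.woken <- true
  | None -> ()

(* Wake every waiting process. *)
let broadcast c =
  Queue.iter (fun e -> e.woken <- true) c.waiting;
  Queue.clear c.waiting
```

The entry is the "special block of memory" that a thread library would keep in its thread control block; here the caller owns it and hands it in explicitly.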

What next?

It is very important to create sample programs that use a library like Netmulticore, because problems only show up in practice and are hard to predict. There are already three non-trivial examples, and I plan to write a few more. Also expect blog articles about this, and about the performance (the numbers I have so far are promising).

The Netmulticore implementation has reached some kind of "beta quality", at least. A few improvements are needed here and there, but generally the code exists and works.

It is also a good idea to look for ways to make Netmulticore programming safer. As pointed out before, it is currently easy to crash the program (with a segfault) by missing one of the special programming rules. The OCaml compiler could perhaps help here and prevent some mistakes. What I have in mind in particular is a type annotation recording whether a value is in a shared heap. This annotation would normally be invisible to the user (like the polarity annotation), but would kick in at compile time when a reference to a non-shared value is stored into a shared value. However, this annotation would probably require a modification of the compiler.
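Part of this idea can be approximated at the library level with phantom types, as in the hypothetical sketch below (the `Placed` module and its functions are made up for illustration). Unlike the invisible annotation described above, the tag is visible in the types, and it only protects values that pass through the wrapper: `share` here is the identity, whereas a real version would have to copy the value into a shared heap. That gap is one reason why a sound solution probably needs compiler support.

```ocaml
(* Phantom tags: these types have no values; they only appear as type
   parameters to record where a value lives. *)
type shared
type local

module Placed : sig
  type ('a, 'loc) value
  val local : 'a -> ('a, local) value
  val share : 'a -> ('a, shared) value   (* hypothetical: would copy to a shared heap *)
  val get : ('a, 'loc) value -> 'a
  type 'a cell                            (* a container living in a shared heap *)
  val make_cell : ('a, shared) value -> 'a cell
  val set : 'a cell -> ('a, shared) value -> unit
  val read : 'a cell -> 'a
end = struct
  type ('a, 'loc) value = 'a
  let local x = x
  let share x = x  (* identity here; a real version must copy the value *)
  let get x = x
  type 'a cell = 'a ref
  let make_cell x = ref x
  let set c x = c := x
  let read c = !c
end

let example () =
  let c = Placed.make_cell (Placed.share 1) in
  Placed.set c (Placed.share 2);
  (* Placed.set c (Placed.local 3)  is rejected by the type checker:
     (int, local) Placed.value is not (int, shared) Placed.value *)
  Placed.read c
```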

If anybody is interested in testing Netmulticore, please write to me. Optimizing programs for multicore is tricky, and getting more experience here would allow us to take a good step forward.