Continuing the sequence of experiments I’ve been running with the Go language, I’ve just made available a tiny but useful new package: gommap. As one would imagine, this new package provides access to low-level memory mapping for files and devices, and it allowed exploring a few new edges of the language implementation. Note that, strictly speaking, some of the details ahead are really more about the implementation than the language itself.

There were basically two main routes to follow when implementing support for memory mapping in Go. The first one is usually the way higher-level languages handle it. In Python, for instance, this is the way one may use a memory mapped file:

>>> import mmap >>> file = open("/etc/passwd") >>> mm = mmap.mmap(file.fileno(), size, access=PROT_READ) >>> mm[0:4] 'root'

The way this was done has an advantage and a disadvantage which are perhaps non entirely obvious on a first look. The advantage is that the memory mapped area is truly hidden behind that interface, so any improper attempt to access a region which was already unmapped, for instance, may be blocked within the application with a nice error message which explains the issue. The disadvantage, though, is that this interface usually comes with a restriction that the way to use the memory region with normal libraries, is via copying of data. In the above example, for instance, the “root” string isn’t backed by the original mapped memory anymore, and is rather a copy of its contents (see PEP 3118 for a way to improve a bit this aspect with Python).

The other path, which can be done with Go, is to back a normal native array type with the allocated memory. This means that normal libraries don’t need to copy data out of the mapped memory, or to use a special memory saving interface, to deal with the memory mapped region. As a simple example, this would get the first line in the given file:

mmap, err := gommap.Map(file.Fd(), PROT_READ, MAP_PRIVATE) if err == nil { end := bytes.Index(mmap, []byte{'n'}) firstLine := mmap[:end] }

In the procedure above, mmap is defined as an alias to a native []byte array, so even though the standard bytes module was used, at no point was the data from the memory mapped region copied out or any auxiliary buffers allocated, so this is a very fast operation. To give an idea about this, let’s pretend for a moment that we want to increase a simple 8 bit counter in a file. This might be done with something as simple as:

mmap[13] += 1

This line of code would be compiled into something similar to the following assembly (amd64):

MOVQ mmap+-32(SP),BX CMPL 8(BX),$13 JHI ,68 CALL ,runtime.panicindex+0(SB) MOVQ (BX),BX INCB ,13(BX)

As you can see, this is just doing some fast index checking before incrementing the value directly in memory. Given that one of the important reasons why memory mapped files are used is to speed up access to disk files (sometimes large disk files), this advantage in performance is actually meaningful in this context.

Unfortunately, though, doing things this way also has an important disadvantage, at least right now. There’s no way at the moment to track references to the underlying memory, which was allocated by means not known to the Go runtime. This means that unmapping this memory is not a safe operation. The munmap system call will simply take the references away from the process, and any further attempt to touch those areas will crash the application.

To give you an idea about the background “magic” which is going on to achieve this support in Go, here is an interesting excerpt from the underlying mmap syscall as of this writing:

addr, _, errno := syscall.Syscall6(syscall.SYS_MMAP, (...)) (...) dh := (*reflect.SliceHeader)(unsafe.Pointer(&mmap)) dh.Data = addr dh.Len = int(length) dh.Cap = dh.Len

As you can see, this is taking apart the memory backing the slice value into its constituting structure, and altering it to point to the mapped memory, including information about the length mapped so that bound checking as observed in the assembly above will work correctly.

In case the garbage collector is at some point extended to track references to these foreign regions, it would be possible to implement some kind of UnmapOnGC() method which would only unmap the memory once the last reference is gone. For now, though, the advantages of being able to reference memory mapped regions directly, at least to me, surpass the danger of having improper slices of the given region being used after unmapping. Also, I expect that usage of this kind of functionality will generally be encapsulated within higher level libraries, so it shouldn’t be too hard to keep the constraint in mind while using it this way.

For those reasons, gommap was implemented with the latter approach. In case you need memory mapping support for Go, just move ahead and goinstall launchpad.net/gommap.

UPDATE (2010-12-02): The interface was updated so that mmap itself is an array, rather than mmap.Data, and this post was changed to reflect this.