In this six-part series, I’m going to introduce the internals of the OCaml programming language (tutorial and other references here). It isn’t going to be comprehensive or in-depth. It’s more of a digest of the readily available information that I found by reading the manual, header files, and some of the compiler code. It’s definitely pitched at beginners, not experts.

By the end of it, you should be able to look at small pieces of OCaml and work out in your head how they get translated into machine code.

I’m definitely not a great expert on the internals of OCaml. This means a couple of things: (a) I’ve probably made some mistakes (post a comment if you see any). (b) I might “reveal too much” of the specifics of the current implementation: anything could change in a future version of OCaml, although the internals described here have been pretty stable for a long time.

Without further ado …

Part 1: Values

In a running OCaml program, a value is either an “integer-like thing” or a pointer.

What’s an “integer-like thing”? Any OCaml int or char, or some things which are stored like integers for speed, which include the constants true and false, the empty list [], unit (), and some (but not all) variants.

For reasons that I’ll explain later, OCaml stores values which are integer-like and values which are pointers differently.

A value containing a pointer is just stored as the pointer. Because addresses are always word-aligned, the bottom 2 bits of every pointer are always 00 (the bottom 3 bits are 000 on a 64-bit machine). OCaml therefore stores integers by shifting the integer left by one place and setting the bottom bit to 1:

+----------------------------------------+---+---+
| pointer                                | 0 | 0 |
+----------------------------------------+---+---+

+--------------------------------------------+---+
| integer (31 or 63 bits)                    | 1 |
+--------------------------------------------+---+
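The encoding in the diagram can be modelled with ordinary OCaml ints. This is only a sketch of the arithmetic, not the runtime's own code: `tag`, `untag` and `looks_like_int` are names made up for illustration, and the address `0x1000` is an arbitrary word-aligned example.

```ocaml
(* A model of the tagging scheme using ordinary OCaml ints.
   tag, untag and looks_like_int are hypothetical helpers,
   not functions provided by the OCaml runtime. *)

(* Shift left one place and set the bottom bit: n becomes 2n + 1. *)
let tag (n : int) : int = (n lsl 1) lor 1

(* An arithmetic shift right drops the tag bit and keeps the sign. *)
let untag (v : int) : int = v asr 1

(* The bottom bit separates integer-like values from pointers. *)
let looks_like_int (v : int) : bool = v land 1 = 1

let () =
  assert (tag 0 = 1);
  assert (tag 5 = 11);
  assert (tag (-3) = -5);
  assert (untag (tag (-3)) = -3);
  assert (looks_like_int (tag 42));
  (* A word-aligned address has its bottom bits clear. *)
  assert (not (looks_like_int 0x1000))
```

Because one bit is spent on the tag, only 31 or 63 bits are left for the integer itself, which is where the widths in the diagram come from.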

You can quickly tell whether a value is an integer-like thing by testing a single bit (i.e. is the bottom bit 1?).

OCaml gives you some C macros and OCaml functions you can use on values. The C macros are Is_long (is it an integer?) and Is_block (is it a pointer?), defined in the runtime's C headers. The OCaml functions are is_int and is_block (for pointers) in the “infamous” Obj module.

# let what_is_it foo =
    let foo_repr = Obj.repr foo in
    if Obj.is_int foo_repr then "integer-like" else "pointer" ;;
val what_is_it : 'a -> string = <fun>
# what_is_it 42 ;;
- : string = "integer-like"
# what_is_it () ;;
- : string = "integer-like"
# what_is_it [1;2;3] ;;
- : string = "pointer"
# type bar = Bar | Baz of int ;;
type bar = Bar | Baz of int
# what_is_it Bar ;;
- : string = "integer-like"
# what_is_it (Baz 5) ;;
- : string = "pointer"
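The same Obj functions can also show which integer each integer-like value is stored as. The sketch below goes a little beyond the text above: `as_int` is a hypothetical helper, the `colour` type is made up for the example, and the numbering it checks (constant constructors counted from 0 in declaration order) describes the current implementation rather than anything the language guarantees.

```ocaml
(* as_int is a hypothetical helper: it reinterprets an integer-like
   value as an OCaml int, and refuses anything stored as a pointer. *)
let as_int (x : 'a) : int =
  let r = Obj.repr x in
  if Obj.is_int r then (Obj.obj r : int)
  else invalid_arg "as_int: value is a pointer"

(* An example variant with only constant constructors. *)
type colour = Red | Green | Blue

let () =
  assert (as_int 'a' = 97);   (* a char is stored as its character code *)
  assert (as_int false = 0);
  assert (as_int true = 1);
  assert (as_int () = 0);
  assert (as_int [] = 0);
  (* Constant constructors are numbered in declaration order. *)
  assert (as_int Red = 0);
  assert (as_int Green = 1);
  assert (as_int Blue = 2)
```

Calling `as_int` on something like `[1;2;3]` raises `Invalid_argument`, since that value is a pointer to a block.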

Are ints slow? After all, it seems like you have to do a lot of bit shifting to do any arithmetic with them.

It turns out that they are a little bit slower, but not by very much. The OCaml compiler can perform most arithmetic operations in between one and four machine instructions, and usually just one or two, making it the same speed as or fractionally slower than a straight native integer. (Of course, there is a more serious penalty in the cases where you really need the full width of a 32- or 64-bit integer, for example in crypto or compression algorithms.)
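To see the lost bit from inside OCaml, here is a small sketch. It assumes a standard native or bytecode compiler, version 4.03 or later (which is when `Sys.int_size` appeared), and it uses the standard library's `Int64` module as an example of a full-width integer type.

```ocaml
let () =
  (* One bit of each machine word goes to the tag. *)
  Printf.printf "word: %d bits, int: %d bits\n" Sys.word_size Sys.int_size;
  assert (Sys.int_size = Sys.word_size - 1);
  (* int arithmetic wraps around at the reduced width. *)
  assert (max_int + 1 = min_int);
  (* Int64 keeps all 64 bits, so its largest value is bigger. *)
  assert (Int64.compare Int64.max_int (Int64.of_int max_int) > 0)
```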

Increment         addl $2, %eax
Add a constant    addl (constant*2), %eax
Add two values    lea -1(%eax, %ebx), %eax
Subtract          subl %ebx, %eax
                  incl %eax

That these are equivalent isn’t totally obvious, but this post explains why.
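The equivalence can also be checked with a little algebra: if n is stored as 2n + 1, then (2a + 1) + (2b + 1) - 1 = 2(a + b) + 1 and (2a + 1) - (2b + 1) + 1 = 2(a - b) + 1. The sketch below models each instruction sequence from the table on plain OCaml ints. All the function names are made up for illustration, and this is a model of the arithmetic rather than what the compiler literally emits.

```ocaml
(* Model: the tagged form of n is 2n + 1. Hypothetical helpers. *)
let tag n = (n lsl 1) lor 1
let untag v = v asr 1

(* Increment: addl $2 *)
let tagged_incr v = v + 2

(* Add a constant c: addl with the immediate c*2 *)
let tagged_add_const c v = v + (2 * c)

(* Add two values: lea -1(a, b) computes a + b - 1 *)
let tagged_add a b = a + b - 1

(* Subtract: subl then incl computes a - b + 1 *)
let tagged_sub a b = a - b + 1

let () =
  assert (tagged_incr (tag 41) = tag 42);
  assert (tagged_add_const 10 (tag 5) = tag 15);
  assert (tagged_add (tag 3) (tag 4) = tag 7);
  assert (tagged_sub (tag 3) (tag 10) = tag (-7));
  assert (untag (tagged_add (tag (-2)) (tag 2)) = 0)
```

In every case the result comes out already tagged, so no separate untag and retag step is needed around the operation.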

That concludes part 1. Tune in tomorrow for part 2, or if you’re reading this in the future, use this handy table of contents:

Update

This guide is discussed on reddit here. Also on Hacker News here.