Profiling the stack

In the last post, I described a flat allocation profiler for OCaml 4.02 bytecode interpreter. In this post, I’ll describe further developments which add support for call stack information and better location information. Lets dive straight to the usage:

Enabling stack profiling

Stack profiling is enabled by setting the environment variable CAML_PROFILE_STACK to the intended depth of stack. Suppose we would like to attribute any allocation to the current function, we would set CAML_PROFILE_STACK=1 . To do the same to the current function and its caller, we would set CAML_PROFILE_STACK=2 . CAML_PROFILE_STACK=<INFINITY> should give you stack profile all the way down to the first function.

Why should I care about the stack depth?

Because it affects the program performance. Enabling stack profiling walks the stack for every allocation. This has the potential to severely affect the program performance. Most often, with a flat profile, you’ve tracked the offending allocation to some function in the standard library such as:

File "bytes.ml" , line 59, characters 7-81: C_CALL1 caml_create_string File "src/bigstring.ml" , line 98, characters 20-37: C_CALL1 caml_create_string

And all you want is to find out the caller of that standard library function in your code. A stack depth of a small number should provide you this information. You might have to play around with the stack depth to identify what you are looking for.

Profiling N-queens

You can obtain and install the profiling enabled OCaml 4.02 here. Let us obtain the flat profile first.

$ wget http://caml.inria.fr/pub/old_caml_site/Examples/oc/basics/queens.ml $ ocamlc -o queens -g queens.ml $ CAML_PROFILE_ALLOC = queens.preprof ./queens Chess boards 's size ? 8 The 8 queens problem has 92 solutions. Do you want to see the solutions <n/y> ? n $ ./tools/allocprof queens.preprof queens.prof $ head -n10 queens.prof Total: 77,863 words Instr Words % of total Location ----- ----- ---------- -------- 2488 31440 40.38% file "list.ml", line 55, characters 32-39 27681 31440 40.38% file "queens.ml", line 61, characters 46-52 27775 5895 7.57% file "queens.ml", line 38, characters 2-113 27759 4112 5.28% file "queens.ml", line 40, characters 33-41 27687 3930 5.05% file "queens.ml", line 61, characters 14-59 2403 86 0.11% file "pervasives.ml", line 490, characters 8-63 5391 44 0.06% file "list.ml", line 20, characters 15-29

Observe that we now have the precise location information directly in the profile, whereas earlier one had to manually identify the source location using the instruction information. In this profile, we see that most allocations were in list.ml:55 , which is the List.map function. However, we cannot pin down the source of these allocations in queens.ml from this profile since the profile is flat. Let us now obtain the stack allocation profile, which will reveal the source of these allocations in queens.ml .

$ CAML_PROFILE_ALLOC = queens.preprof CAML_PROFILE_STACK = 10000 ./queens Chess boards 's size ? 8 The 8 queens problem has 92 solutions. Do you want to see the solutions <n/y> ? n $ ./tools/allocprof queens.preprof queens.prof --sort-stack $ head -n10 queens.prof Total: 77,863 words Instr Current Cur % Stack Stack % Location ----- ------- ----- ----- ------- -------- 27836 0 0.00% 76911 98.78% file "queens.ml", line 100, characters 33-42 27549 0 0.00% 76870 98.72% file "queens.ml", line 85, characters 17-36 27466 0 0.00% 76473 98.21% file "queens.ml", line 45, characters 18-31 27715 0 0.00% 65117 83.63% file "queens.ml", line 62, characters 4-22 27694 0 0.00% 62880 80.76% file "queens.ml", line 61, characters 31-59 2487 0 0.00% 55020 70.66% file "list.ml", line 55, characters 32-39 2483 0 0.00% 31440 40.38% file "list.ml", line 55, characters 20-23

I’ve chosen a stack depth of 10000 to obtain the complete stack profile of the program. The option --sort-stack to allocprof sorts the results based on the stack allocation profile. We can now clearly see the stack of functions that perform most allocations. The line

27836 0 0.00% 76911 98.78% file "queens.ml" , line 100, characters 33-42

says that 98.78% of all allocations were performed by the function at queens.ml:100 , characters 33-42, and its callees. This isn’t surprising since this function is the top-level main function! More interesting is the 98.21% of allocations on queens.ml:45 . This is the recursive call to the concmap function, which in turn invokes the List.map function on queens.ml:61 . We’ve now pinned down the source of the allocation in list.ml:55 to queens.ml:61 .

Caveats and conclusions

Unlike stack profiles of C programs, OCaml’s stack profile does not include all the functions in the call stack since many calls are in tail positions. Calls to functions at tail position will not have a frame on the stack, and hence will not be included in the profile.

Please do submit issues and bug-fixes. Pull-requests are welcome! Also, here is my trimmed down (yay \o/!), non-exhaustive wish list of features: