OcamlExc

An uncaught exceptions analyzer for Objective Caml

This software was developed by François PESSAUX during his PhD thesis under Xavier LEROY's direction. This project was partially supported by France Telecom R&D (CNET). Special thanks go to Pierre WEIS who also contributed in this work with his collaboration for researching with François on the polymorphic recursion which allows to enhance the results of the analysis.

Ocamlexc is a framework designed to statically analyze Objective Caml programs in order to discover potentially raised and uncaught exceptions. Such exceptions are harmful because they cause the immediate end of the running program. This tool can hence be seen as a static debugging tools, adding more safety to the already powerful static and strong typing system that makes ML programs so robust.

Basically, the analyzer needs two phases. First ocamlexcc is applied to each of the source files of the program one wants to inspect. Because OcamlExc allows incremental and modular analysis (hence avoiding to re-analyze the whole files of the program if just some of them changed since the last analysis), one needs to schedule the analysis of those files in their dependence order. This means that if a file foo.ml uses a file bar.ml, then one must first analyze bar.ml before analyzing foo.ml.

Once each of the files of the program has been analyzed, one then need to invoke a kind of linking pass, with ocamlexcl (or better, ocamlexcli which provides a graphical user interface to navigate in the results of the analysis, instead of a simple and raw text output as just provided by ocamlexcl). Then it is possible to inspect the result of the analyzed program for each file it contains.

As mentioned above, a simple raw text output is provided, but users may prefer to use the graphical user interface available via the OcamlTk library. This interface will be described deeper later, but let's say it allows to display the results in a much more pretty way, allowing to roll / unroll information to keep results more readable. It's also possible to load both the interface and the source of each module. Several windows can be opened simultaneously to have an overall view of the analyzed program. Markers can be set in the windows to identify locations where the user wants to go quickly. And then a syntax-based colorization is available to make sources more pleasant to read.

What are the results of an analysis ? Indeed they are types. We will describe them more accurately later, but as an introduction, let's say that they are just like regular ML types (as int , -> , * , and so on), but each of them are now annoted with the set of values they can contain. And each functional type, we mean arrow type, is annoted with the set of exceptions that can be raised and not handled during the use (during the application) of this function. So, for each expression of the program, the user will get its annoted type and the set of exceptions that can be raised during the evaluation of this expression.

Of course, this OcamlExc is a tool, based on a static analysis theory. So it does no black magic. The results the user will get are not necessarily the exact set of exceptions that will be raised during an effective run of the analyzed program. As every static tool, the results are a conservative approximation of the reality. This means that if OcamlExc tells the user that an exception Foo cannot be raised, then the user can be sure that this exception will not be raised and will not cause the interruption of his program one day. But sometimes, Ocamlexc will tell that an exception Bar can be raised, but indeed, due to a very strong and subtil invariant in the program, or due to a particular input given to the program, this exception won't be raised during a particular effective run of the program. We don't do miracles, just analyses ! But at least, this means that the results provided by the analyzer are correct, it won't forget to signal a potential cause of abnormal interruption of the user's program !

OcamlExc is freely available and distributed under the form a a tar-gziped file containing the full sources of the analyzer and the ready to use annoted standard library interfaces. OcamlExc has been developed under Linux and will run under most of the Unix systems. No Windows version is yet planned. Because the analyzer uses some Unix features it is possible to have problems when trying to compile it under Windows. Indeed this has no been yet tried.

To install the basic analyzer you will need the Objective Caml development environment also freely available. If you also wish to install the graphic interface stuff (which is strongly recommended), you will also need to install the OcamlTk library. This library is also freely available here.

So first, install the Objective Caml compiler by following directives provided with the compiler package.

Then, if you need, install the OCamlTk library, also following the directives provided within its distribution.

Now you are on installing the analyzer itself. You then just have to follow the directives provided in the archive. Basically this is summarized by " make ", " make install ". That's it !

The exception analysis is split in two passes. First, each source file of the program must be pre-analyzed in a separate way, using the ocamlexcc command. This generates ".cme" files ("foo.ml" will be analyzed and a file called "foo.cme" will be generated) that contain the partial result of the uncaught exceptions analysis. The complete analysis results cannot yet be known because they can depend on the results of the analysis for the others files. However, once a file has been analyzed, so long it does not change there is no need to re-analyze it. Once all the source files of the program are analyzed, it is possible to get the final result of the analysis by applying the "linking" pass. This is achieved by using either the ocamlexcl or ocamlexcli command. These two former commands perform basically the same thing : merging all the partial results coming from the ".cme" files previously generated by ocamlexcc and displaying the result of the analysis. The first command displays this result as text on the standard output. The second one provides the user a graphical browser, menu-window-mouse-driven. This second way to look at the result of the analysis is much more convenient. It provides several features to help reading types of the program, by allowing folding / unfolding of these types (which can be really big).

As we mentioned previously, the results of the analysis are displayed under the form of types. For each file analyzed, the annotated types of the definitions it contains will be displayed. This can be thought as if you were applying the command ocamlc -i (to generate an Ocaml source file interface automatically) on each of these files, the difference being that types are now annotated by extra information. Each type constructor is now labelled by the set of its possible values. For example an expression having the type unit[():Pre; `a] stands for an expression of type unit and that can evaluate only into the value (). Of course, this case is trivial because () is the only value of type unit.

Each value contained in the set annotating a type constructor is labelled by a presence flag Pre or by a presence variable (for example "$a"). The tag Pre means that the value is really present in this set. A presence variable is used to reflect the sharing of presence between several parts of a type expression. If such a variable only appears in positive occurrence and is universally quantified, then it can be considered as the "empty presence" (so, like the "absence").

The "'a" appearing at the end of the list containing the values can be forgotten. It's indeed a kind of "hook" used to add possible values in which the expression could have been evaluated. Because this "hook" (row variable) is here not constrained, this means that there is no more possible values, and then the set of possible values only contains what precedes this "hook". In the idea as presence variables appearing only in positive occurrences and universally quantified can be considered as a "void presence", row variables appearing only in positive occurrences and universally quantified can be considered as an empty set of value. Hence they can be ignored.

Some other examples of types, with their interpretation follow. For a more complete discussion about the framework and the internals of the analysis, please read the research article " Type-based analysis of uncaught exceptions" by Francois Pessaux and Xavier Leroy published in POPL99. Users interested in a deeper understanding of the whole theory underlying OcamlExc can also read my PhD thesis report (written in French), in which a full description and proves establishing the soundness of the system are provided. This document is bigger than the previous research article, contains more information and may be more difficult to read, but contains the more advanced information about all this work. Nevertheless, I tried to write it in the more pedagogic way I could to allow people without any important background in type theory and program analysis to be able to understand it.

So, further explanations about the deep meaning of each component of a type are outside the scope of these introduction pages. Once again, it is recommended to use the ocamlexcli command instead of ocamlexcl to get a more understandable output of the results of the analysis.

int[1:Pre; 2:Pre; `a] stands for the type of integer values that can only evaluate into 1 or 2 .

stands for the type of integer values that can only evaluate into or . Let's consider the identity function: let f x = x It will have type f : 'a -{`a}-> 'a The row between braces in the arrow of the type represents the set of exceptions that can possibly be raised during an application of this function.

It will have type The row between braces in the arrow of the type represents the set of exceptions that can possibly be raised during an application of this function. For example, the following function: let f () = raise Not_found has type: f : unit[():Pre; `a] -{Not_found:Pre; `b}-> 'a showing that the exception Not_found can be raised when f is applied.

has type: showing that the exception can be raised when is applied. The type of the raise function, used to raise the exception given to it as argument will hence be: raise : exn[`a] -{`a}-> 'a showing that the set of exceptions provided as argument will be the same as the one that can be raised during an application.

function, used to raise the exception given to it as argument will hence be: showing that the set of exceptions provided as argument will be the same as the one that can be raised during an application. When it turns out to be impossible to compute accurately the set of values contained in a basic type (that's to say for types int , string , char , float ) the "everything possible" annotation will be used and is represented by _ (the underscore character). Hence, a function returning a random integer will have type: random : unit[():Pre; `a] -{'b}-> int[_ ]

, , , ) the "everything possible" annotation will be used and is represented by (the underscore character). Hence, a function returning a random integer will have type: ] For sum types (enums), and more generally, for types whose signature is finite, the _ annotation is never used. Instead, when the set of value cannot be accurately computed, an row enumerating all the possible constructors is returned. For example, the type for boolean expressions has a finite signature (that is, the number of values of this type is finite). Indeed a boolean value is either true either false So the definition of the bool type can be seen as: type bool = true | false To represent the "everything possible" annotation, the following type will be used: bool[false:Pre; true:Pre; `a] instead of something of the form: bool[_]

annotation is never used. Instead, when the set of value cannot be accurately computed, an row enumerating all the possible constructors is returned. For example, the type for boolean expressions has a finite signature (that is, the number of values of this type is finite). Indeed a boolean value is either either So the definition of the type can be seen as: To represent the "everything possible" annotation, the following type will be used: instead of something of the form: When a constructor has an argument, no presence annotation labels it. Instead, in the row the values will have an argument that is the annotated type representing the set of values that can possibly be contained in the argument. This representation goes on recursively, depending on the structure of the type definition. If the set of values represented by the type of an argument's constructor is "empty", then this constructor (this value) must be considered in fact as absent. The idea is that something embedding something empty is itself empty. Let's consider the following type definition: type t = A of int | B The expression (A 5) will have type: t[A (int[5:Pre; `a]); `b] where it is possible to see that the argument of the constructor A is an integer value that can only be 5. The type t[A (int[_]); B:Pre; `a] stands for the type of an expression possibly evaluated into whatever value of type t. And then, t[A (int['a]); `b] is the type of an "empty" value of type t. Indeed, the value A has an argument that is an empty set of integers, so this whole value of type t is "empty".

'a for type variables

for type variables `a for row variables

for row variables $a for presence variables

When launching ocamlexcli with proper ".cme" files names, a requester is opened displaying all the loaded files. Double-click on one of them will open 2 new windows. The first one displays the signature of the opened module. The second window displays the source of the related module.

This window displays the list of selected ".cme" files that are parts of the analyzed program. By double clicking on their name, you can pop up the "interface" and "source" windows related to these files. The selection window also contains a "Misc" menu

on the right upper part providing a small preferences editor. This editor allows the user to configure the color schemes for the display. Preferences can be either saved and applied either just applied to the current session. When saved, a configuration file called ".ocamlexcrc" is created in the user's home directory.

In this window, each component of the module is given an annotated type that reflect the possible values of this type. At the origin, types are rolled so that they look like regular ML types. When a type is underlined, this means it has an anchor on which is it possible to click. Clicking on such an anchor will roll / unroll (toggle) the type, showing (hiding) values that can be present in this

type. In the same idea, function arrows can be underlined meaning that the function can raise exceptions when applied. To inspect this set of exceptions, just click on the related anchor. So, when a function type does not have any anchor, this means that it cannot raise any exception.

Because some parts of type expressions can be shared between several type expressions, rolling / unrolling a type allows to see this sharing by rolling / enrolling it everywhere it appears. So don't be surprised if clicking on an anchor affects another part of the type expression you are inspecting.

This window displays the source of the opened module as it was formatted by the user. No more pretty printing is attempted. The user can select a part of the source by left-clicking and dragging the mouse over the text. If this portion of text corresponds to an expression of the language, then a window will pop up, displaying the annotated type and effect of this expression. Hence this allows to walk in the source of the program to understand where raised exceptions come from. These information must be interpreted regarding the fact that is it generated from the internal abstract tree of the program, and that the framework used by our analyzer is unification-based. For example, if inspecting a condition expression in which one branch really raises an exception and the other one does not, both of the 2 branches will be mentioned as raising the exception (unification makes bidirectional flow of information). To better understand this aspect of the analysis, it is strongly recommended to read the related research article " Type-based analysis of uncaught exceptions". Please note that this remark is mainly devoted to advanced users.

In the 2 available windows provided by the browser, it is possible to set marks in the text for quick navigation. To do this, just select a piece of text and then click on the Set Mark button in the upper part of the window. You can set as many marks you want, each of them being local to the window where they have been put. Then to navigate between marks, just select the one you want to jump to in the Jump Mark menu of the upper part of the window.

Each window contains also a Close button allowing to close the current window if it is not needed anymore.

In the same way, each window contains a Quit button to allow fast exiting of the browser. This closes all the windows and terminates the browser.

Last, the Search button allows the user to look for a particular string in the window. This feature can also be invocated by the

"Ctrl-S" shortcut. Search can be done forward or backward depending on the selected option (>> or << radio-button in the search window). Search can be also made case sensitive or case insensitive by checking the appropriate option (a=A or a <> A radiobutton in the search window). By default, search is forward, case insensitive. To reach the next occurrence of the searched string, just press the Enter key in the search window. Finally to end a search, press the Abort button or use the "Ctrl-G" shortcut.