OCaml and Emily In a Walnut

Custom Designed For C++ and Java Programmers

by Marc Stiegler

Introduction

So, perhaps you are a C++ programmer, and you have noticed that, although you personally have never ever introduced a buffer overflow security breach, that notorious individual named Someone Else -- a member of every programming team -- probably has. As a consequence, if your product becomes popular, it is going to be a disaster.

Or, perhaps you are a Java programmer, your software seems sluggish, you are worried that the complaints will reduce sales, and you're worried you really need that factor of 2 performance that C++ still gives you, despite all the optimizations and all the compilation strategies the Java community has devised.

Do not despair. There is another programming language that can reduce the amount of code you have to write, increase your productivity, give you performance comparable to that of C++, while protecting you not only from buffer overflow bugs, but also from other security breaches.

OCaml has been around longer than Java; it is a mature language, robust and reliable for use in significant projects. It uses inference-based static type checking. This means you rarely have to write type declarations, but you get all the correctness benefits that Java and C++ programmers expect from their compilers. It is memory safe, so no buffer overflows. And it is fast, as fast or faster than C++ for many applications.

Emily is a design rule verifier for OCaml that implements the Principle of Least Authority (POLA) at the individual object level. With Emily, you can prove that large swaths of code employ little or no authority, allowing you to know that those parts of the program cannot cause harm. This allows you to focus your security review resources on a much smaller part of the system, dramatically reducing the "attack surface" available for crackers and other sport enthusiasts. A longer discussion of these features of OCaml and Emily can be found in the HP Technical Report, "Emily: A High Performance Language for Breach-Resistant Server, Multimedia, and Rendering Applications".

Here we will run you quickly through the basics of using OCaml and Emily. There are many other books and documents you can find about OCaml. If you have a background in functional programming, you should probably read one of those, not this. Most literature on OCaml is written with an eye to hammering the Java or C++ programmer into functional programming shape before letting him loose with code. Here, however, we will focus on teaching the language for people who are already familiar with imperative programming, who like imperative programming, and who will find it quicker and easier to learn if taught in an imperative programming style.

However, even with this orientation, the reference manual shipped with the OCaml distribution (current release named ocaml-3.09-refman.pdf) is a crucial document. In particular, Chapter IV is the documentation for the libraries. We will use only a tiny part of the libraries here. Indeed we fiercely minimize the number of functions from the libraries that we call. Browsing the libraries is important to give you a sense of what you can do that we have left out of this document.

For a person from a C++ background, the syntax of OCaml has many quirks. Of course, C++ has even more quirks ... but if you've grown accustomed to the C++ quirks, they are invisible. OCaml quirks are the same way. Once you've become accustomed to them, they too will disappear into the background. Stick with it, the reward is substantial.

Hello World and Factorial

Once you've installed OCaml, bring up the interactive environment. Open a command shell and start "ocaml", which will give you a simple prompt. Type the helloWorld program:



# print_string "Hello World

";;

Hello World

- : unit = ()

#

The function print_string prints the string given to it as a parameter. The line "- unit = ()" is telling you the type of the value returned by the function, in this case the "unit", which is approximately the same as returning void. The double semi-colon denotes the end of a top-level expression in the interactive environment; the interactive environment will not evaluate a block of code until it encounters the double semi-colon.

Let's look at another classic mini-program, the factorial function. Since this program is written in a nearly functional style even in C++, we will show it both ways: as an imperative program and as a functional program:

# (* factorial function, imperative style *)

let fac n =

let result = ref 1 in

for i = 1 to n do

result := !result * i;

done

!result;;

val fac : int -> int = <fun>

# fac 6;;

- : int = 720

# # (* factorial function, functional style *)

let rec fac n =

if (n <= 1) then 1

else n * (fac (n-1));;

val fac : int -> int = <fun>

# fac 3;;

- : int = 6

#

Features to note:

The form (* ... *) denotes a comment. Comments can be nested (though syntax highlighters often fail to understand this, which can cause excruciating confusion if you've come to assume your highlighter is always correct).

We start the definition of a function or a variable with the keyword let . If the function or variable is at the top level, we end it with the double semi-colons. However, when nested, the full structure one uses is " let variablename = variablevalue in body". In simple definitions, we put the in at the end of the line, and do not indent the next line. For the most part, a C++ programmer can think of it as the end-of-definition symbol, close enough to being the semi-colon after an assignment so you can do some programming while you get more comfortable with the language.

. If the function or variable is at the top level, we end it with the double semi-colons. However, when nested, the full structure one uses is " variablename = variablevalue body". In simple definitions, we put the at the end of the line, and do not indent the next line. For the most part, a C++ programmer can think of it as the end-of-definition symbol, close enough to being the semi-colon after an assignment so you can do some programming while you get more comfortable with the language. Most "variables" in OCaml are in fact constants, i.e., they are invariant, making the term "variable" an odd name. If you say

let x = 3 in

let y = 4 in

x + y

the x and the y are constants that cannot be reassigned. To make a variable that can actually change its value, we must make it a ref . Here we have made the variable result a ref .

the x and the y are constants that cannot be reassigned. To make a variable that can actually change its value, we must make it a . Here we have made the variable result a . To dereference a ref , use the exclamation mark; to assign a new value to the ref, use the ":=" assignment operator. Hence the line

result := !result * i;

dereferences the result to get its contents, multiplies that content by i, and assigns the outcome of that multiplication to the result.

, use the exclamation mark; to assign a new value to the ref, use the ":=" assignment operator. Hence the line result := !result * i; dereferences the result to get its contents, multiplies that content by i, and assigns the outcome of that multiplication to the result. Use the semi-colon to terminate imperative statements. The earlier print_string, if embedded in a function, would be terminated with a semicolon, as is the reassignment statement and the for loop.

A function returns the result of evaluating the last expression in its body. There is no separate return statement. In the imperative version of fac, we dereference the result as the last line; in the functional version, the function returns whatever value the if expression computes.

expression computes. A recursive function needs to use the keyword rec to alert the compiler.

to alert the compiler. The type signature inferrred by OCaml to describe this function is worth examining more closely. The type signature is "int->int", which says, the fac function should be given one integer, and it will return one integer. Both versions have this same type signature, as they should.

If you were to do a benchmark on these two versions of factorial, you would find that the imperative version runs faster. To run as fast as the imperative version, the functional version would have to be rewritten to be tail-recursive, which is beyond the scope of this introduction. On the other hand, the function version of fac is only three lines long, while the imperative version is 6 lines. In this particular case, the functional version is not only shorter, it is quicker to write, easier to read, and simpler to maintain. It is also fast compared to Java. But in that inner tight loop where performance is everything, imperative is the answer.

The if expression can have several statements inside each clause, but if this is done, they must be enclosed in parentheses or a begin-end clause, i.e., in some circumstances you use parentheses (or begin...end) the way you use braces in C++:

# if true then (

print_endline "abc";

print_endline "def";

) else print_endline "woa";;

abc

def

- : unit = ()

Because the if expression returns a value, and because OCaml type checks that value, the then and else clauses must return the same result. In general, when an if clause is used imperatively, the returned value will be unit for both clauses. If, however, you use a function that returns a value as well as performing a side effect, you will get a type mismatch that will puzzle you endlessly. Put a unit object as the last value to be returned: "()", and the type mismatch should be cleared up. Or put the keyword ignore in front of that last function to cause a unit value to be returned.

Simple Pattern Matching: Switch-like statements and exception handling

OCaml uses the match-with expression for many kinds of pattern matching. You can think of the match-with as a switch statement on steroids. First, we look at the match expression doing a simple switch statement job:

# let x = 3 in

match x with

| 1 -> print_string "it is a 1

"

| 3 -> print_string "it is a 3

"

| _ -> print_string "it is something else";;

it is a 3

- : unit = ()

The underbar clause in the match is the catchall, comparable to the default: clause in a switch statement. If you do not have separate cases for all possible matches, you will get a syntax warning if you do not include the catchall.

The match expression, like the if expression, returns a value. In the example above, it returns a "unit" value since that is the value returned by the print_string. But it can return other values, as shown here:

# let isEven num = match num mod 2 with

| 0 -> true

| 1 -> false

| _ -> raise (Failure "unlikely outcome");;

val isEven : int -> bool = <fun>

# isEven 3;;

- : bool = false

Here we also show a simple throw of an exception: we raise the builtin exception Failure that takes a string argument. During the catch (which is performed in the with part of a try-with), we can extract the string:

# try

raise (Failure "abc")

with

| Failure description -> print_string (description ^ "

")

| _ -> print_string "surprise";;

abc

- : unit = ()

Yes, exceptions are handled with pattern matching as well. The passage "Failure description ->" matches exceptions of the type Failure, and loads the variable description with the contents of the exception.

As an aside, note the use of "^" for concatenating strings.

Match clauses, like the then and else clauses of an if, must all return the same type of value.

There is another subtle difference between the match and if expressions. If a match clause has multiple statements, the statements do not have to be grouped by either a set of parentheses or a begin-end block. This sounds attractive, but in fact can cause a problem when you nest matchers, or when you try to follow a match expression with something else that should be done no matter which match clause is evaluated. See the Gotchas section later for more details.

We will do much more with pattern matching later.

The Cat Program

Let's stretch the machinery we've learned and write a useful program. We will write the Linux bash command "cat". Cat copies the contents of stdin to stdout. Here is the program:

try

while true do

output_char stdout (input_char stdin)

done

with eof -> ()

In this example, there is no pound sign at the beginning, and there is no double-semi-colon at the end. That is because we are not running it in the interpreter, rather we are compiling it as a program. Put this source in a file "cat.ml" and compile with the command line "ocamlopt -o cat.exe cat.ml". You should be able to use the result just like the normal cat command (note: if running on Windows using the MSVS compiler, there are a number of *.lib files that will have to be included in the default compilation directory. If you figure out a better solution, please contact the author).

Note the form of the while-do-done construct. Also note that this with clause has a single pattern against which to match. This single pattern is simply a variable into which the exception will be loaded. We have called the caught exception "eof", suggesting end-of-file, but any exception would be caught and would be filled into this variable, which could be raised again if appropriate. In this particular example, no matter what the exception is, we are happy to just let the program proceed through the last expression in the source. When the last expression is completed, the program terminates automatically, i.e, no explicit "exit" is required.

Had we wanted to do something different if we caught a real eof exception, we would have matched for the builtin exception End_of_file. This exception does not have content like the Failure exception, so the match would not include a variable, only the exception name, i.e.,

with

| End_of_file ->

The cat program also introduces the builtin in_channel and out_channel, stdin and stdout. The builtin function output_char takes as arguments an out_channel and a char, returning unit. The function input_char takes an in_channel as an argument, and returns a char.

Arrays

OCaml has arrays, their types are implied by the type of the element:

# let numbers = [| 1; 2; 3 |];;

val numbers : int array = [|1; 2; 3|]

# numbers.(1);;

- : int = 2

# numbers.(1) <- 5;;

- : unit = ()

The "<-" operator is used to reassign the elements of the array. You can make an "empty" array like this:

# let charArray = Array.make 3 'a';;

val charArray : char array = [|'a'; 'a'; 'a'|]

This make function (in the Array module, which we'll discuss later) takes two arguments, a length and an initial value to load into each element of the array. So it is not really empty.

Strings are indexed in a similar matter:

# let title = "president";;

val title : string = "president"

# title.[1];;

- : char = 'r'

Strings use square brackets rather than parentheses to offset the index.

Strings are mutable, just like arrays. Resist the temptation to use this feature. You need to use this feature when reading from an in_channel, but do not use it elsewhere. Eventually the Emily design rule verifier will throw mutable strings back at you if you try to use them.



Records, and an Intro to Objects

There are several ways you can construct object-like components in OCaml. Functions can be passed around like objects and can be thought of as objects with a single method, as shown here.

# let observer() = print_string "observed event

";;

val observer : unit -> unit = <fun>

# let notify obs = obs();;

val notify : (unit -> 'a) -> 'a = <fun>

# notify observer;;

observed event

- : unit = ()

In this observer example, the notifier is handed the observer function. The notifier then invokes the function with the appropriate arguments. In this example, the observer function expects no parameters, i.e., it receives a unit argument, denoted by the "()".

Another way to make objects is to use Modules. Modules are best for maximum performance. They are explained late in this walnut.

Another way to make objects is to use classes, which give performance that is disappointing compared to C++ (though still excellent compared to Java).

But the simplest way to make things that act like objects may be to just use records.

Records must have explicit types declared:

# type point = {x:int; y:int};;

type point = { x : int; y : int; }

# let origin = {x=0;y=0};;

val origin : point = {x = 0; y = 0}

# origin.x;;

- : int = 0

We declare the types of the x and y fields of the record using a colon to separate the variable from the type. When we actually create a record that matches this declaration, we assign the fields values using the equal sign. We access each field using a dot notation.

We can easily build a constructor for record-style objects (assuming we have already defined the type point):

# let makePoint newX newY = {x = newX; y = newY};;

val makePoint : int -> int -> point = <fun>

# let point1_2 = makePoint 1 2;;

val point1_2 : point = {x = 1; y = 2}

# point1_2.x;;

- : int = 1

The makePoint function manufactures a new record, given x and y values. The last thing that makePoint does (which also happens to be the only thing it does) is create a record, so the record is returned as the result of evaluating makePoint.

Records can hold functions as well as data. Here is a fancier makePoint that allows an existing point to manufacture an offset point:

# type point = {x:int; y:int; offset: int -> int -> point};;

type point = { x : int; y : int; offset : int -> int -> point; }

# let rec makePoint newX newY =

let makeOffset offX offY = makePoint (offX + newX) (offY + newY) in

{x = newX; y = newY; offset = makeOffset};;

val makePoint : int -> int -> point = <fun>

# let point1_3 = makePoint 1 3;;

val point1_3 : point = {x = 1; y = 3; offset = <fun>}

# let point2_4 = point1_3.offset 1 1;;

val point2_4 : point = {x = 2; y = 4; offset = <fun>}

This example illustrates several points:

Note how, in this example, makePoint manufactures several different things: first it manufactures a makeOffset function just for this point, then it manufactures and returns the point's record (which uses the makeOffset function in its definition).

The makeOffset function declared inside the makePoint function has enduring access to the arguments passed in to makePoint, namely newX and newY. The arguments passed in are not transient ghosts as they are in C++ or Java. So you do not have to declare instance variables, and copy the arguments into the instance variables, in order to keep the argument value around. Instead, you can just use the arguments; they remain in the scope of the elements that use them.



Let us quickly assess how these arguments appear to the garbage collector: as long as the point manufactured by makePoint exists, the function makeOffset must exist. As long as the function makeOffset exists, the newX and newY arguments must exist. When the point is garbage collected, the makeOffset function may be garbage collected (unless the function is independently referenced somewhere). When the makeOffset function is garbage collected, the newX and newY arguments may be garbage collected (unless they are referenced somewhere else, impossible in this case since integers are passed by copy) .

Let us quickly assess how these arguments appear to the garbage collector: as long as the point manufactured by makePoint exists, the function makeOffset must exist. As long as the function makeOffset exists, the newX and newY arguments must exist. When the point is garbage collected, the makeOffset function may be garbage collected (unless the function is independently referenced somewhere). When the makeOffset function is garbage collected, the newX and newY arguments may be garbage collected (unless they are referenced somewhere else, impossible in this case since integers are passed by copy) . The type signature of makeOffset is this: given two integers, return a point. This is represented as "int -> int -> point". We will discuss type declarations more in the next section.

The makePoint function is recursive, because the makeOffset function inside it calls makePoint itself.

This version of the point type is also recursive: inside the declaration of the point type, we see a reference to the point type. We do not need a special syntax (like rec) to enable a type to recursively reference itself.

This makePoint function, which manufactures other functions that are placed inside of records, will look strange, even disturbing, to the C++ or Java programmer. Java actually has a mechanism quite similar to this, the inner class. However, the syntax is sufficiently different that the similarities may be hard to recognize. If this form is too alien for comfort in constructing objects, use the class keyword introduced later, or use Modules introduced much later, to make objects.

Finally, records can contain mutable fields, i.e., true variables that can be reassigned:

# type personalData = {name : string; mutable phoneNumber : int};;

type personalData = { name : string; mutable phoneNumber : int; }

# let john = {name = "John"; phoneNumber = 1234567};;

val john : personalData = {name = "John"; phoneNumber = 1234567}

# john.phoneNumber <- 7654321;;

- : unit = ()

# john.phoneNumber;;

- : int = 7654321



To reassign a new value to a mutable element of a record, use the "<-" operator.

It may seem irritating that there are two operators for reassignment, i.e., mutation. Array elements and records use the "<-" operator, and the ref variable uses the ":=" operator. In fact, the ref is really a syntactic shortcut for a record with a single mutable element "contents":

# let z = ref 0;;

val z : int ref = {contents = 0}

# z.contents <- 1;;

- : unit = ()

# z.contents;;

- : int = 1

# !z;;

- : int = 1

So we see that "<-" is the primitive reassignment operator, and ":=" is a syntactic shortcut.

Classes

If you come from C++ or Java, it may seem inconceivable to do object-oriented programming without classes. However, as mentioned earlier, there are other alternatives in OCaml. Usually the other alternatives will be more appropriate. From a C++ or Java background, it only makes sense to learn OCaml and Emily if you need hot performance and/or superior breach-resistant security. If you need performance, you should use Modules. If you need more exotic forms of secure cooperation inside the program, you should use record-making functions like the makePoint shown earlier. Such records enable intermediation, which is key to more sophisticated security policy enforcement (see A PictureBook of Secure Cooperation for more on this topic). However, simple classes for simple objects are easy to learn and use:

# class point (x:int) (y:int) = object

method getX() = x

method getY() = y

end;;

class point :

int ->

int -> object method getX : unit -> int method getY : unit -> int end

# let origin = new point 0 0;;

val origin : point = <obj>

# origin#getX();;

- : int = 0

The message name separator is a sharp, not a dot. We had to specify types for the constructor arguments to prevent an ambiguity in type inference that the compiler cannot resolve.

The traditional features of class-based object-orient programming, such as private instance variables, inheritance, etc., are available. But they are beyond the scope of this introduction.

More Type Declarations

We have already seen type declarations for strings and ints, and for a couple of simple functions. A few of the builtins are:

int, int32, int64, nativeint

float, float32, float64

string

char

bool

All these primitive types can be included in the type declarations for functions. Here is the signature for a fancy function that receives two arguments, a bool and a simple function that receives nothing (unit) and returns an integer. Upon receiving the bool and the simple function, the fancy function returns a char:

type fancyFunc = bool -> (unit -> int) -> char

You can declare types that have to-be-determined elements. These are known as parameterized types. You denote an unspecified parameter with a tick in front of the element name. Here we create both a parameterized type, and a function that processes a parameterized array (i.e., an array for which we do not know the type of the contents):

# type 'a lengthCounter = 'a array -> int;;

type 'a lengthCounter = 'a array -> int

# let counter a = Array.length a;;

val counter : 'a array -> int = <fun>

# counter [| 1; 2 |];;

- : int = 2

This counter function is a function that generically processes any array, regardless of what kind of contents the array may have. Using parameterized types in this fashion is similar to using templates in C++.

An important kind of type is the variant type . The variant type allows you to pass any of several different kinds of data in a single argument:

type nameOrPhone = Name of string | Phone of int | Neither

The names of the variants must be capitalized. You can have variants that are simple tokens ("Neither"), and variants that contain data or other types. Using variants effectively depends upon pattern matching, discussed next.



Pattern Matching with Types

We can match on the cases in a variant type and perform different operations depending on which one we have received:

# type nameOrPhone = Name of string | Phone of int | Neither;;

type nameOrPhone = Name of string | Phone of int | Neither

# let printNameOrPhone a = match a with

| Name name -> print_string (name ^ "

")

| Phone num -> print_int num;

print_string "

"

| Neither -> print_string "Neither name nor phone

";;

val printNameOrPhone : nameOrPhone -> unit = <fun>

# printNameOrPhone (Name "Joe");;

Joe

- : unit = ()

This print function takes any object of the type nameOrPhone and prints it. Sometimes it will receive a string, sometimes a number, sometimes neither. It handles each one appropriately. Note, however, that we must do some explicit "typing": when we invoke the function, we cannot just send it a string or a number, we must send it a variant, i.e., (Name "abc") or (Phone 123).

Printf

OCaml has a bagful of print statements: print_string, print_int, etc. It can be quite tiresome to build a complex output stream. For those who prefer C tradition, there is a printf statement:

# Printf.printf "printing string: %s

" "using-printf";;

printing string: using-printf

- : unit = ()

Printf is part of the Printf module, which has a full complement of printf-style functions.

Gotchas

We have covered enough to do some interesting programming. We should probably take some time to cover some gotchas before proceeding further, in case you want to go off and build something sooner rather than later. It would be a shame to come this far, feel so confident you understand what OCaml is about, and then give up in a fit of swearing because of the compiler's senseless error messages.

As mentioned in the Introduction, OCaml has its fair share of peculiarities. These peculiarities seem particular peculiar coming from a C++ tradition. Here we list some of the surprises that left the author baffled and frustrated, in hopes that you will be able to learn from his experience.

No overloading

OCaml has polymorphism, but it does not have overloading. Coming from a C++ tradition, one has the concept of overloading so ingrained that this is hard to think about coherently.

The consequence first leaps out at you when using floating point numbers. The integer operators, plus, minus, division, and multiplication, cannot be used for floats. Instead, each integer operator is postfixed with a dot to form the floating point operator:

# let z = 3.1 +. 2.0 *. 2.7;;

val z : float = 8.5

OCaml will not cast ints to floats for you, either. Fortunately, there is a comprehensive collection of transforming functions for the builtin types, following the form of this example:

# float_of_int 3;;

- : float = 3.

Simple function names can still be used by discriminating among them using the module name:

# String.length "abc";;

- : int = 3

# Array.length [|1|];;

- : int = 1

In this example, strings and arrays both have a function named "length" associated with them, but the programmer must differentiate, or the compiler will be unable to disambiguate the functions and the types.

Currying

A powerful feature that can drive those of us from a C++ background crazy is implicit currying. If a function that needs two arguments is given only one argument, the compiler does not immediately throw a type checking failure. Rather, the result of this operation is a new function, that only needs one argument to produce the result of the original function:

# let adder x y = x + y;;

val adder : int -> int -> int = <fun>

# let add2 = adder 2;;

val add2 : int -> int = <fun>

# add2 3;;

- : int = 5

We manufactured the function add2 by feeding the original adder function only one of its two needed arguments.

This can be cool, but it means that, if you feed a function too few arguments by accident, you can be surprised, thousands of lines of code later, when your program tries to use the result and throws a type mismatch error:

# let resultInt = adder 8;;

val resultInt : int -> int = <fun>

# let diff = 7 - resultInt;;

Characters 15-24:

let diff = 7 - resultInt;;

^^^^^^^^^

This expression has type int -> int but is here used with type int

ResultInt is the result of accidentally forgeting the second argument to adder. The compiler doesn't immediately indicate a problem; as far as the compiler knows, it is just fine for resultInt to be a function that, given an int, will return an int. But later in the program, when we try to subtract resultInt, the compiler complains with a type mismatch.

The clue that alerts you that you have suffered a currying catastrophe is the shape of the type mismatch. If you experienced accidental currying, the signature on the mismatch will be a function that returns an object of the correct type, but which needs a smaller number of arguments. With this clue, finding the problem is usually straightforward.

No Null

There is no general purpose null in OCaml. Many programmers from a C++ tradition are surprised by how minor a problem this is. Often, if a variable contains a null, there is a bug in the program; these types of bugs are not possible in OCaml.

In circumstances where a null makes sense, since one uses variant types for many things anyway, creating a "null" type as a part of the variant is straightforward. We showed this technique earlier when we created the nameOrPhone type, which had the third variant Neither.

OCaml does have a general-purpose builtin variant type with this signature:

type 'a option = Some of 'a | None

You can use this variant with types for which you don't have a "nothing there" variant or value:

# let add2 x = match x with

| Some intx -> intx + 2

| None -> 2;;

val add2 : int option -> int = <fun>

# add2 (Some 3);;

- : int = 5

So you can always have a null, or at least a None, if you really want it.

Nested match expressions, match expressions followed by additional statements

Earlier we referred to a problem with nesting match expressions. The nested match must be enclosed in parentheses, otherwise the compiler cannot tell where the inner match ends and the outer match starts again:

# let nestedMatch x c = match x with

| 1 -> (match c with

| 'a' -> print_string "1a

"

| _ -> print_string "1 followed by unknown

"

)

| _ -> print_string "not a 1";;

val nestedMatch : int -> char -> unit = <fun>

# nestedMatch 1 'b';;

1 followed by unknown

- : unit = ()

You also have to completely enclose the match, with all its clauses, if you wish to have something that happens after the match clause has completed, i.e., something that happens no matter which clause is selected and evaluated.

Ambiguous Types

Sometimes the compiler will be unable to infer the types of the objects. You can help OCaml by specifying the types of the objects explicitly, with the colon-type declaration that we saw briefly in the class point definition:

# let rec fac (n : int) =

if (n <= 1) then 1 else n * fac (n - 1);;

val fac : int -> int = <fun>

The explicit typing of the argument n was unnecesary here for the compiler. But it might be helpful to the human who is trying to read the code during maintenance.

Modules

Modules can be used for manufacturing objects. They are also the basic mechanism for building library packages. First we look at them as object manufacturing systems, then as library package systems.

We will implement point objects once again. First we will implement the signature for point objects:

# module type PointSig = sig

type p

val getX : p -> int

val getY : p -> int

val offset : p -> int -> int -> p

val make : int -> int -> p

end;;

module type PointSig =

sig

type p

val getX : p -> int

val getY : p -> int

val offset : p -> int -> int -> p

val make : int -> int -> p

end

This signature declares the types of objects and the functions associated with those types. Here we've defined the abstract type p, with no description of its contents, which will be our point objects. Had we declared the full description of the p type, its contents would have been visible to users of the module, which is sometimes desireable.

This is followed by declarations of the functions, of which the most complex is the offset function. As you can see from the declaration, given a point and a pair of ints, the offset function returns a new point.

Now we define the Point module, which is associated with the signature as part of its construction:

# module Point : PointSig = struct

type p = {x : int; y : int}

let getX point = point.x

let getY point = point.y

let make newX newY = {x = newX; y = newY}

let offset point offX offY =

make (point.x + offX) (point.y + offY)

end;;

module Point : PointSig

Inside the Point module, we fully define the point type and the functions. Note that these functions are top-level functions, and do not use the in keyword to end the definition.

Now we can use them.

# let origin = Point.make 0 0;;

val origin : Point.p = <abstr>

# let p11 = Point.offset origin 1 1;;

val p11 : Point.p = <abstr>

We prefix the function with the name of the module to disambiguate which make function we are using. There is an "open module" statement that will set the specified module as the default module, so that the module prefix is unnecessary, but this is almost always a bad idea.

To create a module library package, we use the file suffix to tell us whether we have a module or a module signature, and we associate a module with its signature by giving both files the same name. In this example, we would create a signature file Point.mli and a module file Point.ml. We would strip off the first and last lines (the module Point... line and the end line), since that information is delivered with the file name and file extension. We can precompile the module signature and definition as a library, or compile them together with the main program; see the ocamlopt documentation for more detail. The *.mli files are more like C++ header files than they are like Java interface files; if you think of them as header files you will be in good shape.

If you write a module definition without an associated signature, OCaml will simply create a signature for the module in which everything is public and nothing is abstract or hidden.

Objects created as modules yield high performance code. The style and layout of modules may feel a bit alien, but you can't beat the numbers.

Other types of objects can be declared and defined as parts of modules. You can put class definitions in modules, and the simple record-based objects described earlier can be put into modules as well.

Some people feel that you don't have real objects unless you have inheritance. You can do inheritance with modules and record-based objects, but it isn't pretty, and it isn't in the scope of this document. In the last few years the author has come to the conclusion that inheritance is usually a bad idea. One should use delegation instead. Unfortunately, delegation is also beyond the scope of this document. But you can read more about it (and about the techniques that one would apply to record-based objects to get inheritance) in E in a Walnut.

Lists and Tuples

Up to this point we have tried to focus on making OCaml comfortable and familiar to the C++ or Java programmer. Sometimes this has not been as easy or even as possible as one would like, because OCaml is a very different language. But the goal has, to this moment, always been maximum familiarity within the constraint of showing enough stuff so you can get real work done.

We have shown, hopefully, that it is possible to get work done in a C++, imperative fashion. We could stop discussing OCaml here, and go on to describe Emily. But it would be a shame to completely disregard a couple of basic pieces of machinery OCaml has for enabling high speed production of compact, readable code. Here we give just a taste of tuples and lists. All the other texts on OCaml give glorious, exhausting detail on these parts of OCaml, so we just barely introduce them here.

Lists can be thought of as immutable vectors. Any element can be indexed via the nth function in the List module:

# let lst = [1;2;3];;

val lst : int list = [1; 2; 3]

# List.nth lst 2;;

- : int = 3

We can create a new list that appends a new element to the top of an existing list with a double-colon:

# let list2 = 4 :: lst;;

val list2 : int list = [4; 1; 2; 3]

Note that the appended item goes to the head of the list, i.e., element number 0. And we can engage the many compact and powerful list processing functions offered by the List module. In this example we use the map function to create a list in which each element is created by multiplying an element of list2 by 3:

# let listTimes3 = List.map (fun n -> n * 3) list2;;

val listTimes3 : int list = [12; 3; 6; 9]

This example uses an anonymous function, which uses the fun keyword and the "->" operator to create the definition; we could have defined a named function beforehand and used that in the map function instead.

Perhaps most interesting, we can do pattern matching on the elements of the list:

# type intOrString = Int of int | Str of string;;

type intOrString = Int of int | Str of string

# let thingsList = [Int 1; Str "abc"; Int 2; Str "def"];;

val thingsList : intOrString list = [Int 1; Str "abc"; Int 2; Str "def"]

# match thingsList with

| Str a :: Int x :: remainder -> ()

| Int y :: Str b :: remainder -> print_string "got int string at head

"

| _ -> ();;

got int string at head

- : unit = ()

You can quickly sort out what kinds of things you got in a list by using pattern matching. Here we zoomed right in on the lists that had an Int followed by a Str at the head of the list. The variable "remainder" contains the part of the string that was unmatched. Had we put the empty list "[]" instead of a variable name for the third element of the match, the pattern would have only matched lists that were exactly length 2, i.e., lists that had an Int and a Str and nothing else.

Tuples are not as malleable as lists, but they can contain objects of different types (unlike lists, in which all elements have to be of the same type -- the saving grace being, the type of the list elements can be a variant type). Here is a tuple with an int and a char, and a pattern that matches them:

# let joe = ("Joe", 9417758);;

val joe : string * int = ("Joe", 9417758)

# let (joeName, joePhone) = joe;;

val joeName : string = "Joe"

val joePhone : int = 9417758

Here we create the joe tuple that contains a string and an int. Note the type description of a tuple separates the types of the individual element with a star.

Having created the joe tuple, we then use a pattern match (a pattern match that doesn't even need the match construct) to pluck out the elements into separate variables.

Tuples can be quickly constructed (they need no type declaration like a record) and quickly taken apart into their individual components (with a pattern match). So the next time you find yourself wishing you could return a pair of values from a function or method, rather than inventing a new type just to carry the answer back, you can construct the tuple, and the recipient can disassemble it into its components in a flash.

Putting tuples and lists together in a tiny example,

# let database = [("Joe", 1234567); ("Amy", 7654321); ("Bob", 2976869)];;

val database : (string * int) list =

[("Joe", 1234567); ("Amy", 7654321); ("Bob", 2976869)]

# let retrieve name = List.find

(fun next ->

let (dbName, phone) = next in

name = dbName)

database;;

val retrieve : string -> string * int = <fun>

# let (name, phone) = retrieve "Amy";;

val name : string = "Amy"

val phone : int = 7654321

Here we have built a tiny database, built a tiny data retriever, and extracted a complete database record. Is it short, sweet, and clear? If it is, great, if it is not, please disregard.

Emily

Emily, as noted earlier, is a design style verifier for OCaml that enables Principle of Least Authority confinement down to the individual object level. With Emily, one can prove (in the engineering sense, not the mathematical sense) that many (indeed, most) of the objects in your program do not have enough authority to do any harm, thus reducing the attack surface for the cracker, and reducing the amount of code a security review team needs to inspect to ensure the system is robust against diverse types of attacks.

To use Emily, or to really convince yourself Emily is a good idea, you should start out by reading the HP Tech Report, Emily: A High Performance Language for Breach Resistant Server, MultiMedia, and Rendering Applications. Here we will focus on how to use the Emily tools, notably emilyopt, the verifier/compiler/linker. We will build a powerbox even simpler than the powerbox presented in the tech report, and an application that uses it.

Our powerbox will be a simplification of the "sash" powerbox in the tech report. Here we build minisash. Our minisash powerbox simply confers to the application the stdin and stdout channels. This tiny bit of authority is actually enough to run more than one application; here, we will build a cat command, a POLA-confined version of the cat program we wrote earlier that copies stdin to stdout.

Make a folder named minisash. Into this folder place the following files:

A file named "powerbox.ml", that has the following single line of code:

CapMain.start stdin stdout

This line of code simply invokes the start function in the CapMain module of the application, passing in the stdin and stdout authorities.

A file named "stdLib.txt", containing a single line with the path to the ocaml standard library folder.

A file named "safeStdLib.txt", containing a single line with the path to the safe, non-authority-bearing, version of the stdlib. This folder is a part of your Emily distribution.

A folder named "safePowerLib". This is the folder that contains interface declarations for objects and functions passed between the powerbox and the application. In minisash we only pass the standard in_channel and out_channel objects, so no interfaces are needed, and this folder remains empty..

A file names "safePowerLib.txt" that contains the path to the safePowerLib folder.

A file named "libs.txt" that contains nothing. This is the list of libraries, modules, etc., from both the powerbox and the stdlib that are needed for the link step. None are needed for this simple application.

Make a scond folder named "cat". Into this folder place the following files:

A file named "CapMain.ml" with the following code:

let start inchan outchan = try

while true do

output_char outchan (input_char inchan)

done

with eof -> ()

This is the main body of the cat program. Unsurprisingly, it looks much like the cat program earlier, wrapped up as a function in the CapMain module to be called by the powerbox. The special feature is what this version of cat cannot do -- it cannot do anything except read from inchan and write to outchan. It cannot open any files, connect to the network, or pop an annoying dialog. It can only do what you intended for it to do -- it is confined by the Principle of Least Authority, so that it has just enough power to carrying out your wishes, neither more nor less. You can hire a world-class cracker to write your cat program, and use his code feeling comfortable and secure (well, not quite, since Emily itself has not had a security review, but the design rules it enforces are theoretically able to supply such a feeling of comfortable security).

A file named "order.compile", with the single line CapMain.ml in it.

Now compile:

emilyopt minisash cat safecat.exe

This should create the executable file safecat.exe.

A Little Fancy Emily Footwork

Earlier a claim was made that Emily could support exotic security policies. We now try to show a simple example of a complex policy. Consider the following scenario: You hire a consultant to build a particular complex piece of code for your system. You are using Emily for development, but you are using Emily in a casual fashion, and are using the following form of file object:

type file = {getText : unit -> string; setText : string -> unit}



let makeFile path = {

getText = (fun ->

let chan = open_in path in

let length = in_channel_length chan in

let buf = String.make length ' ' in

input chan buf 0 length;

close_in chan;

buf);

setText = (fun text ->

let chan = open_out path in

Printf.fprintf chan "%s" text;

close_out chan)

}

The file type is public, i.e., it is included in the safePowerLib files with which the application is compiled. The makeFile function is private to the powerbox, since it uses full-authority calls to OCaml. So the application can use files that are handed to it, but cannot access random files at leisure just by guessing a path and using magic to open a channel.

Now the consultant has just finished up work on your stuff, and preliminary testing suggests it does what it is supposed to do. You call to tell the consultant that you are pleased with his work, but the FBI answers the phone and explains that the consultant has just been thrown in jail, he is the guy who put the back door in FrontPage Express back in 1999 when he worked at Microsoft, a back door that was not discovered for three years (yes, this part of the scenario is based on a true story).

The consultant gave you code that did everything you wanted. The nagging question is, is the code doing a few things on the side that maybe you wish it weren't? You are suddenly sensitive to the fact that the one file you hand to his subsystem is the key file for the entire system, and that sometimes it holds proprietary data.

Gulp. The code he has written is arcane and hard to read. The guy really is a wizard. Who knows what he's hidden there? Suddenly you have a dramatic trust realm boundary in the middle of your system.

You inform your team of the problem. Two of your programmers handle the file on its way to the consultant's subsystem. The first programmer knows that the consultant only needs read authority. So he creates an intermediary file object. It is a file object because it meets the type specification of a file, but this file object will throw an exception if you try to use the setText function:

let makeReadOnly file = {

getText = file.getText;

setText = fun text -> raise (Failure "Read Only")

}

You use makeReadOnly to manufacture a ReadOnly facet on the file, and send that on to the subsystem. Whew! Ok, now the consultant system at least can't corrupt your data.

It still makes you nervous, though, that he can read the file even at times when the file is undergoing proprietary transformations. Another programmer, who is receiving the readOnly facet and is going to hand it on to the consultant subsystem, runs the readOnly object through another intermediary, a toggleable intermediary:

type onOff = On | Off

let makeToggledFilePair file =

let accessAllowed = ref On in

let toggle setting = accessAllowed := setting in

let toggledFile = {

getText = (fun -> match accessAllowed with

| On -> file.getText()

| Off -> raise (Failure "Access off");

setText = (fun text -> match accessAllowed with

| On -> file.setText text

| Off -> raise (Failure "Access off")

} in

(toggledFile, toggle)

The function makeToggledFilePair, when given a file object, will produce a tuple with 2 parts: the first part is another kind of object that meets the file specification, and the second is a toggle for the file: invoke "toggle On", and the intermediating object lets requests go through. Invoke "toggle Off", and all access is denied. Now you can ensure that the consultant subsystem has only read authority, when and only when it is supposed to have read authority.

If you look at this example and wonder if the resulting code is not only more secure, but is also more likely to warn you early if there is a bug, the answer is, "Yes, of course." Lots of normal bugs will be blocked by implementing principle of least authority security. Security and reliability are intimately related. At least, they should be. If in your world they are not related, you are not using appropriate security techniques.

Conclusion With a Bite

You have now learned how to use software development tools for which strong security claims may be meaningfully analyzed. You are well on your way to writing new, fast software that will leave the world of crackers staring at your program in frustrated disbelief. Everyone knows, after all, that the crackers have the programmers on the run. We are outgunned, outclassed, outmatched, and no one can stop a cracker.

It is time to turn the tables. You now have the power to do so.