Objective Caml Standard ML

Syntax

See this syntax comparison for more details.

Array/string shorthands

Special syntactic sugar is defined for array and string accesses. These operations receive no special treatment.

let arr = [| 1; 2; 3 |];; let two = arr.(1);; arr.(2) <- 6;; let str = "Hello";; let e = str.[1];; val arr = Array.fromList [1, 2, 3]; val two = Array.sub (arr, 1); Array.update (arr, 2, 6); val str = "Hello"; val e = String.sub (str, 1);

Arrays and strings are central data structures of "practical programming," so they should be as usable as we can make them. More syntactic sugar clutters the language definition. Arrays and strings show up infrequently in traditional functional programming applications, and many new ML programmers accustomed to array-based work could quite profitably switch to datatype-based solutions instead.

Character literals

Uses 'c' Uses #"c"

OCaml's syntax is shorter and follows the "standard" set by C. Apostrophes mean "type variable" or "prime" in SML and are parts of identifiers; they shouldn't be confused with character literals. Many symbolically-oriented SML programs don't manipulate individual characters, so we shouldn't complicate the lexer to support related features. (Consider 'a' , which could be either a type variable or a character literal, depending on where it appears.)

Identifier conventions

Module, module type, and constructor names must start in capital letters and other identifiers can't No capitalization convention

type myType = A | B;; let f = function A' -> 0 (* A' is signaled as an unbound constructor. *) | B -> 1;; datatype myType = a | b; val f = fn a' => 0 (* a' is processed as a variable. *) | b => 1; (* This case is signaled as redundant. *)

This convention stops a very nasty class of pattern matching bugs involving confusion between variables and variant constructors. It also eases the tasks of text editor syntax highlighters, making it easy to distinguish between module and variable names by color, for example. More flexibility can't hurt if you're careful, right? In actuality, most SML programmers with opinions would prefer the OCaml convention.

Let-binding syntax

Separate top-level let and let expressions Syntactic class of declarations and let..in..end construct for binding them

let six = 6 let rec fact x = if x = 0 then 1 else x * fact (x - 1) let six_fact = let six = 6 in let rec fact x = if x = 0 then 1 else x * fact (x - 1) in fact 6 val six = 6 fun fact x = if x = 0 then 1 else x * fact (x - 1) val six_fact = let val six = 6 fun fact x = if x = 0 then 1 else x * fact (x - 1) in fact 6 end

In practice, this approach leads to some very confusing error messages, since the compiler is less able to predict what grouping you really intended. Having a unified mechanism for top-level and local bindings leads to less duplication of functionality, and let..in..end seems empirically to lead to clearer error messages.

Overloaded "minus"

The standard - symbol is used for both negation and subtraction. Tilde (~) is used for negation.

1 - 2;; 1 + -2;; 1 - 2; 1 + ~2;

Lots of programmers would be confused by throwing over this long-standing convention. Differentiating subtraction and negation upholds the SML position that operators are just identifiers like any others that happen to be used via special syntax. Modulo the overloading of binary arithmetic operators, SML avoids situations where an identifier means different things in different contexts.

Semicolon precedence

Semicolon binds more tightly than match bodies and anonymous functions Semicolon binds less tightly than case bodies and anonymous functions

match x with 0 -> print_endline "It's zero!"; true | _ -> print_endline "It's not!"; false;; fun s -> print_string s; s;; begin match x with 0 -> print_endline "It's zero!" | _ -> print_endline "It's not!" end; print_endline "The End";; case x of 0 => (print "It's zero!\n"; true) | _ => (print "It's not!\n"; false); fn s => (print s; s); case x of 0 => print "It's zero!\n" | _ => print "It's not!\n"; print "The End\n";

The OCaml precedence rules favor imperative code, expecting semicolons to be used as sequencers in many places. The SML precedence rules favor pure functional code, requiring parentheses around many places where semicolons might be used.

User-defined infix operators

Fixed precedence rules for infix operators; curried arguments User-defined precedence rules for infix operators; tupled arguments

let (++) x y = x + y + y;; 1 ++ 2;; List.map ((++) 1) [1; 2; 3] fun op++ (x, y) = x + y + y; infix 6 ++; 1 ++ 2; map (fn x => 1 ++ x) [1, 2, 3]

There is never any doubt on seeing an OCaml program about how to parse an expression that includes infix operators. However, sometimes the inflexibility of precedences forces use of "extra" parentheses. Curried operators make it easy to use partial applications as arguments to higher-order functions. User-specified precedences make it easier to implement nicer-looking embedded languages. However, infix declarations were never integrated well with the module system, meaning that client code can't import fixities from a library module.
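The fixed-precedence rule on the OCaml side can be sketched as follows. The operator names +| and *| are invented for this illustration; the point is only that OCaml derives an infix operator's precedence from its first character, so there is no fixity declaration to write.

```ocaml
(* Hypothetical operators, named only for this illustration. *)
let ( +| ) a b = a + b   (* first character '+': parsed at the level of + *)
let ( *| ) a b = a * b   (* first character '*': parsed at the level of * *)

(* Parsed as 1 +| (2 *| 3), exactly as 1 + 2 * 3 would be. *)
let mixed = 1 +| 2 *| 3

(* Parentheses are the only way to force the other grouping. *)
let grouped = (1 +| 2) *| 3
```

Here mixed is 7 and grouped is 9, whatever the operators are defined to compute.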

Operators

Arithmetic operators

Different operators for arithmetic over different built-in numerical types Overloaded operators that handle several built-in numerical types

let four_int = 2 + 2 let four_float = 2.0 +. 2.0 val four_int = 2 + 2 val four_float = 2.0 + 2.0

Apparently following a "principled overloading or none at all" philosophy, OCaml breaks with a convention found in most other languages. SML overloads the arithmetic operators in an ad-hoc way. This has some unfortunate interactions with type inference, like inferring by default that a variable used in arithmetic has type int .

Equality

Equality works on any type but may raise run-time exceptions. Equality types characterize valid equality arguments.

let member (x : 'a) (ls : 'a list) : bool = List.exists ((=) x) ls;; member 2 [1; 2; 3];; member false [true; false; true];; member (fun () -> ()) [fun () -> ()];; (* Exception *) fun member (x : ''a) (ls : ''a list) : bool = List.exists (fn y => x = y) ls; member 2 [1, 2, 3]; member false [true, false, true]; member (fn () => ()) [fn () => ()]; (* Type error *)

It's very convenient to be able to compare values for equality without having to track whether or not they're comparable, and the lack of equality types simplifies the language. However, run-time type errors on bad equality comparisons are no fun. With SML, it's clear at compile time that no equality operation will fail with a run-time type error, and the types of functions that use equality clearly state that fact, removing a need for informal documentation. SML equality types are often criticized as a special case of type classes that ought to be replaced with that more general mechanism.
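As a rough illustration of the run-time failure on the OCaml side, the sketch below catches the exception that structural equality raises on functional values. The helper name compare_functions is an assumption made for this example.

```ocaml
(* Structural equality on closures type-checks but is rejected at run time. *)
let compare_functions () : bool option =
  let f = fun x -> x + 1 in
  let g = fun x -> x + 2 in
  try Some (f = g) with Invalid_argument _ -> None

(* Ordinary data compares structurally with no extra ceremony. *)
let same_lists = [1; 2; 3] = [1; 2; 3]
```

Nothing in the type of compare_functions hints that the comparison inside can fail, which is the trade-off described above.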

Standard libraries

Arrays vs. vectors

A single array type family Distinguishes between (imperative) arrays and (functional) vectors

Programmers used to C and its ilk don't want to have to worry about whether or not their arrays are functional. Isolating impure behavior as much as possible is very helpful at making programs easy to understand. A function taking a vector as input is guaranteed not to "change" it, and the function's type broadcasts that fact.
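A minimal OCaml sketch of the point about types not revealing mutation, using a hypothetical function sum_and_clear: its type, int array -> int, looks the same as that of a function that only reads its argument.

```ocaml
(* Hypothetical example: the type int array -> int says nothing about mutation. *)
let sum_and_clear (a : int array) : int =
  let total = Array.fold_left (+) 0 a in
  (* Overwrite every element of the caller's array with 0. *)
  Array.fill a 0 (Array.length a) 0;
  total

let xs = [| 1; 2; 3 |]
let total = sum_and_clear xs
```

After the call, total is 6 and xs has been zeroed in place; an SML function taking an int vector could not do the latter.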

Currying vs. tupling

All standard library functions are curried. Higher-order functions are usually curried, with tupling the default elsewhere.

List.iter (output_string ch) ["Hello"; " there!"];; List.iter (fun (ch, s) -> output_string ch s) [(ch1, "A"); (ch2, "B")];; app (fn s => TextIO.output (ch, s)) ["Hello", " there!"]; app TextIO.output [(ch1, "A"), (ch2, "B")];

Currying makes it easy to pass partial applications as arguments to higher-order functions. Tupling makes it easy to treat entire function argument lists as first-class values.

Exceptions vs. option types

Most standard library functions indicate conditions like "end of file" by throwing exceptions. Most standard library functions indicate conditions like "end of file" by returning NONE.

Using exceptions provides uniformity with other uses of exceptions to signal more traditional "error" conditions. This choice also makes it easier to use multiple-return-status functions in situations where you know that they will encounter the common case. Giving multiple-return-status functions option return types makes their potential behavior clear, without requiring the programmer to consult informal documentation to learn that.
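The two styles can be contrasted in a few lines of OCaml. List.find stands in here for any exception-signaling library function; the names first_even_exn and first_even_opt are hypothetical wrappers written for this sketch.

```ocaml
(* List.find signals absence by raising Not_found. *)
let first_even_exn (xs : int list) : int =
  List.find (fun x -> x mod 2 = 0) xs

(* Hypothetical wrapper giving the option-returning style instead. *)
let first_even_opt (xs : int list) : int option =
  try Some (first_even_exn xs) with Not_found -> None
```

The first version is terse when the caller knows an even element exists; the second puts the possibility of failure in the return type.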

Generic functions

Standard library contains generic functions that couldn't be implemented in OCaml, like comparison and hashing. Equality is the only generic function of this kind, and it's built into the language.

1 < 2;; [1; 2; 3] < [4; 5; 6];; (fun () -> ()) < (fun () -> ());; (* Exception *) 1 < 2; [1, 2, 3] < [4, 5, 6]; (* Type error *) (fn () => ()) < (fn () => ()); (* Type error *)

It's very convenient to implement container structures and other functionality without having to worry about passing around comparison or hashing functions. Some of the generic operations, like comparison, can raise run-time type errors. SML avoids features like this whose formalization would require considerable extra work.
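A small sketch of that convenience, assuming only the standard library: sorting nested lists and keying a hash table on tuples, neither of which requires a user-written comparison or hash function. The names sorted, table, and found are chosen for this example.

```ocaml
(* Polymorphic compare orders nested lists with no user-supplied function. *)
let sorted = List.sort compare [[3; 1]; [1; 2]; [1]]

(* Hashtbl hashes structured keys generically as well. *)
let table : (string * int, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.add table ("row", 1) "first"
let found = Hashtbl.find table ("row", 1)
```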

Mutability of strings

Strings are mutable. Strings are immutable (and in fact the string type is defined as a synonym for a vector type).

Mutating strings can be convenient. One often performs purely functional string manipulations, and it's useful for program understanding to have types that reflect that. If you want mutable "strings", use the CharArray module. By defining strings as vectors, we avoid including another primitive base type that goes unused in many programs, like OCaml does.

Data types

Algebraic datatype constructors

Second-class constructors that duplicate some tuple functionality First-class constructors

List.map (fun x -> Some x) [1; 2; 3];; type ('a, 'b) pair = Pair of 'a * 'b;; match e with Pair p -> p (* Type error *) map Some [1, 2, 3]; datatype ('a, 'b) pair = Pair of 'a * 'b; case e of Pair p => p

This scheme is easier to compile efficiently in the absence of dataflow analysis. However, by not treating variant constructors as functions, OCaml forces the use of wrappers as arguments to higher-order functions. By not treating multiple arguments to constructors as tuples, OCaml creates feature overlap between variants and tuples, making it hard to convert between them. SML avoids the two problems mentioned for OCaml. Using dataflow analysis, SML compilers like MLton compile uses of multiple-argument constructors efficiently.

Format strings

printf - and scanf -style format strings are built into the language. No format strings

Printf.printf "%s, %d\n" "Hello" 1234;

Format strings have proved to be a useful enough idiom that special language support is justified. It would go against the SML philosophy to include such a complicated type system feature that most parts of most programs wouldn't use. It also turns out that SML is already sufficient to implement something almost identical to printf .
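One way to get printf-like behavior without language support is a continuation-passing combinator library, sketched here in OCaml for consistency with the other examples (the same idea carries over to SML). All names below (lit, str, num, ++, sprintf') are assumptions of this sketch, not part of any standard library.

```ocaml
(* Each directive takes a continuation k and the text accumulated so far. *)
let lit s k acc = k (acc ^ s)                  (* literal text *)
let str k acc s = k (acc ^ s)                  (* like %s *)
let num k acc n = k (acc ^ string_of_int n)    (* like %d *)

(* Compose two directives left to right. *)
let ( ++ ) f g k = f (g k)

(* Run a format: start from the empty string and return the final text. *)
let sprintf' fmt = fmt (fun s -> s) ""

let greeting = sprintf' (str ++ lit ", " ++ num) "Hello" 1234
```

The type of the composed format determines how many arguments sprintf' expects and of what types, so a mismatched call is rejected by the ordinary type checker.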

Labeled function arguments

Has them Doesn't have them

let name ~firstName ~lastName = firstName ^ " " ^ lastName;; name ~lastName:"Doe" ~firstName:"John";; fun name {firstName, lastName} = firstName ^ " " ^ lastName; name {lastName = "Doe", firstName = "John"};

Labeled arguments remove the need to keep track of which arguments go in which positions. Labeled arguments complicate the language definition, and their benefits can often be attained through other means. OCaml's lack of anonymous record types seems to explain much of the rationale for including labeled arguments.

Mutable fields

Record fields may be marked mutable. No concept of mutable fields

type mut_pair = {mutable x : int; mutable y : int};; let myPair = {x = 1; y = 2};; myPair.x <- 3;; type mut_pair = {x : int ref, y : int ref}; val myPair = {x = ref 1, y = ref 2}; #x myPair := 3;

Mutable fields make imperative programming more convenient, and they have a more natural efficient compilation strategy in the absence of dataflow analysis. ref types are a simpler feature and can be used to implement a work-alike to mutable fields. With dataflow analysis, SML compilers like MLton produce efficient binaries from code that uses ref s to implement mutable fields.

Optional function arguments

Has them Doesn't have them

let printIt ?(prefix = "Hello, ") s = print_endline (prefix ^ s);; printIt "world";; printIt ~prefix:"1" "2";; fun printIt (prefix, s) = print (Option.getOpt (prefix, "Hello, ") ^ s ^ "\n"); printIt (NONE, "world"); printIt (SOME "1", "2");

Optional arguments make it easy to have highly-configurable functions that can be called succinctly in the common case. Optional arguments complicate the language definition, and their uses tend in practice to be implementable in other, not much more verbose ways.

Polymorphic variants

Has them Doesn't have them

let getNum = function `Num n -> Some n | _ -> None;; type t1 = [`Num of int | `Other of string];; getNum (`Num 6 : t1);; type t2 = [`Num of float | `Something of bool];; getNum (`Num 6.0 : t2);; datatype ('a, 'b) base = Num of 'a | Rest of 'b; type t1 = (int, string) base; getNum (Num 6 : t1); type t2 = (real, bool) base; getNum (Num 6.0 : t2);

Polymorphic variants enable greater degrees of genericity than regular variants do. Polymorphic variants can lead to some quite confusing error messages, and static checking is a less effective bug-finder in a program that uses them. Newcomers to OCaml often fall into using polymorphic variants by default, since they have a lower cost of entry than regular variants, even though most ML programmers agree that regular variants are more desirable when applicable.

Records

Declared, generative record types where field names can shadow others Anonymous record types

type coord = {x : int; y : int};; let addCoord c = c.x + c.y;; type coord' = {x : int; y : int};; addCoord {x = 1; y = 2};; (* Type error *) type unrelated = {x : float; y : bool; z : string};; let myCoord = {x = 1; y = 2};; (* Type error *) fun addCoord (c : {x : int, y : int}) = #x c + #y c; type unrelated = {x : real, y : bool, z : string}; val myCoord = {x = 1, y = 2};

By looking at just a single field of a valid record construction expression, the expression's type is uniquely determined, which makes type inference easier compared to anonymous record types. However, namespace management of fields can be arduous. To use a record type declared in another module, one must either open that module or reference a field with a full path that includes the module name. It's also easy to unintentionally shadow a field name with a new record type declaration. In general, anonymous record types are a lightweight feature that avoids the problems mentioned for OCaml's records. On the other hand, type inference for anonymous record types can be tricky, often prompting SML programmers to include type annotations on record arguments to functions or wrap record types inside single-constructor datatypes.
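The namespace issue on the OCaml side looks roughly like this. The module Geometry and the function add_coord are hypothetical names used only for this sketch.

```ocaml
module Geometry = struct
  type coord = {x : int; y : int}
end

(* Without opening Geometry, a field access needs the module path. *)
let add_coord c = c.Geometry.x + c.Geometry.y

(* Qualifying one field is enough to fix the record type of a literal. *)
let total = add_coord {Geometry.x = 1; y = 2}
```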

Recursive types

Any mutually recursive type definitions are allowed, as long as a cycle-detection algorithm accepts them. All recursion goes through algebraic datatypes.

type 'a tree = {data : 'a; children : 'a tree list};; type 'a btree = Leaf of 'a | Node of 'a forests * 'a forests and 'a forest = 'a btree list and 'a forests = 'a forest list;; datatype 'a tree = Tree of {data : 'a, children : 'a tree list}; datatype 'a btree = Leaf of 'a | Node of 'a forests * 'a forests withtype 'a forest = 'a btree list and 'a forests = 'a btree list list; (* Definition of forest must be substituted *)

OCaml features a single mutually-recursive type syntax, overloaded to cover synonyms, record types, and variants. Most reasonable definitions just work. By forcing all type recursion to go through datatype declarations, SML simplifies its formal semantics to only have to deal with recursive types "in one place," without sacrificing any expressivity.

Pattern matching

Guards

Has them Doesn't have them

let f x = function Some y when y < x -> y | _ -> 0;; fun f x = fn SOME y => if y < x then y else 0 | _ => 0;

These can be a significant code-space saver. Let's not clutter up the language definition, eh? You can always define a local function that you call in the several cases to which you must compile a single guard use.

"Or" patterns

Has them Doesn't have them

let f = function 0 | 1 -> true | _ -> false;; val f = fn 0 => true | 1 => true | _ => false;

These can be a significant code-space saver. Let's not clutter up the language definition, eh? You can always define a local function that you call in all the branches you would have lumped together with an "or" pattern. (An SML/NJ extension allows "or" patterns.)

Modules and classes

First-class functors and signatures

Allows functors that return functors or take them as arguments, functors and signatures in modules, etc. Doesn't allow these

module F (A : sig end) (B : sig end) = struct end;; module M = struct module type S = sig end end;; functor F (A : sig end) (B : sig end) = struct end; (* SML/NJ only *) (* No counterpart to second example *)

Both of these features have many nice uses. It's almost always possible to work around these omissions, and an SML/NJ extension supports higher-order functors.

Object-oriented features

Novel object system No special features

class type counter = object ('self) method get : int method set : int -> unit method inc : 'self end;; class myCounter init : counter = object val mutable count = init method get = count method set n = count <- n method inc = {< count = count + 1 >} end;; let c = new myCounter 23;; c#set 42;; class type counter' = object inherit counter method zero : unit end;; class myCounter' init : counter' = object (self) inherit myCounter init method zero = self#set 0 end;; datatype counter = Counter of { get : unit -> int, set : int -> unit, inc : unit -> counter }; (* Notice that inc's type only reflects that _some_ * counter is returned, not necessarily "the same * type of" counter. *) fun myCounter init = let val count = ref init in Counter {get = fn () => !count, set = fn n => count := n, inc = fn () => myCounter (!count + 1) } end; val c = myCounter 23; case c of Counter {set, ...} => set 42; datatype counter' = Counter' of { get : unit -> int, set : int -> unit, inc : unit -> counter', zero : unit -> unit }; fun myCounter' init = let val Counter {get, set, inc} = myCounter init in Counter' {get = get, set = set, inc = fn () => myCounter' (get () + 1), zero = fn () => set 0 } end;

For some situations, objects are the clear right implementation technique, and OCaml makes them convenient to use. Most of the individual features that go into a typical concept of "object orientation" are available separately in core ML. The main omissions are succinct mechanisms to implement inheritance and self types. As far as education and training go, lack of "OO" features in SML can be a blessing, since new OCaml programmers often latch onto the object system as the default means of abstraction, missing oftentimes more appropriate features in the module system and elsewhere.

open in signatures

open is allowed in signatures. open is not allowed in signatures.

module type S = sig open M val x : t end;; signature S = sig val x : M.t end;

This can save a lot of time in defining signatures that use many types defined elsewhere. Just another feature to avoid having to include in the formal language definition!

Separate compilation conventions

Filenames imply module names No standard separate compilation scheme

When using OCaml as a compiler, every .ml file is treated as a separate module, with its interface optionally given by a corresponding .mli file. This usually works well, but there are some problems. First, the "signatures" defined by .mli files aren't first-class module system signatures, so they can't be referenced anywhere else. This means that multiple source-file-level modules can't share a signature, and that signatures must always live inside of modules. Of course, one could always put all of one's modules in a single file and avoid this problem, but splitting into files is a standard technique for facilitating good interaction with editors, source control, and so on. There is an analogous problem for functors, leading to examples like Map.Make in the standard library, as opposed to the work-alike BinaryMapFn in the SML/NJ library. Candidate tools for SML separate compilation and project management include the SML/NJ Compilation Manager and the MLton ML Basis system. Both of these are essentially file agnostic after all of the appropriate files have been found (and possibly assigned some compilation flags), effectively concatenating the files together and imposing visibility restrictions at the module level.
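The OCaml constraint that a shared signature must live inside a module can be sketched in a single file. Sigs, COUNTER, ByOne, and ByTwo are hypothetical names; in a multi-file project, Sigs would typically be the module implied by a file such as sigs.ml.

```ocaml
(* A signature shared by several modules has to be housed in a module. *)
module Sigs = struct
  module type COUNTER = sig
    type t
    val zero : t
    val next : t -> t
    val to_int : t -> int
  end
end

(* Two implementations ascribed to the same shared signature. *)
module ByOne : Sigs.COUNTER = struct
  type t = int
  let zero = 0
  let next n = n + 1
  let to_int n = n
end

module ByTwo : Sigs.COUNTER = struct
  type t = int
  let zero = 0
  let next n = n + 2
  let to_int n = n
end
```

Clients refer to the signature as Sigs.COUNTER, which is the extra level of nesting the paragraph above describes.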

Tools

Build system

Command-line tools with help generating dependency information SML/NJ's Compilation Manager; whole-program compilation with MLton

OCaml integrates well into traditional UNIX build systems. The ocamldep program builds dependency information for use by Makefiles, and the popular (but separately-distributed) OCamlMakefile ties it all together. SML/NJ's Compilation Manager makes it extremely easy and convenient to compile projects that don't use much non-SML code, including integrated support for ml-lex/yacc. It also has some namespace management features for building and packaging libraries. Build management issues aren't that big a deal for MLton, which must compile whole programs at once, anyway.

Bytecode compiler

Included with the main distribution Present in some systems, including Moscow ML and the ML Kit

Compilation to .NET

F# starts with OCaml, drops the object system and some other features (like parts of the module system), and adds .NET-style OO that interoperates seamlessly with other .NET languages. There is Visual Studio support as well. Lately, there's been a lot of hype surrounding F#. SML.NET matches most of the main advantages of F# but doesn't seem to end up with as many neat experimental features. On the other hand, unlike F# for OCaml, it implements all of Standard ML. It also somehow can't match F#'s hype level.

Compilation to Java bytecode

.NET seems to be the managed platform of choice these days for functional programmers, and interest has shifted from MLj to SML.NET.

Debugger

Backtracking debugger Only Poly/ML includes a mature debugger.

OCaml features a debugger in the tradition of GDB, plus some novel features like backtracking. What, you're still using a debugger instead of unit tests? ;-)

Extensible parsing

camlp4 extensible grammar tool integrates with OCaml compilation No equivalent

camlp4 allows dynamic extension of grammars by adding new non-terminals, in contrast to well-known "macro systems" based around tokens or s-expressions. Not only does camlp4 integrate with the OCaml compiler, allowing language extension along the lines of traditional macro uses, but it can also be used separately. For instance, Coq uses camlp4 to let users add new commands and tactics implemented in OCaml. General objections about macros apply: we would like to keep it as simple as possible for both humans and programs to parse arbitrary SML programs, making it undesirable to allow customized grammar extensions. Most common uses of macros in C and Lisp are better handled with other SML features.

Emacs modes

Foreign function interfaces

Lightweight FFI supported by added language constructs Semi-standardized No Longer Foreign Function Interface, plus MLton's lower-level FFI for interfacing with C code directly

The FFI is relatively simple to use once you figure out linking, but the C-level view is one of specialized OCaml types instead of "native C" types. This can make it cumbersome to interface with existing C code. Also, the programmer needs to write the correct types for external functions manually. The NLFFI embeds most of the C type system in SML, letting the SML compiler type-check appropriate usage and catch many nasty classes of bugs. The ml-nlffigen program builds SML wrappers automatically from C header files. The NLFFI tools are available for both SML/NJ and MLton, making it largely seamless to build an FFI-using project with both. The tools take care of adapting to the compilers' different "under the hood" conventions. See also MLton's lower-level FFI.

HTML documentation generation from source code

ocamldoc Some tool is available and used to generate the Standard Basis documentation, but where is it?

Evaluation pending more information on SML tools

Optimizing native-code compiler

Native code compiler does very little program analysis Whole-program optimizing compiler (MLton)

OCaml has gotten by quite well by choosing an efficient base compilation strategy. Development focus seems to be on adding new language features instead of improving compilation. MLton is one of the best open-source optimizing compilers available. The Computer Language Shootout has it in 7th place currently for execution speed, behind D, C, C++, Eiffel, Pascal, and Ada compilers and just ahead of OCaml. The Shootout only measures microbenchmarks, and MLton's whole-program optimization can be expected to produce a marked efficiency advantage over native OCaml programs for large projects that make good use of abstraction and modularity.

Parser generators

Performance profiling

Direct profiling of execution counts for bytecode programs and indirect gprof-based profiling of time for native code programs In MLton, direct profiling of execution counts, time, and allocation for native code programs

See ocamlprof documentation. See MLton profiling documentation.

Source/interface browser

ocamlbrowser No equivalent

ocamlbrowser provides a specialized GUI for navigating through OCaml code. It's unclear whether many people find ocamlbrowser significantly more helpful than Emacs plus highly-hyperlinked HTML documentation.

Standard library

Standard library plus several other libraries packaged with the main distribution The Standard Basis plus the SML/NJ library

Taking the SML/NJ library into account since it is now distributed with MLton as well as SML/NJ, there is no clear winner in standard library coverage between OCaml and SML. Each has all the basics, as well as some gems that the other lacks.

Toplevel interactive system

Present Present in all SML compilers but MLton

Social factors

Community contributions to implementations

Contributions to the OCaml implementation are tightly regulated, and patches are often rejected. Generally open community approach. MLton's Subversion repository allows commits by any well-established community member who asks for permission.

This aspect of OCaml philosophy seems oriented towards research projects and makes it hard to take advantage of contributions from well-meaning hackers outside the project. ML programmers on average are quite knowledgeable and skilled at development, so it is advantageous to tap the whole community in developing implementations and standard distributions.

Cute logos

Caml has the camel: Nothing worth mentioning!

Historical roots

Historical association with theorem proving tools based on type theory and proof terms Historical association with theorem proving tools in the LCF tradition

Caml was developed for use in implementing the Coq theorem prover, as related in this account. The ML family in general owes its origins to the LCF system. Today, SML is associated with successors like Isabelle.

Implementation diversity

Language design

Ad-hoc process similar to most "open source programming languages" Language definition with formal semantics

OCaml picks up new features agilely, without any heavyweight standardization or formalization process needed for the entirety of the revised language before a release is made. The language is in effect defined by some combination of the manual and the implementation. The existence of a language definition helps language implementers keep in sync and discourages feature bloat. The formal semantics provides a concrete starting point for formal methods. On the other hand, these aspects discourage the adoption of new language features that the community might agree on as worthwhile. The new Successor ML project aims to overcome this stagnation, hopefully using more agile processes in the long term.

Learning materials

Library availability

Relatively many libraries available Relatively few libraries available

OCaml has had the fundamental tools in place to be regarded as a "serious programming language" for longer than SML has, and its significantly greater number of freely available libraries today reflects that. The Caml Hump collects links to these libraries in one place. SML has historically been more the domain of scientists with narrow interests in programming languages and formal methods, and so fewer libraries are available in general. However, particularly in the MLton community, this deficiency is recognized, and directed efforts are underway both to draft an enriched "standard library" and to create useful special-purpose libraries.

Packaging

Pre-built packages available for Debian Linux and many other UNIX-like systems. MLton and SML/NJ packages available for Debian.