"The Whitespace Thing" for OCaml by Mike Lin

"The Whitespace Thing" is an alternative syntax for OCaml that uses indentation to group multi-line expressions, like Python and Haskell. This is a controversial feature that some people will always love and some people will always hate. Using pretty much the same indentation patterns you put in your code anyway, "The Whitespace Thing" eliminates:

The ; operator for sequences of expressions

Multi-line parenthesizations in nested function applications

Ambiguity involving nested let, if-else, and try-with expressions, and associated parenthesization

The parenthetical keywords in, done, end, and begin

The heinous ;; toplevel statement operator

The syntax is otherwise the same as OCaml, with a few restrictions.
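To make the ambiguity item in the list concrete, here is a minimal sketch in plain OCaml (no preprocessor involved) of two places where the standard syntax needs explicit grouping. The names classify, record_if_positive, and trace are made up for illustration and do not come from the distribution.

```ocaml
(* Plain OCaml, standard syntax. *)

(* Nested match: without the parentheses around the inner match, the final
   "| _ ->" arm would be parsed as an arm of the inner match. *)
let classify outer inner =
  match outer with
  | 0 ->
    (match inner with
     | 0 -> "both zero"
     | _ -> "outer zero only")
  | _ -> "outer nonzero"

(* Sequence inside if-then: without begin/end (or parentheses), only the
   first expression would belong to the "then" branch. *)
let trace = Buffer.create 16

let record_if_positive n =
  if n > 0 then begin
    Buffer.add_string trace "positive;";
    Buffer.add_string trace (string_of_int n ^ ";")
  end

let () =
  record_if_positive 3;
  record_if_positive (-1);
  assert (classify 0 0 = "both zero");
  assert (classify 0 5 = "outer zero only");
  assert (classify 7 0 = "outer nonzero");
  assert (Buffer.contents trace = "positive;3;")
```

In the indentation-based syntax, the grouping that the parentheses and begin/end express here is carried by the indentation of the lines instead.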

Version 1 is implemented as a line-oriented preprocessor, invoked as ocaml+twt. This is something of a hack. At some unspecified time in the future, Version 2 should be written as a camlp4 syntax (although this promises to be difficult).

Distribution

twt-0.93.tar.gz

The current version is 0.93, released on 2012-02-01. It works very well: I have used it exclusively for several years to implement computational biology algorithms and bioinformatics tools. However, in these travels I probably have not rigorously tested all pathological cases of syntax involving objects, modules, and functors.

The software is distributed under the MIT license.

Previous versions

Note: version 0.90 is backwards-incompatible with previous versions. Older versions still required you to use in with let and indent the let body.

Documentation

To install the preprocessor, run make install in the source tree. By default, this tries to install the executable in the same directory as ocamlc. Use make INSTALLDIR=/some/path install to override this.

To use the preprocessor, either manually invoke it using ocaml+twt mycode.ml and pipe the results to a file, or use the preprocessor flag to ocamlc:

ocamlc -pp ocaml+twt mycode.ml

There are a few options available for the preprocessor. They are pretty self-explanatory from the usage message printed by invoking ocaml+twt.

With ocamlbuild, you can have something like this in the _tags file in your project directory:

or : ocaml, pp(ocaml+twt), debug

If you use OCamlMakefile, you can make the first line of your file (*pp ocaml+twt *) in order to have it preprocessed.

Some camlp4 extensions can be used with ocaml+twt by applying camlp4 to the output of ocaml+twt. There is a utility, ppcompose, included in the distribution to assist with this (see the README file).

Quick reference

Here is a handy quick reference that demonstrates most common syntax forms recognized by the preprocessor. The LaTeX source for this is included in the distribution. There are also several example programs included in the examples subdirectory.

Code examples

ocaml:

    let rec main magic_number =
      Printf.printf "Your guess? ";
      let guess = int_of_string (read_line ()) in
      if guess > magic_number then
        (Printf.printf "Too high!\n";
         main magic_number)
      else if guess < magic_number then
        (Printf.printf "Too low!\n";
         main magic_number)
      else
        (Printf.printf "You win!\n";
         exit 0);;
    Random.self_init ();;
    main (Random.int 100);;

ocaml+twt:

    let rec main magic_number =
      Printf.printf "Your guess? "
      let guess = int_of_string (read_line ())
      if guess > magic_number then
        Printf.printf "Too high!\n"
        main magic_number
      else if guess < magic_number then
        Printf.printf "Too low!\n"
        main magic_number
      else
        Printf.printf "You win!\n"
        exit 0
    Random.self_init ()
    main (Random.int 100)

ocaml:

    let list_out lst =
      (List.map
        (function Some x -> x)
        (List.filter
          (function Some x -> true | None -> false)
          lst))

ocaml+twt:

    let list_out lst =
      List.map
        function Some x -> x
        List.filter
          function Some x -> true | None -> false
          lst

ocaml:

    for i = 1 to 10 do
      print_int i;
      print_newline ()
    done;
    print_string "done"

ocaml+twt:

    for i = 1 to 10 do
      print_int i
      print_newline ()
    print_string "done"

ocaml:

    let contrived = function
        s when (String.length s) > 0 ->
          begin
            try
              Some (float_of_string s)
            with
              Failure _ -> Some nan
          end
      | _ -> None

ocaml+twt:

    let contrived = function
      | s when (String.length s) > 0 ->
          try
            Some (float_of_string s)
          with
            | Failure _ -> Some nan
      | _ -> None

More substantial examples can be found in the examples subdirectory of the source tree.

Tips and FAQs

This mostly covers things for which there was not enough space in the quick reference:

Parentheses: If you for some reason write a multi-line parenthesized expression, the preprocessor will ignore everything inside the parentheses, including newlines. Thus, any complicated sub-expressions also need to be parenthesized, regardless of how they are indented. Occasionally, this is useful as a workaround if the preprocessor doesn't quite handle something right.

Line numbers: The syntax transform has the property that it does not add or remove any lines of code. Thus, line numbers given in ocamlc semantic errors should be correct. The tradeoff for this is that the postprocessed code is somewhat unreadable, although you could always run it through the camlp4 pretty-printer.

Performance: The syntax transform does not add any expressions or statements, only parentheses. Thus, there should be no performance impact in the final product.

Comments: A comment cannot begin (or terminate) at the beginning of a line that also has code on it.

ocamldoc: You should be able to comment things as usual and run ocamldoc on the postprocessed code. But I haven't tested this extensively.

Toplevel: No support and none likely. Sorry.

Emacs/VIM/etc. modes: Use with extreme caution; this is a new syntax, after all. See below for an experimental emacs mode. Nathaniel Gray's syntax highlighting patterns for NEdit seem to work great with ocaml+twt.

Pattern matching: If the consequent of a pattern is a sequence of statements, make sure to place them either all on one line (separated by ;) or entirely in their own block. That is:

instead of...

    match n with
      | 1 -> print_string "one"
             print_newline ()

do...

    match n with
      | 1 ->
          print_string "one"
          print_newline ()

Of course, if the consequent is just a single expression, you can place it on the same line. This restriction is actually true almost everywhere, such as let bodies and if-then consequents; see the quick reference.

Applications: In multi-line applications, if the function being applied is some complicated expression (rather than an identifier), you must parenthesize it.

instead of...

    if b then (+) else (-)
      x
      y

do...

    (if b then (+) else (-))
      x
      y

instead of...

    function
      | x when x >= 0 -> (+)
      | _ -> (-)
      x
      y

do...

    (function
      | x when x >= 0 -> (+)
      | _ -> (-))
      x
      y
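The parenthesization rule for applications has a close analogue in standard OCaml, which may help explain it. Written on one line without parentheses, if b then (+) else (-) x y parses as if b then (+) else ((-) x y), which does not type-check. The following is a small sketch in plain OCaml, not preprocessor input or output; the names combine and combine_by_sign are hypothetical.

```ocaml
(* Plain OCaml. The conditional that selects the operator is parenthesized
   so that the whole conditional is the function applied to x and y. *)
let combine b x y = (if b then ( + ) else ( - )) x y

(* Same idea with an anonymous function selecting the operator: the
   parentheses end the last match arm before the arguments begin. *)
let combine_by_sign n x y =
  (function
    | k when k >= 0 -> ( + )
    | _ -> ( - )) n x y

let () =
  assert (combine true 5 3 = 8);
  assert (combine false 5 3 = 2);
  assert (combine_by_sign 1 5 3 = 8);
  assert (combine_by_sign (-1) 5 3 = 2)
```

In both definitions the parentheses mark where the function expression ends, which is the same information the preprocessor needs when the arguments are placed on the following indented lines.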

Using ocaml+twt with emacs

Personally, I just use Fundamental mode with the following in my .emacs:

(global-set-key (quote [S-iso-lefttab]) (quote indent-relative-maybe))

This binds Shift-Tab to insert whitespace to match the indentation of the previous line.

For something fancier, Till Varoquaux has contributed caml+twt.el, which is still experimental. Thanks to Till! Here is more information:

I did a quick hack to Tuareg to get indentation working in a python-mode-like way. You will find the el file here enclosed. To autoload I use the following (warning to lisp lovers: this is very ugly, I'm just getting started with elisp).

    (autoload 'tuareg-mode "tuareg" "Major mode for editing Caml code" t)
    (autoload 'caml+twt-mode "caml+twt" "Major mode for editing Caml+twt code" t)
    (defun start-mlmode ()
      (when
          (save-excursion
            (progn
              (goto-char (point-min))
              (looking-at "(\\*pp ocaml\\+twt\\*)[:blank:]*")
              )
            )
        (caml+twt-mode)
        ;;(tuareg-mode)
        )
      (remove-hook 'find-file-hook 'start-mlmode 1)
      )
    (add-hook 'tuareg-load-hook
              (lambda () (add-hook 'find-file-hook 'start-mlmode 1)))

This will switch over to caml+twt mode on opening a file with a .ml extension only if the first line is:

    (*pp ocaml+twt*)

(this is consistent with OCamlMakefile). Syntax highlighting of comments doesn't work anymore.

Hope this turns out useful to someone.

Till

Useful links

Understanding GNU Emacs and Tabs

Python: Myths about Indentation - the information on this page mostly applies to ocaml+twt as well.

F# lightweight syntax, a similar idea for the OCaml-derived language for .NET. The lightweight syntax seems to be much more popular than the "normal" syntax among F# users. (For the record, ocaml+twt predated this by about nine months. I don't know if there was any causal relationship.)

OCaml Programmers Against Parentheses

E-mail me if you want to call yourself a member. Actually, you don't have to e-mail me. In all likelihood, the only purpose of this club will be to get into flamewars.