Automated reasoning in F#, Scala, Haskell, C++, and Julia

2015.04.05

We need to simplify the following expression:

\[e = (1 + 0 \times x) \times 3 + 12.\]

Luckily for us, we won't have to remember any elementary school arithmetic, because Harris' excellent Handbook of Practical Logic and Automated Reasoning begins with a simple algorithm to do exactly that. It's not complicated, but it's a pretty good barometer of how painful a programming language will be for the kind of hybrid (probabilistic logic, or statistical relational) approaches I work with. Here, I compare the implementations of Harris' simple algorithm in F#, Scala, Haskell, C++, and Julia.

No programming languages were hurt while writing this post. It's not a competition, and I avoided talking about languages I dislike. Sum types are discussed at length because they are awesome and useful for this problem (and many, many others).

The ML family

Harris' book uses OCaml, a popular language for solvers. F#, Haskell, and Scala all share roots with OCaml, with F# being the closest thing to an OCaml dialect. I'll start with F#:

/// A sum type for the expression. /// An expression is either a var (which is a string), a constant /// (which is an integer), an addition (made of two expressions) /// or a multiplication (also made of two expressions). type Expr = | Var of string | Const of int | Add of Expr * Expr | Mul of Expr * Expr /// Simplify a single component of the expression. This function /// takes an expression and use pattern matching to select the /// right approach based on type and value. For example, if we /// add a constant 0 to some x (which can be expression), then /// we return x. let simplify1 e = match e with | Add (Const 0, x) | Add (x, Const 0) | Mul (x, Const 1) | Mul (Const 1, x) -> x | Mul (x, Const 0) | Mul (Const 0, x) -> Const 0 | Add (Const a, Const b) -> Const (a + b) | Mul (Const a, Const b) -> Const (a * b) | _ -> e /// Recursive function to simplify an entire expression. let rec simplify e = match e with | Add (x, y) -> Add (simplify x, simplify y) | Mul (x, y) -> Mul (simplify x, simplify y) | _ -> e |> simplify1 /// Return the value string if the expression can be reduced to a constant. let exprStr e = match e with | Const x -> string x | _ -> "The expression could not be simplified to a constant." /// The |> operator sends the result on its left to its right, for example /// "5.0 |> log |> sqrt" computes log(5.0) and then the square root of the /// result. This is nice because it allows a more natural left-to-right /// flow for functional programming. [<EntryPoint>] let main argv = Add (Mul (Add (Const 1, Mul (Const 0, Var "x")), Const 3), Const 12) |> simplify |> exprStr |> printf "%s" 0 /// F#'s main returns 0 for success à la C

It's almost the same as the OCaml version in Harris' book. The key trick is to define an expression (Expr) as a variable (string) or a constant (integer) or an addition or a multiplication (both made of two expressions). The or is important, object-oriented programming languages focuses on hierarchies of objects, while sum types define a type as a series of alternatives. Sum types are important for another reason: they provide an easy way to express things like "this function might return an integer", for example in Haskell if we want a data structure that maps keys to values:

import Data.Map (Map) import qualified Data.Map as Map capitals = Map.fromList [("Finland", "Helsinki"), ("France", "Paris"), ("Japan", "Tokyo"), ("South Korea", "Seoul"), ("Arrakis", "Arrakeen")] lookupCapitals country = case Map.lookup country capitals of Just capital -> "The capital of " ++ country ++ " is " ++ capital ++ "." Nothing -> "Is " ++ country ++ " even a country?"

The point is that a key-value store will only return a value if the key is present. In this example, the map takes a country (string) and returns its capital (string). However, when we try to take a value from the map with the lookup function, Haskell returns a Maybe type with either Just String, if the string provided is found in the map, or Nothing if the key is absent. We then use pattern matching to deal with these possibilities in the lookupCapitals function. One of the most common mistake in programming is to return a null and not deal with it properly. The solution with sum types is to return a wrapped value and handling possibilities explicitly with pattern matching. It solves with types what many languages would solve with exceptions and try-catch apparatuses.

ghci> lookupCapitals "Arrakis" "The capital of Arrakis is Arrakeen." ghci> lookupCapitals "Canada" "Is Canada even a country?"

Speaking of Haskell, the code for the algorithm is:

data Expr = Var String | Const Int | Add Expr Expr | Mult Expr Expr simplify1 :: Expr -> Expr simplify1 e = case e of Add (Const 0) x -> x Add x (Const 0) -> x Add (Const a) (Const b) -> Const $ a + b Mult x (Const 0) -> Const 0 Mult (Const 0) x -> Const 0 Mult x (Const 1) -> x Mult (Const 1) x -> x Mult (Const a) (Const b) -> Const $ a * b _ -> e simplify :: Expr -> Expr simplify e = case e of Add x y -> simplify1 $ Add (simplify x) (simplify y) Mult x y -> simplify1 $ Mult (simplify x) (simplify y) _ -> simplify1 e e = Add (Mult (Add (Const 1) (Mult (Const 0) (Var "x"))) (Const 3)) (Const 12) s = simplify e main = putStrLn $ case s of Const x -> show x _ -> "Could not simplify the expression to a constant."

It's quite similar to F#. I decided to add types explicitly to simplify1 and simplify, but Haskell is smart enough to deduce the type without this. Arguably the only thing worth explaining is the $ operator. The operator forces Haskell to evaluate the expression to the right of the operator in priority, and if it reminds you of parentheses, you are absolutely right. x and y have the same value here:

x = log (sqrt (exp 5.0)) y = log $ sqrt $ exp 5.0

The operator is there to reduce visual clutter. In my opinion, F# is easier to read because the |> operator enforces left-to-right reading, which is more natural than reading code inside-out:

let z = exp 5.0 |> sqrt |> log

Although it's trivial to simulate this operator in Haskell:

(|>) :: t0 -> (t0 -> t1) -> t1 (|>) x f = f x -- Now valid Haskell: z = exp 5.0 |> sqrt |> log

And now for something a bit different: Scala. It's also a static functional programming language with sum types, but its greater integration with the object-oriented paradigm is evident:

object Simplify { sealed abstract class Expr { override def toString = show(this) } case class Variable(name: String) extends Expr case class Const(value: Int) extends Expr case class Add(left: Expr, right: Expr) extends Expr case class Mult(left: Expr, right: Expr) extends Expr def evalOne(e: Expr): Expr = e match { case Add(Const(0), r) => r case Add(l, Const(0)) => l case Add(Const(a), Const(b)) => Const(a + b) case Mult(Const(0), r) => Const(0) case Mult(l, Const(0)) => Const(0) case Mult(Const(1), r) => r case Mult(l, Const(1)) => l case Mult(Const(a), Const(b)) => Const(a * b) case _ => e } def eval(e: Expr): Expr = e match { case Add(l, r) => evalOne(Add(eval(l), eval(r))) case Mult(l, r) => evalOne(Mult(eval(l), eval(r))) case _ => e } def show(e: Expr) = e match { case Const(x) => print(x) case _ => print("The expression could not be simplified to a constant.") } def main(args: Array[String]) { var e = Add(Mult(Add(Const(1), Mult(Const(0), Variable("x"))), Const(3)), Const(12)) var s = eval(e) print(s) } }

Everything is an object in Scala. Thus, we have to define the functions to simplify as methods inside a singleton object. I named the functions evalOne and eval since it has a bit odd to have a function named simplify inside a Simplify object.

C++

Few understand every corner of C++'s monstrous standard. It's huge. Surely, with so many features, there must something to solve this simple problem cleanly. Well... no. It's a well-known lacuna with C++, see Open and Efficient Type Switch for C++ for a library built to implement pattern matching (the effort is directed by the creator of the C++ language). That said, here I'll use the boost library (A) because solutions based only on the standard library are contrived and (B) because boost is almost standard, and I don't want to rely on third-party libraries.

#include <iostream> #include <string> #include <boost/variant.hpp> // Forward declarations: struct add; struct mult; // The variant for the expression: using expr = boost::variant< int, std::string, boost::recursive_wrapper<add>, boost::recursive_wrapper<mult>>; // A base class for all binary operations: class binary_op { expr m_left, m_right; public: // Builds a binary operation from left and right expressions. binary_op(expr const& left, expr const& right) : m_left(left), m_right(right) { } // Returns the expression to the left side of the expression. auto left() const -> expr const& { return m_left; } // Returns the expression to the right side of the expression. auto right() const -> expr const& { return m_right; } }; // Defines the 'add' operator. struct add : public binary_op { add(expr const& left, expr const &right) : binary_op(left, right) { } }; // Defines the 'mult' operator. struct mult : public binary_op { mult(expr const& left, expr const &right) : binary_op(left, right) { } }; // Overload * and + to simplify creating expressions. auto operator+(expr const& lhs, expr const& rhs) -> expr { return add{lhs, rhs}; } auto operator*(expr const& lhs, expr const& rhs) -> expr { return mult{lhs, rhs}; } // A visitor for addition. struct add_visit : public boost::static_visitor<expr> { auto operator()(int lhs, int rhs) const -> expr { return expr{lhs + rhs}; } template<typename R> auto operator()(int lhs, R const& rhs) const -> expr { return lhs == 0? expr{rhs} : add{expr{lhs}, expr{rhs}}; } template<typename L> auto operator()(L const& lhs, int rhs) const -> expr{ return rhs == 0? expr{lhs} : add{expr{lhs}, expr{rhs}}; } template<typename L, typename R> auto operator()(L const& lhs, R const& rhs) const -> expr { return add{expr{lhs}, expr{rhs}}; } }; // A visitor for multiplication. struct mul_visit : public boost::static_visitor<expr> { auto operator()(int lhs, int rhs) const -> expr { return expr{lhs * rhs}; } template<typename R> auto operator()(int lhs, R const& rhs) const -> expr { return lhs == 0? expr{0} : (lhs == 1? expr{rhs} : mult{expr{lhs}, expr{rhs}}); } template<typename L> auto operator()(L const& lhs, int rhs) const -> expr { return rhs == 0? expr{0} : (rhs == 1? expr{lhs} : mult{expr{rhs}, expr{lhs}}); } template<typename L, typename R> auto operator()(L const& lhs, R const& rhs) const -> expr { return mult{expr{lhs}, expr{rhs}}; } }; struct simplify1 : public boost::static_visitor<expr> { auto operator()(add const& a) const -> expr { return boost::apply_visitor(add_visit{}, a.left(), a.right()); } auto operator()(mult const& m) const -> expr { return boost::apply_visitor(mul_visit{}, m.left(), m.right()); } template<typename T> auto operator()(T const& t) const -> expr { return expr{t}; } }; struct simplify : public boost::static_visitor<expr> { auto operator()(add const& a) const -> expr { auto left = boost::apply_visitor(simplify{}, a.left()); auto right = boost::apply_visitor(simplify{}, a.right()); auto add_lr = boost::apply_visitor(add_visit{}, left, right); return boost::apply_visitor(simplify1{}, add_lr); } auto operator()(mult const& m) const -> expr { auto left = boost::apply_visitor(simplify{}, m.left()); auto right = boost::apply_visitor(simplify{}, m.right()); auto mul_lr = boost::apply_visitor(mul_visit{}, left, right); return boost::apply_visitor(simplify1{}, mul_lr); } template<typename T> auto operator()(T const& t) const -> expr { return expr{t}; } }; struct print_expr : public boost::static_visitor<std::string> { auto operator()(int n) const -> std::string { return std::to_string(n); }; auto operator()(expr const&) const -> std::string { return "The expression could not be simplified to a constant."; }; }; auto main() -> int { const auto e = (expr{1} + expr{0} * expr{"x"}) * expr{3} + expr{12}; const auto s = boost::apply_visitor(simplify{}, e); std::cout << boost::apply_visitor(print_expr{}, s) << std::endl; return 0; }

This is boost::variant in action. My biggest qualm with this type of clever header-heavy code is that you get to see a big chunk of the developers' lifework unroll before your eyes every time a small mistake is made. Otherwise it's an OK substitute for proper sum types/pattern matching. If you want to know how this code works, you need to read a bit on the visitor pattern.

Julia

Julia is an attempt to build a fast and flexible replacement for R/Python/Matlab. An issue with most dynamic languages is that there is no elegant way to switch on type. To be fair, you cannot really do it with most static languages either, see previous section... However, Julia supports multiple-dispatch based on type annotation. To be clear, it's quite different from the F#/Scala/Haskell approach. In these languages, it is possible to define sum types and do pattern matching on their constructors. With Julia, we define a function with type annotation and let the interpreter dispatch based on runtime type information. Multiple dispatch is supported in Julia for performance: it allows the interpreter to compile optimized functions and use the best one, adding predictability while keeping the language dynamic (for some reason...). Here's the algorithm in Julia:

abstract Expr type Const

Unlike pattern matching, we can only dispatch on type, so we need an if expression (the ? operator in Julia, just like C), or I could've used the match macro, but it's overkill here. It's not too inelegant, and at first I thought it was a good enough way to simulate sum types and pattern matching. It matters to the Julia ecosystem because these features are very useful to build solvers, logic and theorem proving systems, etc etc. Pretty nice for a technical computing platform. Unfortunately, while Julia does well with this simple example, I think an oddity with the language would soon bite us: return type declarations are not allowed, and yes, it is a big deal.

First, it's a question of correctness: you can return a float thinking you're returning an integer. That's awful. Also, since annotation is not allowed for the return value, it's also impossible to add annotation for a higher-order function (a function taking functions as input). As a concrete example, first-order logic has predicates mapping objects to a boolean, and functions mapping objects to objects. We'd like to do:

solve(pre::(Object -> Bool), ...) solve(fun::(Object -> Object), ...)

But instead, we'd have to test the type of the return value inside the function. That said, Julia is young and it might get return type declarations at some point.

Conclusion

Sum types and pattern matching are awesome.