Solving the expression problem in Python (object algebras and mypy static types) Posted on July 15, 2015

Object algebras are a solution to the expression problem for any object oriented languages that supports generics. This post will introduce object algebras, the expression problem, and then implement a solution to the expression problem in Python with mypy, an optional static type checker for Python that’s inspired some standardisation on type hinting in the form of PEP 0484.

The expression problem

The expression probem is a “new name for an old problem”. It was originally described by Phillip Wadler in an email to a java genericity mailing list on Nov 12, 1998:

The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).

The requirements to be able to say that you have solved the expression problem are more concretely described in the paper “Extensibility for the Masses”:

Extensibility in both dimensions: A solution must allow the addition of new data variants and new operations and support extending existing operations. Strong static type safety: A solution must prevent applying an operation to a data variant which it cannot handle using static checks. No modification or duplication: Existing code must not be modified nor duplicated. Separate compilation and type-checking: Safety checks or compilation steps must not be deferred until link or runtime. Independent extensibility: It should be possible to combine independently developed extensions so that they can be used jointly.

It’s a little hard to satisfy the static safety checks in Python without adding types that can be checked statically (without executing the code). To address this shortcoming, we will use the excellent mypy type checker.

Static type checking with mypy

Mypy is a static type checker for Python. It’s got support for generics and an analog of virtual methods. This will be enough to do some interesting things, as we will soon see.

An important feature to note is that every mypy program is still a valid python 3.x program. The types are entirely optional and you’re free to run the program if it doesn’t type check, it will probably just explode at runtime. Let’s look at an example, this is a normal python function:

def greeting(name): return 'Hello, {}' . format (name)

Below is the same example, but now we’re going to annotate it to say “this is a function that takes a string, and returns a string”. If we try to pass something that is not a string to this function, mypy will be able to tell you that this won’t work before you run your program.

def greeting(name: str ) -> str : return 'Hello, {}' . format (name)

We can do more interesting things too, including checking subtyping relations. Here, we say “names is anything that is Iterable over, and produces a string on each iteration.”

from typing import Iterable def greet_all(names: Iterable[ str ]) -> None : for name in names: print ( 'Hello, {}' . format (name))

“Iterable” here is an example of a higher kinded type, which you can think of as a function from a type to another type. Here we are giving it the str type and the result is a type that denotes things iterable over strings. More on this in a bit, when we define our own generics.

Object algebras

Object algebras are the solution to the expression problem chosen in the paper “Extensibility for the Masses”. This is the recipe we will follow to try and solve the expression problem in Python.

Interestingly, object algebras map more or less directly to the final tagless solution to the expression problem, which is another well-known solution popularised by the paper “Finally Tagless, Partially Evaluated: Tagless Staged Interpreters for Simpler Typed Languages”.

This mapping is a desirable quality, and much of the “Extensibility for the Masses” paper is centered around explaining the nice algebraic properties you are able to get from such an approach. Object algebras are essentially F-algebras, which have been very well studied in category theory and computer science.

Object algebras are actually pretty simple, and best explained by example. If you know what the Visitor and AbstractFactory “patterns” are, you may begin to see some similarities.

Abstract base classes in Python

For our working example, we’re going to implement a simple calculator. This recurring example is sometimes known as “Hutton’s Razor”. The calculator is capable of addition on literals, so we’re able to have “1+1”, “2+1+1”, but not anything non-sensical like “42++”.

We are going to start by implementing the abstract base class (a Python 2.6 thing, PEP 3119. This base class will describe our calculator data type, and allow us to create multiple ways consuming the information therein.

from abc import ABCMeta, abstractmethod class AddExpr(metaclass=ABCMeta): @abstractmethod def lit( self , lit): pass @abstractmethod def add( self , e1, e2): pass

Now we will sprinkle some types on. Note that we introduce a type variable, this is where the “str” type lived in our earlier example with iterators.

from abc import ABCMeta, abstractmethod from typing import TypeVar, Generic T = TypeVar( 'T' ) class AddExpr(Generic[T], metaclass=ABCMeta): @abstractmethod def lit( self , lit: int ) -> T: pass @abstractmethod def add( self , e1: T, e2: T) -> T: pass

We now have an interface representing some data. This is equivalent in Haskell to:

class AddExpr t where lit :: Int -> t add :: t -> t -> t

We can now write a python expression that says “given a thing which looks like AddExpr, I’ll give you some kind of calculation”:

def expression(t: AddExpr[T]) -> T: return t.add(t.lit( 1 ), t.lit( 2 ))

This little a pretty direct mapping to the final tagless approach in Haskell:

expression :: AddExpr t => t expression = add (lit 1 ) (lit 2 )

Doing something (kind of) usefull

This is a contrived example, so let’s pretend that evaluating these abstract expressions to a resulting integer is something useful that we want to do. This is now as simple as implementing a subclass of AddExpr.

class AddExprEval(AddExpr[ int ]): def lit( self , lit: int ) -> int : return lit def add( self , e1: int , e2: int ): return e1 + e2

That’s actually pretty clean! Now we can see how the abstract base class is useful as we define another implementation, but this time for printing expressions as strings.

class AddExprPrint(AddExpr[ str ]): def lit( self , lit: int ) -> str : return str (lit) def add( self , e1: str , e2: str ): return "( {0} + {1} )" . format (e1, e2)

We can now put all of our pieces together and evaluate the same expression with two different implementations:

print (expression(AddExprPrint())) print (expression(AddExprEval()))

$ mypy object-alg.py $ python object-alg.py (1 + 2) 3

Tick in the static safety box

If you recall our prerequisites for solving the expression problem, along with extensibility we had some requirements for static safety. Let’s see if we satisfy those:

Strong static type safety: A solution must prevent applying an operation to a data variant which it cannot handle using static checks. Separate compilation and type-checking: Safety checks or compilation steps must not be deferred until link or runtime.

Let’s try to run a program that passes an implementation which cannot handle a data variant, and see if we can catch that before running the program and failing at runtime.

class AddExprIncomplete(AddExpr[ str ]): def lit( self , lit: int ) -> str : return str (lit) # Missing an implementation for add print (expression(AddExprIncomplete()))

$ mypy object-alg.py object-alg.py, line 38: Cannot instantiate abstract class 'AddExprIncomplete' with abstract method 'add

Excellent, it exploded. We’ll repeat this test when we do some extending.

Extensibility

The whole point of the expression problem is to be able to “add new cases to the datatype and new functions over the datatype”. Let’s do that now by extending our calculator to handle multiplication. We will use inheritence and subtyping to get code re-use and satisfy requirements 1, 3 and 5.

Extensibility in both dimensions: A solution must allow the addition of new data variants and new operations and support extending existing operations. No modification or duplication: Existing code must not be modified nor duplicated. Independent extensibility: It should be possible to combine independently developed extensions so that they can be used jointly.

class MultExpr(Generic[T], AddExpr[T], metaclass=ABCMeta): @abstractmethod def mult( self , e1: T, e2: T) -> T: pass # We get lit and add for free from the AddExpr[T] super(meta)class

You’ll note that as the MultExpr implementations are a superset of the AddExpr ones, we’re able to use them on both types of expression, and even embed them in our extended data type.

class MultExprPrint(AddExprPrint, MultExpr[ str ]): def mult( self , e1: str , e2: str ): return "( {0} x {1} )" . format (e1, e2) class MultExprEval(AddExprEval, MultExpr[ int ]): def mult( self , e1: int , e2: int ): return e1 * e2 def add_expr(t: AddExpr[T]) -> T: return t.add(t.lit( 1 ), t.lit( 2 )) def mult_expr(t: MultExpr[T]) -> T: return t.mult(add_expr(t), t.lit( 2 )) print (mult_expr(MultExprPrint())) print (mult_expr(MultExprEval()))

$ mypy object-alg.py $ python object-alg.py ((1 + 2) x 2) 6

What about the static safety? Do we still get that? Let’s try evaluating a multiplication expression with something less powerful.

print (mult_expr(AddExprEval()))

$ mypy object-alg.py object-alg.py, line 61: Argument 1 to "mult_expr" has incompatible type "AddExprEval"; expected MultExpr[None]

This saved us from an ugly run time error:

$ python object-alg.py Traceback (most recent call last): File "object-alg.py", line 61, in <module> print(mult_expr(AddExprEval())) File "object-alg.py", line 56, in mult_expr return t.mult(add_expr(t), t.lit(2)) AttributeError: 'AddExprEval' object has no attribute 'mult'

Maybe a little mypy bug

I may have discovered a little bug whilst playing around with failure cases, I suspect either type variable unification is broken or doesn’t happen. The problem arises if we try to instantiate a subclass that should cause the metaclass’s type variables to fail to unify. This leads to being able to make a mistake that is not caught by mypy:

# The AddExprPrint super class will result in an incompatible type # (mixing strings with ints) class MultExprEvalBroken(AddExprPrint, MultExpr[ int ]): def mult( self , e1: int , e2: int ): return e1 * e2

$ mypy object-alg.py $ python object-alg.py Traceback (most recent call last): File "object-alg.py", line 65, in <module> print(mult_expr(MultExprEvalBroken())) File "object-alg.py", line 56, in mult_expr return t.mult(add_expr(t), t.lit(2)) File "object-alg.py", line 63, in mult return e1 * e2 TypeError: can't multiply sequence by non-int of type 'str'

I haven’t dug further into this.

Conclusion

The expression problem, solved in python! Who’d a thunk it?

Please enable JavaScript to view the comments powered by Disqus.