Simpler, Easier!

First, the untyped lambda calculus.

x e e λx.e

type Sym = String data Expr = Var Sym | App Expr Expr | Lam Sym Expr deriving (Eq, Read, Show)

App (Lam "x" $ Lam "y" $ Var "x") (Lam "z" $ Var "z")

Expr

(λx.e)a reduces to e[x:=a]

App (Lam v e) a

v

a

e

subst

whnf :: Expr -> Expr whnf ee = spine ee [] where spine (App f a) as = spine f (a:as) spine (Lam s e) (a:as) = spine (subst s a e) as spine f as = foldl App f as

subst

freeVars :: Expr -> [Sym] freeVars (Var s) = [s] freeVars (App f a) = freeVars f `union` freeVars a freeVars (Lam i e) = freeVars e \\ [i]

subst :: Sym -> Expr -> Expr -> Expr subst v x b = sub b where sub e@(Var i) = if i == v then x else e sub (App f a) = App (sub f) (sub a) sub (Lam i e) = if v == i then Lam i e else if i `elem` fvx then let i' = cloneSym e i e' = substVar i i' e in Lam i' (sub e') else Lam i (sub e) fvx = freeVars x cloneSym e i = loop i where loop i' = if i' `elem` vars then loop (i ++ "'") else i' vars = fvx ++ freeVars e substVar :: Sym -> Sym -> Expr -> Expr substVar s s' e = subst s (Var s') e

subst

v

b

x

Var

App

Lam

x

x

v

x

i'

cloneSym

x

e

e

substVar

Expr

alphaEq

alphaEq :: Expr -> Expr -> Bool alphaEq (Var v) (Var v') = v == v' alphaEq (App f a) (App f' a') = alphaEq f f' && alphaEq a a' alphaEq (Lam s e) (Lam s' e') = alphaEq e (substVar s' s e') alphaEq _ _ = False

nf :: Expr -> Expr nf ee = spine ee [] where spine (App f a) as = spine f (a:as) spine (Lam s e) [] = Lam s (nf e) spine (Lam s e) (a:as) = spine (subst s a e) as spine f as = app f as app f as = foldl App f (map nf as) betaEq :: Expr -> Expr -> Bool betaEq e1 e2 = alphaEq (nf e1) (nf e2)

whnf

nf

zero ≡ λs.λz.z one ≡ λs.λz.s z two ≡ λs.λz.s (s z) three ≡ λs.λz.s (s (s z)) plus ≡ λm.λn.λs.λz.m s (n s z)

[z,s,m,n] = map (Var . (:[])) "zsmn" app2 f x y = App (App f x) y zero = Lam "s" $ Lam "z" z one = Lam "s" $ Lam "z" $ App s z two = Lam "s" $ Lam "z" $ App s $ App s z three = Lam "s" $ Lam "z" $ App s $ App s $ App s z plus = Lam "m" $ Lam "n" $ Lam "s" $ Lam "z" $ app2 m s (app2 n s z)

betaEq (app2 plus one two) three

True

A detour, simply typed lambda calculus.

data Type = Base | Arrow Type Type deriving (Eq, Read, Show)

t→t B

x e e λx:t.e

data Expr = Var Sym | App Expr Expr | Lam Sym Type Expr deriving (Eq, Read, Show)

Type

Lam

newtype Env = Env [(Sym, Type)] deriving (Show) initalEnv :: Env initalEnv = Env [] extend :: Sym -> Type -> Env -> Env extend s t (Env r) = Env ((s, t) : r)

Either

TC

type ErrorMsg = String type TC a = Either ErrorMsg a

findVar :: Env -> Sym -> TC Type findVar (Env r) s = case lookup s r of Just t -> return t Nothing -> throwError $ "Cannot find variable " ++ s

throwError

Control.Monad.Error

tCheck :: Env -> Expr -> TC Type tCheck r (Var s) = findVar r s tCheck r (App f a) = do tf <- tCheck r f case tf of Arrow at rt -> do ta <- tCheck r a when (ta /= at) $ throwError "Bad function argument type" return rt _ -> throwError "Non-function in application" tCheck r (Lam s t e) = do let r' = extend s t r te <- tCheck r' e return $ Arrow t te

typeCheck :: Expr -> Type typeCheck e = case tCheck initalEnv e of Left msg -> error ("Type error:

" ++ msg) Right t -> t

Almost going down the wrong track, the polymorphic lambda calculus.

x e e λx:t.e Λα:k.e e[t]

t→t B α

k→k *

data Expr = Var Sym | App Expr Expr | Lam Sym Type Expr | TLam Sym Kind Expr | TApp Expr Type deriving (Eq, Read, Show) data Type = Arrow Type Type | Base | TVar Sym deriving (Eq, Read, Show) data Kind = KArrow Type Type | Star deriving (Eq, Read, Show)

Base

ω

Simply easy, the lambda cube.

TLam

Lam

TApp

App

TVar

Var

KArrow

Arrow

Arrow

Expr

data Expr = Var Sym | App Expr Expr | Lam Sym Type Expr | Pi Sym Type Type | Kind Kinds deriving (Eq, Read, Show) type Type = Expr data Kinds = Star | Box deriving (Eq, Read, Show)

Pi

Star

Box

freeVars :: Expr -> [Sym] freeVars (Var s) = [s] freeVars (App f a) = freeVars f `union` freeVars a freeVars (Lam i t e) = freeVars t `union` (freeVars e \\ [i]) freeVars (Pi i k t) = freeVars k `union` (freeVars t \\ [i]) freeVars (Kind _) = [] subst :: Sym -> Expr -> Expr -> Expr subst v x = sub where sub e@(Var i) = if i == v then x else e sub (App f a) = App (sub f) (sub a) sub (Lam i t e) = abstr Lam i t e sub (Pi i t e) = abstr Pi i t e sub (Kind k) = Kind k fvx = freeVars x cloneSym e i = loop i where loop i' = if i' `elem` vars then loop (i ++ "'") else i' vars = fvx ++ freeVars e abstr con i t e = if v == i then con i (sub t) e else if i `elem` fvx then let i' = cloneSym e i e' = substVar i i' e in con i' (sub t) (sub e') else con i (sub t) (sub e)

Lam

Pi

alphaEq

nf

nf :: Expr -> Expr nf ee = spine ee [] where spine (App f a) as = spine f (a:as) spine (Lam s t e) [] = Lam s (nf t) (nf e) spine (Lam s _ e) (a:as) = spine (subst s a e) as spine (Pi s k t) as = app (Pi s (nf k) (nf t)) as spine f as = app f as app f as = foldl App f (map nf as)

tCheck :: Env -> Expr -> TC Type tCheck r (Var s) = findVar r s

tCheck r (App f a) = do tf <- tCheckRed r f case tf of Pi x at rt -> do ta <- tCheck r a when (not (betaEq ta at)) $ throwError $ "Bad function argument type" return $ subst x a rt _ -> throwError $ "Non-function in application"

Pi

rt

rt

x

rt

x

betaEq

(==)

tCheck r (Lam s t e) = do tCheck r t let r' = extend s t r te <- tCheck r' e let lt = Pi s t te tCheck r lt return lt

Pi

tCheck

tCheck _ (Kind Star) = return $ Kind Box tCheck _ (Kind Box) = throwError "Found a Box"

Kind Star

Kind Box

Box

tCheck r (Pi x a b) = do s <- tCheckRed r a let r' = extend x a r t <- tCheckRed r' b when ((s, t) `notElem` allowedKinds) $ throwError "Bad abstraction" return t

(s, t)

a

b

allowedKinds :: [(Type, Type)] allowedKinds = [(Kind Star, Kind Star), (Kind Star, Kind Box), (Kind Box, Kind Star), (Kind Box, Kind Box)]

(*,*) values can depend on values. Just this gives the simply typed λ calculus. ([],[]) types can depend on types. ([],*) values can depend on type. Include all these three and you get F ω . (*,[]) types can depend on values. Include this one to get dependent types.

Examples

Identity id ≡ λa:*.λx:a.x Pairs Pair ≡ λa:*.λb:*.(c:*→((a→b→c)→c)) pair ≡ λa:*.λb:*.λx:a.λy:b.λc:*.λf:(a→b→c).f x y split ≡ λa:*.λb:*.λr:*.λf:(a→b→r).λp:(Pair a b).p r f fst ≡ λa:*.λb:*.λp:(Pair a b).split a b a (λx:a.λy:b.x) p snd ≡ λa:*.λb:*.λp:(Pair a b).split a b b (λx:a.λy:b.y) p

In a recent paper, Simply Easy! (An Implementation of a Dependently Typed Lambda Calculus) , the authors argue that type checking a dependently typed language is easy. I agree whole-heartedly, it doesn't have to be difficult at all. But I don't think the paper presents the easiest way to do it. So here is my take on how to write a simple dependent type checker. (There's nothing new here, and the authors of the paper are undoubtedly familiar with all of it.)I'll start by implementing the untyped lambda calculus. It's a very simple language with just three constructs: variables, applications, and lambda expressions, i.e.,For example,. In Haskell I'll use strings to represent variables names; it's simple and easy.The example above represented by. What do we want to do with thetype? Well, evaluating an expression seems like the thing we need. Now, there are many degrees of evaluation to choose from, Weak Head Normal Form, Head Normal Form, Normal Form, etc., etc. They differ in exactly where there might be reducible expression lingering. To evaluate lambda expression the most important step is β-reduction. A β-reduction step can be performed anywhere a function meets an argument, i.e., an application where the function is on λ form, a.k.a. a redex.Where the] notation means that all (free) occurrences of the variablein the expressionare replaced by. The example above has one redex, and doing a β step yields. The other kind of reduction we will make use of is α-substitution, which is simply renaming a bound variable, e.g.,can be changed to. Let's start with an easy evaluation strategy, normal order to WHNF. In WHNF we only need to make sure there the there's no redex along the "spine" of the expression, i.e., starting from the root and following the left branch in applications. Doing normal order reduction means that we doevaluate anything inside the argument of a β redex before doing the reduction. It's sometimes called lazy evaluation, but I prefer to use that term for an implementation strategy for normal order reduction. We implement normal order WHNF by walking down the spine collecting arguments (i.e., the right branch of a applications) until we reach a lambda or a variable. If we reach a variable we already have WHNF so we just reconstitute the applications again. If we hit a lambda we get to the crux of evaluation. We need to perform a β-reduction, i.e., if we havewe need to replace all (free) occurrences of the variableby the argumentinside the lambda body. That's what thefunction does.Thefunction is the only tricky part, so let's relax by first defining something easy, namely getting the free variables from an expression. The free variables are those variables that occur in an expression, but are not bound in it. We simply collect the variables in a set (using a list as a set here), removing anything bound.Back to substitution.Thefunction will replace all free occurrences ofbyinside, i.e.,]. Thecase is easy. If it's the variable we are replacing then replace else leave it alone. Thecase is also easy, just recurse in both branches. Thecase has three alternative. First, if the bound variable is the same as the one we are replacing then there can be no free occurrences inside it, so just return the lambda as is. Second, if the lambda bound variable is among the free variables inwe have a problem (see below). Third case, just recurse in the body. So, what about the case when the lambda bound variable occurs in? Well, if we just blindly continue substitution the variableinsidewill no longer refer to the same thing; it will refer to the variable bound in the lambda. That's no good. For example, take the expression, the β reduction gives(or similar), whereas doing the substitution blindly would give. Which is wrong! But it's easy to fix, just conjure up a variable,that will not clash with anything (does that). How do we come up with a good variable? Well, we don't want to pick one that is free in the expressionbecause that would lead to the same problem again. Nor do we want to pick a variable that is free inbecause that would accidentally bind something in. So we take the original identifier and tack on "'" until it fulfills our requirements. (OK, efficiency affectionados are allowed to complain a little here, but this isn't that bad actually.) Once we have a new variable we can do an α-conversion to rename the offending variable to something better. Thefunction is a utility when we want to replace one variable with another. Another useful thing to be able to do is to compare lambda expression for equality. We already have syntactic equality derived for, but it is also very useful to be able to compare expressions modulo α-conversions. That is, we'd liketo compare equal to. Let's call that functionVariables and applications just proceed along the structure of the expression. When we hit a lambda the variables might be different, so we do an α-conversion of the second expression to make them equal. As the final functions, we will do reduction to Normal Form (i.e., to a form where no redexes remain) and comparison of expressions via their normal forms.Computing the NF is similar to WHNF, but as we reconstruct expressions we make sure that all subexpression have NF as well. Note that bothandmay fail to terminate because not all expressions have a normal form. The canonical non-terminating example iswhich has one redex, and doing the reduction produces the same term again. But if an expression has a normal form then it is unique (the Church-Rosser theorem). Here are some sample lambda terms (named for convenience):Or, in HaskellAnd now we can check that addition works,will produceTo do type checking we need to introduce types. A very simple system is the simply typed lambda calculus. It has one (or more) base type (think of it as Bool or Int if you want an example) and function types. In Haskell terms:OrThe expressions themselves will have an explicit type on the bound variable in a lambda expression. So we now haveFor example,. The Haskell type for expressions isThe only difference is thein. All the functions we had for the untyped lambda calculus can be trivially extended to the simply typed one by simply carrying the type along. So finally, time for some type checking. The type checker will take an expression and return the type of the expression. The type checker will also need the types of all free variables in the expression to be able to do this. Otherwise, what type would it assign to, say, the expression? To represent the types of the free variables we use an environment which is simply a list of variables and their types.Type checking can go wrong; there can be type errors. To cater for this the type checker will be written in monadic style where the monad is simply an error (exception) monad. The error messages are strings, and the monad itself is thetype. Sois the type checking monad.We can now write variable lookup.It simply looks up the variable and returns the type. If not found it throws an error with(from). And then the type checker itself.For variables, just look up the type for it in the environment. For application, type check the function part and the argument part. The function should have function (arrow) type, and if it does the type of the application is the return type of the function. Finally, for a lambda expression we extend the environment with the bound variable. We then check the body, and the type of the lambda expression is a function type from the argument type to the type of the body. For convenience:Pretty easy sailing so far. The simply typed lambda calculus is a pain to use. Take something likein the untyped world. What type should we give it? Well that depends on how we intend to use it. Maybe, maybe, maybe... So we can no longer have one identity function; we need one for each type. What a bummer! It's as bad as C. BTW, all (type correct) expression in the simply typed lambda calculus have a normal form (Tait 1967).(Don't get me wrong, the polymorphic lambda calculus is a work of marvel.) So how can we fix the problem with one identity function for each type? We can add polymorphism! We can extend the expression language so that we also pass types around; we add type abstraction and type application.Whereis a type abstraction, i.e.,is a type variable which we can use in type expressions inside. To supply a type argument we have type application,. So the types we have would functions, base type, and type variables.And what isin the? Well, now types have gotten so complicated that it is possible to construct types that make no sense, so we need a "type system" for the types. We call them kinds.Defining all this is Haskell would be something likeBut wait, there's an awful lot of duplication here. The structures on the three levels have a lot of similarities. (Oh, and we don't really needanymore now when we have variables.) BTW, this system, called System F, is (a simplified version of) what GHC uses internally to represent Haskell code. It's a beautiful system, really. Oh, the identity function, well it would be. And using it:, assuminghas typeTo simplify and (as often happens when you simplify) generalize the expressions above we are going to squish them all into one expression data type. Soandwill join, as willandandand. But wait, there's nothing corresponding toin. We need to add something. We could just add it as it is, but we won't. TADA, enter dependent types. Instead of the boring function typewe will use a more exciting one,. What does it mean? It means that the variablecan occur in. If it doesn't then it's simply the same as the old fashioned function type. Ifdoes occur it means thatcan depend on theof the argument (). In Haskell:The new arrow type is called. We will also need more than one kind,and. It's pretty easy to extend the functions from the first part to handle this expression type. There's just a few more places to recurse. Here's the code again. Absolutly nothing subtle about it.To cut down on the code you could actually join theandconstructors since they are treated identically in many cases. I've left them separate for clarity. Thefunction extends in the natural way to the new type, so does, but here it is anyway.So, now for the meaty part, the type checking itself. The handling of the environment is just as before, so we'll just look at the different cases for the type checking.Just as before.This is almost as before, but the arrow type is callednow. The key thing here — and this is really where the fact that we are doing dependent types shows up — is the return type. For the simply typed lambda calculus it was just, but nowcan contain free occurences of the variable. Since we are returningthewould no longer be in scope, so we substitute the value of the argument for it. This is coolest part of the type checker. You've seen it. That's where it is. Since types can now be arbitrary expression we useto compare them instead ofThe lambda case is similar to before, but we return anow, so we need to include the variable name. Furthermore, to avoid nonsense likewe make sure that the type we want to return actually has a valid kind itself. (The first call tois to ensure the type we're putting into the environment is valid; I'm sure there's a more elegant way to do this, but I can't remember what it is right now.)Everything has a type, so what's the type of), well it's a [] () (excuse the ugly box, I can't find the HTML version of a box). And what's the type of? Well, you could keep going, but instead we'll stop right here. The idea is that the source language which we'll write our terms in will not allow the box to be written, so it should never occur.How do we check the type of the (dependent) function type? Well, we check the type of the thing to the left of the arrow, extend the environment, and then check the thing to the right. So now what shouldbe? Well,andshould be types (or maybe kinds). So their types should be kinds. This leads to the following definition:I.e., we allow (*,*), (*,[]), ([],*), and ([],[]). What does it all mean? Here's the beauty of the lambda cube. By varying what we allow we can change what system we type check.With all four combination allowed you get Calculus of Construction (CoC). If you always include (*,*), but make a choice of the other 3 you get 8 choices; these are the corners of the lambda cube. All of these system have been studied. BTW, all the systems in the lambda cube have the property that a well typed expression has a normal form. (Well, the proof of this is so complicated for some of these systems that some people kinda doubt it.)Here the syntaxmeans, where "_" is some new variable not used inMy fingers are numb from all these greek characters, so I'll continue with examples another time. And, of course, a parser and pretty printer.

Labels: Dependent types, Haskell, Lambda calculus