Writing a Compiler in transit

code for this project available here





Winter vacations usually always involve a 10hr + transit at Bangkok Airport. My initial plan of action was to finish the second season of “The Sopranos” (which is an awesome series btw). However due to a set of circumstances, involving a weariness of watching mobsters eradicate each other, I decided to revisit haskell.

So I got another airport wifi token, and quickly started saving pages from Learn You a Haskell and the Write yourself a Scheme in 48 hrs. What better way to learn a language than write a small compiler, I reasoned.

A small step into Monad-istan

I had a very primitive understanding of parser combinators. Ironically I had understood them better when I used kern, a clojure parser combinator library with wonderful documentation.

What took me a while to understand were (and where I still get confused at times) were Monads, the do .. notation and the puzzling bind (»=) operator

I almost exclaimed “Eureka” when I discovered that (»=) is equivalent to Scala’s flatMap. ie for a list, its equivalent to applying map and flatten.

The best and most comprehensive resource I found on the matter was the Monad section in What I Wish I Knew When Learning Haskell by Stephen Diehl

So I could do things like

* Main > [ 1 , 2 , 3 ] >>= ( \ e -> [ e + 1 ]) [ 2 , 3 , 4 ] -- equivalent to * Main > do { e <- [ 1 , 2 , 3 ]; return ( e + 1 )} [ 2 , 3 , 4 ]

AST the first

Before I could write the parser I had to define a represetation for the AST that the parser would produce. By virtue of being a Lisp, pretty much everything was either a value, or a list of values. So I initially came up with :

data LispVal = LSymbol String | LList [ LispVal ] | LInteger Integer | LFloat Double | LString String | LBool Bool

Unlike the tutorial I wanted to support Floats and not just Integers.

Enter Parsec

The parser was pretty standard. I just needed to mostly follow the tutorial. I desugared some of the code from do notation to using raw (»=), just to clarify myself. Towards the end I came up with:

parseExpr :: Parser LispVal parseExpr = parseSymbol <|> parseString <|> parseQuoted <|> try ( parseInteger ) <|> parseFloat <|> do char '(' x <- parseList char ')' return x parseList :: Parser LispVal parseList = liftM LList $ sepBy parseExpr spaces parseString :: Parser LispVal parseString = do char '"' x <- many ( noneOf " \" " ) char '"' return $ LString x parseSymbol :: Parser LispVal parseSymbol = do first <- letter <|> specialChar rest <- many ( letter <|> specialChar <|> digit ) let symbol = first : rest return $ case symbol of "true" -> LBool True "false" -> LBool False _ -> LSymbol symbol parseInteger :: Parser LispVal parseInteger = do lookAhead $ many1 digit >> notFollowedBy ( oneOf "." ) ds <- many1 digit return $ ( LInteger . read ) ds parseFloat :: Parser LispVal parseFloat = liftM ( LFloat . read ) $ do d1 <- ( many digit ) c <- ( char '.' ) d2 <- ( many digit ) return $ d1 ++ [ c ] ++ d2

The Evaluation

This was my favorite part of the entire endeavour. It was also the point where I stopped following the tutorial.

The first task was thinking of the signature for eval. If I didn’t have any environment to worry about, eval would simply be LispVal -> LispVal.

type EnvVal = ( Environment , LispVal )

However, because I wanted to worry about Environments from the getgo, I decided that eval should be Environment -> LispVal -> LispVal. That is if no functions could modify the existing environment. Which doesn’t really work. I needed to return eval to return a new environment. Thus I settled on Environment -> LispVal -> (Environment, LispVal)

First order of business was setting up the primitives data types that eval to themselves

type EnvVal = ( Environment , LispVal ) eval :: Environment -> LispVal -> EnvVal eval e v @ ( LString _ ) = ( e , v ) eval e v @ ( LInteger _ ) = ( e , v ) eval e v @ ( LBool _ ) = ( e , v ) eval e v @ ( LFloat _ ) = ( e , v ) eval e v @ ( LError _ ) = ( e , v ) eval e ( LList [ LSymbol "quote" , v ]) = ( e , v ) eval e v @ ( LList [] ) = ( e , v )

And adding symbol insertion and lookup

eval env ( LSymbol name ) = ( env , lookupSymbol env name [] ) eval env ( LList [ LSymbol "def" , LSymbol name , v ]) = (( M . insert name ( snd ( eval env v )) env ), LSymbol name )

Function Application

However, at this point I decided I needed to define some primitive functions which were always available in the Environment. To do so, I would first need to extend LispVal to include a type for Primive Functions. I also added a constructor for errors.

data LispVal = LSymbol String | LList [ LispVal ] | LInteger Integer | LFloat Double | LString String | LBool Bool | LPrimitive ([ LispVal ] -> LispVal ) | LError String

I also needed a little bit of type inference as I wanted support for both floats and integers, and both had several common primitive functions. So behold, a very very poor man’s polymorphism:

eval env ( LList ( LSymbol func : args )) = ( env , apply env func $ map ( snd . ( eval env )) args ) apply :: Environment -> String -> [ LispVal ] -> LispVal apply env func args = case M . lookup func $ M . union listPrimitives env of Just ( LPrimitive f ) -> ( f args ) Nothing -> inferTypeAndApply func args inferTypeAndApply :: String -> [ LispVal ] -> LispVal inferTypeAndApply func args | isIntList args = lookupSymbol intPrimitives func args | isFloatList args = lookupSymbol floatPrimitives func args | isBoolList args = lookupSymbol boolPrimitives func args | otherwise = LError "symbol not defined"

Btw, this approach is probably quite brittle. No autopromotion of numeric types either.

A poor man’s prelude

Next up, I had to define a set of primitive functions. This was probably the easiest part of the entire endeavour. This was also around the time I started munching some delicious Phad Thai.

intPrimitives :: M . Map String LispVal intPrimitives = M . fromList [( "+" , LPrimitive $ intBinaryOp ( + )), ( "-" , LPrimitive $ intBinaryOp ( - )), ( "*" , LPrimitive $ intBinaryOp ( * )), ( "/" , LPrimitive $ intBinaryOp ( quot )), ( "mod" , LPrimitive $ intBinaryOp ( rem )), ( "pow" , LPrimitive $ intBinaryOp ( ^ ))] intBinaryOp :: ( Integer -> Integer -> Integer ) -> ([ LispVal ] -> LispVal ) intBinaryOp f = ( \ args -> case args of ( LInteger x ) : ( LInteger y ) :[] -> LInteger ( f x y ) _ -> LError "arity error" ) floatPrimitives :: Environment floatPrimitives = M . fromList [( "+" , LPrimitive $ floatBinaryOp ( + )), ( "-" , LPrimitive $ floatBinaryOp ( - )), ( "*" , LPrimitive $ floatBinaryOp ( * )), ( "/" , LPrimitive $ floatBinaryOp ( / )), ( "pow" , LPrimitive $ floatBinaryOp ( ** ))] floatBinaryOp :: ( Double -> Double -> Double ) -> ([ LispVal ] -> LispVal ) floatBinaryOp f = ( \ args -> case args of ( LFloat x ) : ( LFloat y ) :[] -> LFloat ( f x y ) _ -> LError "arity error" ) boolPrimitives :: Environment boolPrimitives = M . fromList [( "and" , LPrimitive $ boolBinaryOp ( B .&& )), ( "or" , LPrimitive $ boolBinaryOp ( B .|| )), --("xor", LPrimitive $ boolBinaryOp (B.xor)), ( "not" , LPrimitive $ ( \ args -> case args of ( LBool x ) :[] -> LBool $ B . not x _ -> LError "arity error" ))] boolBinaryOp :: ( Bool -> Bool -> Bool ) -> ([ LispVal ] -> LispVal ) boolBinaryOp f = ( \ args -> case args of ( LBool x ) : ( LBool y ) :[] -> LBool ( f x y ) _ -> LError "arity error" )

So hurray. A few hours down and I have a lispy calculator almost ready.

Getting impatient to see some action, I decided to implement the repl itself.

showVal :: LispVal -> String showVal ( LString c ) = " \" " ++ c ++ " \" " showVal ( LSymbol name ) = name showVal ( LInteger i ) = show i showVal ( LFloat f ) = show f showVal ( LBool True ) = "true" showVal ( LBool False ) = "false" showVal ( LList c ) = "(" ++ unwordsList c ++ ")" showVal ( LError e ) = "Error : " ++ e instance Show LispVal where show = showVal readAndEval :: Environment -> String -> EnvVal readAndEval env input = eval env $ readExpr input printEval :: EnvVal -> IO () printEval p = putStrLn ( show ( snd p )) flushStr :: String -> IO () flushStr str = putStr str >> hFlush stdout readPrompt :: String -> IO String readPrompt prompt = flushStr prompt >> getLine repl :: Environment -> IO () repl env = ( readPrompt "Hisp>>> " ) >>= ( \ inp -> if inp == "quit" then return () else let pair = readAndEval env inp in printEval pair >>= ( \ _ -> repl ( fst pair ))) main = repl M . empty

I had to revisit Monad-istan while writing this. I decided to avoid the do notation altogether when writing the repl function, leading to ugly but understable (atleast for me) code.

Hisp >>> ( + 2 3 ) 5 Hisp >>> ( + 2.0 5.5 ) 7.5 Hisp >>> ( and ( or true false ) true ) true Hisp >>> ( / 2 3 ) 0 Hisp >>> ( / 2.0 3.0 ) 0.6666666666666666 Hisp >>> ( pow 2 4 ) 16 Hisp >>> ( pow 4.0 0.5 ) 2.0

I also decided to add some primitives for list operations. This was a lisp after all. For lists I went with the classical approach for representing them as cons cells.

toLList :: [ LispVal ] -> LispVal toLList = foldr ( \ l acc -> LList [ LSymbol "cons" , l , acc ]) ( LList [] ) cons :: [ LispVal ] -> LispVal cons args = case args of x : y :[] -> LList [ LSymbol "cons" , x , y ] _ -> LError "arity error" first :: [ LispVal ] -> LispVal first (( LList [ LSymbol "cons" , f , r ]) :[] ) = f first _ = LError "illegal arguments" rest :: [ LispVal ] -> LispVal rest (( LList [ LSymbol "cons" , f , r ]) :[] ) = r rest _ = LError "illegal arguments" listPrimitives :: Environment listPrimitives = M . fromList [( "list" , LPrimitive ( toLList )), ( "cons" , LPrimitive ( cons )), ( "first" , LPrimitive ( first )), ( "rest" , LPrimitive ( rest )) ] Hisp >>> ( list 1 2 3 ) ( cons 1 ( cons 2 ( cons 3 () ))) Hisp >>> ( def l ( list 5 6 7 8 true false )) l Hisp >>> l ( cons 5 ( cons 6 ( cons 7 ( cons 8 ( cons true ( cons false () )))))) Hisp >>> ( first l ) 5 Hisp >>> ( rest l ) ( cons 6 ( cons 7 ( cons 8 ( cons true ( cons false () )))))

So far so good!

Of Closures

I decided I needed a new Data Constructor for Lambdas, as lambdas needed to keep a copy of the environment where they are declared, as well as their bindings and body. To enable recursion, lambda’s should keep their names if defined.

Therefore I changed LispVal by adding LLambda:

data LispVal = LSymbol String | LList [ LispVal ] | LInteger Integer | LFloat Double | LString String | LBool Bool | LPrimitive ([ LispVal ] -> LispVal ) | LLambda { name :: String , env :: Environment , bindings :: [ LispVal ], body :: LispVal } | LError String

I also needed to modify eval to give me Lambdas when seeing form like “(fn …)” or “(def .. (fn..”

-- for named functions eval e ( LList [ LSymbol "def" , LSymbol name , LList [ LSymbol "fn" , LList bindings , body @ ( LList _ )]]) = let newLambda = LLambda name e bindings body newEnv = M . insert name newLambda e in ( newEnv , newLambda ) -- for anonymous functions eval e ( LList [ LSymbol "fn" , LList bindings , body @ ( LList _ )]) = ( e , LLambda "" e bindings body )

For applying the Lambdas I implemented a helper function applyLambda. Lambdas could either be invoked via their symbols, or directly such as ((fn (bindings) body) args)

-- direct application eval e ( LList (( LList [ LSymbol "fn" , LList bindings , body ]) : args )) = let lambda = LLambda "" e bindings body in applyLambda lambda args -- added another case to apply for symbol application apply :: Environment -> String -> [ LispVal ] -> LispVal apply env func args = case M . lookup func $ M . union listPrimitives env of Just ( LPrimitive f ) -> ( f args ) Just l @ ( LLambda _ _ _ _ ) -> snd $ applyLambda l args Nothing -> inferTypeAndApply func args applyLambda :: LispVal -> [ LispVal ] -> EnvVal applyLambda l @ ( LLambda name lenv bindings body ) args | length args == length bindings = let closure = M . union ( M . fromList [( name , l )]) ( M . union lenv ( M . fromList ( zip ( map show bindings ) args ))) in eval closure body | otherwise = ( lenv , LError "arity error" )

There’s a hack here. My original ambitious plan was to allow clojure style destructuring. That’s why the bindings are of type [LispVal]. However around dawn as both sleep and my flight approached, I just hacked around by mapping show on the bindings.

Another thing to note is that I added the name of the lambda to the environment when applying it. This is to support recursion.

So, finally trying it out

Hisp >>> ( def sum ( fn ( a b ) ( + a b ))) < lambda : sum -> body : ( + a b ) > Hisp >>> ( sum 1 2 ) 3 Hisp >>> (( fn ( a ) ( + a 1 )) 1 ) 2

By this time, my flight was almost ready.

Conditionals and Equality

After landing in SG, I realised in order to make factorial (aka hello world) I needed conditionals. and equality. So first I implemented a very simple if:

eval e ( LList [ LSymbol "if" , cond , e1 , e2 ]) = let evalCond = ( eval e cond ) in case snd evalCond of LBool True -> eval ( fst evalCond ) e1 LBool False -> eval ( fst evalCond ) e2 _ -> ( e , LError "type error" )

For equality I simply decided to add an Eq instance to the LispVal type.

instance Eq LispVal where ( LPrimitive _ ) == _ = False ( LLambda _ _ _ _ ) == _ = False ( LError _ ) == _ = False ( LInteger i1 ) == ( LInteger i2 ) = ( i1 == i2 ) ( LInteger i1 ) == _ = False ( LFloat f1 ) == ( LFloat f2 ) = ( f1 == f2 ) ( LFloat f1 ) == _ = False ( LString s1 ) == ( LString s2 ) = ( s1 == s2 ) ( LString s1 ) == _ = False ( LBool b1 ) == ( LBool b2 ) = ( b1 == b2 ) ( LBool b1 ) == _ = False -- adding eval rule for = eval e ( LList [ LSymbol "=" , e1 , e2 ]) = let evalE1 = snd $ eval e e1 evalE2 = snd $ eval e e2 in ( e , LBool ( evalE1 == evalE2 ))

Finally testing it..

Hisp >>> ( def factorial ( fn ( a ) ( if ( = a 0 ) 1 ( * a ( factorial ( - a 1 )))))) < lambda : factorial -> body : ( if ( = a 0 ) 1 ( * a ( factorial ( - a 1 )))) > Hisp >>> ( factorial 5 ) 120

Eureka!

Afterthoughts

Well foremostly, this project helped me gain a very healthy appreciation for Haskell. Furthermore it helped de-mystefy some of the magic behind compilers and how programming languages are implemented.

Hisp is very very far from a complete compiler. There are several probelms with the way I designed the implementation:

Doing IO currently isn’t really possible. Probably using IORef to represent my Environment instead of a pure Data.Map would help here.

Several core functions like comparison (>,<, etc) aren’t available

Being a noob, the code is repitive is several areas and could probably be immensely refactored.

TESTS, TESTS, TESTS