Leaky Interpreters

In Definitional interpreters for higher-order programming languages (here), John Reynolds considers the definition of a simple applicative programming language by means of an interpreter written in a similar language.

An important and frequently used method of defining a programming language is to give an interpreter for the language that is written in a second, hopefully better understood language. We will call these two languages the defined and defining languages, respectively. Reynolds, 1998.

Reynolds demonstrates how, without diligence, the semantics of the interpreter's implementation language may leak into the interpretation. Leaky interpreters are not equivalent to the defining semantics, and may exhibit incorrect side effects or return wrong values. Take Haskell, a language with non-strict semantics. The GHC compiler uses lazy evaluation as an implementation of these non-strict semantics. The GHCi Haskell interpreter honours these non-strict semantics in the same way.

A language called Drippy

Say hello to a simple functional programming language called Drippy Calculator, or Drippy for short. User defined functions are supported. Integer is the only type. The two constructs are + which adds two integers, and poison() which crashes a program when evaluated. Here's a simple Drippy program:

int add(int a, int b) { return(a + b); } int main() { int x,y,z; x = 2; y = 3; z = add(x, y); return(z); }

An interpretation of this program prints 5. Drippy is a strictly evaluated language and variables are immutable. The expressions in Drippy are constants, variables and function calls. Every function is a sequence of statements and ends with a return(..) . A few notes on Drippy evaluation (shared with Reynold's notes on his applicative language):

Statements are evaluated from top to bottom. After each statement evaluation the environment is modified, which binds variables to values. The evaluation of a constant always gives the same value, regardless of its environment. The value from evaluating variables and function calls depend on their environment's context. An applicative expression takes the form r 0 (r 1 ,..,r n ). The expressions r 0 ,r 1 ,..,r n are evaluted in the same environment to obtain values f, a 1 ,..,a n . If f is not a function of n arguments, then an error stop occurs. Otherwise, the function f is applied to the arguments a 1 ,..,a n , and if this application produces a result, then the result is the value of the application expression.

Drippy front end

Let's implement a front-end for Drippy in with a BNF grammar.

entrypoints Program ; separator Function "" ; terminator Statement ";" ; separator Expr "," ; separator Argument "," ; terminator VarDec ";" ; separator nonempty Ident "," ; Prog. Program ::= [Function] Main ; Fn. Function ::= Type Ident Params Body ; FnBody. Body ::= "{" [VarDec] [Statement] Return "}" ; Mn. Main ::= "int" "main" "(" ")" Body ; ExprConstant. Expr ::= Integer ; ExprId. Expr ::= Ident ; ExprFnCall. Expr ::= FnCall ; Prms. Params ::= "(" [Argument] ")" ; Arg. Argument ::= Type Ident ; Var. VarDec ::= Type [Ident]; TypeInt. Type ::= "int" ; StmtAssign. Statement ::= Assignment ; StmtLangCall. Statement ::= LanguagePoisonFn ; AssignId. Assignment ::= Ident "=" Expr ; FnCallNArity. FnCall ::= Ident "(" [Expr] ")" ; LangFnCall. FnCall ::= LanguageFn ; LangPoisonFnCall. FnCall ::= LanguagePoisonFn ; LangAdd. LanguageFn ::= Expr "+" Expr ; LangPoison. LanguagePoisonFn ::= "poison" "(" ")" ; Rtn. Return ::= "return" "(" Expr ")" ";" ;

Let's generate a lexer and parser from this, using the BNFC software.

$ bnfc -m --haskell Drippy.cf $ make

A leaky interpreter for Drippy

Given Drippy's strict semantics, the following program should crash once the third argument in the add(x, y, poison()) is evaluated in the asssignment to z .

int add(int a, int b, int c) { return(a + b); } int main() { int x,y,z; x = 2; y = 3; z = add(x, y, poison()); return(z); }

Here's a Haskell implementation of DrippyI, a Drippy interpreter.

module Main where import AbsDrippy import ParDrippy import ErrM import Data.Map (Map) import qualified Data.Map as Map import Data.Maybe import System.Environment (getArgs) main :: IO () main = do result IO Integer evalDrippy s = do let Ok prog = pProgram (myLexer s) interpret prog initVars :: Map Ident (Maybe Integer) -> VarDec -> Map Ident (Maybe Integer) initVars lclVars (Var _varType idents) = foldl (\cxt ident -> Map.insert ident Nothing cxt) lclVars idents interpret :: Program -> IO Integer interpret (Prog fns mainF) = evalMain mainF fns evalStmt :: Context -> Statement -> Context evalStmt cxt (StmtAssign (AssignId ident expr)) = cxt { localVars = Map.insert ident (Just (evalExpr cxt expr)) (localVars cxt) } evalStmt cxt (StmtLangCall LangPoison) = error "crash: poisonous statement." evalLangCall :: Context -> LanguageFn -> Integer evalLangCall cxt (LangAdd e1 e2) = evalExpr cxt e1 + evalExpr cxt e2 evalFnCall :: Context -> FnCall -> Integer evalFnCall cxt (FnCallNArity ident exprs) = evalUserFn cxt ident exprs evalFnCall cxt (LangFnCall langFun) = evalLangCall cxt langFun evalFnCall cxt (LangPoisonFnCall LangPoison) = error "crash: poisonous function argument." evalExpr :: Context -> Expr -> Integer evalExpr _cxt (ExprConstant constant) = constant evalExpr cxt (ExprId ident) = fromJust (fromJust (Map.lookup ident (localVars cxt))) evalExpr cxt (ExprFnCall fn) = evalFnCall cxt fn evalUserFn :: Context -> Ident -> [Expr] -> Integer evalUserFn cxt ident args = let Fn _ _ (Prms params) (FnBody vrs stmts (Rtn returnStmt)) = fromJust $ Map.lookup ident (funs cxt) xs = zipWith (\(Arg _type paramId) arg -> (paramId,Just (evalExpr cxt arg))) params args paramVals = Map.union (Map.fromList xs) (localVars cxt) lclVars = foldl initVars paramVals vrs localCxt = foldl evalStmt (cxt { localVars = lclVars }) stmts in evalExpr localCxt returnStmt evalMain :: Main -> [Function] -> IO Integer evalMain (Mn (FnBody vrs stmts (Rtn returnStmt))) fns = do let vars = foldl initVars Map.empty vrs lclFuns = Map.fromList $ map (\f@(Fn _ ident _ _) -> (ident,f)) fns cxt = Context vars lclFuns cxt' = foldl evalStmt cxt stmts return (evalExpr cxt' returnStmt)

Leaking into Drippy

Let's interpret our poisonous program to make sure it does indeed crash...

$ ./DrippyI my_application.drippy 5

Oh dear, the non-strict semantics of Haskell have leaked into DrippyI. The poisonous() argument in the add(..) call in the z assignment has not been evaluated. The leak is in evalUserFn , and the poisonous expression is the last element in the args argument. Although evalExpr is used in zipWith in the definition of xs , it is never applied. The function body of add provides the clue -- the third argument is never needed to evaluate the expression in the return(a+b) call.

int add(int a, int b, int c) { return(a + b); }

Haskell's call-by-need semantics have leaked in, the environment has not bound the c variable to any (normal form) value.

Fixing the leak

To recover the strict call-by-value semantics of Drippy, all function arguments are evaluated to normal form (requirement (2)) with deepseq . This is achieved in the definition of paramVals in a revised evalUserFn .

evalUserFn :: Context -> Ident -> [Expr] -> Integer evalUserFn cxt ident args = let Fn _ _ (Prms params) (FnBody vrs stmts (Rtn returnStmt)) = fromJust $ Map.lookup ident (funs cxt) xs = zipWith (\(Arg _type paramId) arg -> (paramId,Just (evalExpr cxt arg))) params args paramVals = map snd xs `deepseq` Map.union (Map.fromList xs) (localVars cxt) -- the fix lclVars = foldl initVars paramVals vrs localCxt = foldl evalStmt (cxt { localVars = lclVars }) stmts in if length params /= length args then error $ show ident ++ " expected " ++ show (length params) ++ " args" ++ " but " ++ show (length args) ++ " passed." else evalExpr localCxt returnStmt

This also implements requirement (3) of Drippy's evaluation semantics, which says that the number of function arguments must be the same as the number of expected arguments as per the user's function definition. If they are not same, an error must be thrown. This is checked in length params =/ length args if condition.

Let's try evaluating our poisonous program again:

$ ./DrippyI my_application.drippy Drippy: crash: poisonous function argument.

Great, the laziness leaked is sealed. Now let's test that requirement (3) is satisfied by instead trying to assign a value to z with add(x, y, poison(), 7) ...

$ ./DrippyI my_application.drippy Drippy: Ident "add" expected 3 args but 4 passed.

Hurrah, DrippyI leaks less!