Following my previous post which suggested the use of Prism s for parsing and building, using a binary format example - I also want to show how the same idea can work nicely for parsing and building programming language syntax.

Simple AST example

data Expr = Lit Int | Add Expr Expr | Mul Expr Expr deriving ( Show , Eq , Generic ) 'Expr makePrisms '

Here's our Prism for parsing and building the above AST:

> expr # Mul ( Add ( Lit 1 ) ( Lit 2 )) ( Lit 3 ) expr) ()) ( "(1 + 2) * 3" expr :: Prism' String Expr = expr . -- convert string to tokens tokens . -- take the expression takeExpr -- and there should be no remaining tokens secondOnly [] takeExpr :: Prism' [ String ] ( Expr , [ String ]) ] (, []) = takeExpr "+" _Add $ -- Additions of infixOpLeftRecursion_Add "*" _Mul $ -- multiplications of infixOpLeftRecursion_Mul -- literals or tryMatch (asideFirst _Lit) . asideFirst _Show) $ (_ConsasideFirst _Show) . firstOnly "(" . -- expressions in parentheses _ConsfirstOnly . aside (_Cons . firstOnly ")" ) takeExpraside (_ConsfirstOnly

This uses the following combinators:

_Cons , _Show , and aside from Control.Lens

, , and from firstOnly , secondOnly , and asideFirst from the previous post

, , and from the previous post tokens , infixOpLeftRecursion , and tryMatch are defined in the appendix at the bottom

Observations

In the previous post, Prism s didn't match up to Python's Construct in encoding binary protocols, where Construct made good use of structural duck types (though this appears solvable with some effort). However, for programming language syntax Prism s seem very elegant imho.

Note how we harness optics' parametricity and composition. In the previous post we parsed ByteString s but here we parse String and we start by converting them to tokens (ie [String] ) and parse that.

Renegade prisms

Unlike the previous post's lawful Prism s, this post's parsing is lossy, so it breaks the Traversal laws:

> "1 + (2*3)" & expr %~ id expr "1 + 2 * 3"

If one desires lawful parsing Prisms, their AST representation has to represent white-space and redundant parentheses.

A Prism law that is kept is that if you parse what you built you do get it back:

import Test.QuickCheck.Arbitrary.ADT = (expr # e) ^? expr == Just e propParseBack e(expre)expr instance Arbitrary Expr where = genericArbitrary arbitrarygenericArbitrary = genericShrink shrinkgenericShrink > quickCheck propParseBack quickCheck propParseBack +++ OK , passed 100 tests . , passedtests

Caveat: meaninful parse errors

When parsing with this Prism fails, it offers no useful error-reporting. But do I believe that this is solvable and I'll address it in future posts.

Request for feedback

Do you think that some extra combinators used here ( asideFirst , firstOnly , etc) should belong in lens ?

, , etc) should belong in ? Or prehaps these combinators should belong in a separate package? How would you call it?

Any suggestions as for naming these combinators? Other code improvements?

Image credit: Does anyone know who is the artist for the opening image? (I found it on the internets)

Btw: Thanks to Eyal Lotem for reading drafts of this.

Discussion:

Appendix

AST parse-build prism combinators

-- Extend a base parsing prism with applications of an operator infixOpLeftRecursion :: Eq a => a -> -- The operator's text Prism' expr (expr, expr) -> -- The operator constructor's prism expr (expr, expr) Prism' [a] (expr, [a]) -> -- The base parsing prism [a] (expr, [a]) Prism' [a] (expr, [a]) [a] (expr, [a]) = infixOpLeftRecursion operatorText cons sub leftRecursion cons . firstOnly operatorText . sub) . retuple) (aside (_ConsfirstOnly operatorTextsub)retuple) sub -- Extend a base parsing prism with extensions to its right side leftRecursion :: Prism' whole cons -> whole cons Prism' (whole, state) (cons, state) -> (whole, state) (cons, state) Prism' state (whole, state) -> state (whole, state) Prism' state (whole, state) state (whole, state) = leftRecursion cons extend base fmap parseExtends . ( ^? base)) prism' build (parseExtendsbase)) where = build (x, state) maybe # (x, state)) (base(x, state)) . (extend # ) . (, state)) (x ^? cons) (build(extend(, state)) (xcons) = parseExtends x x ^? extend <&> _1 %~ (cons # ) & maybe x parseExtends extend_1(consx parseExtends -- Add an encoding for a sum-type constructor to an existing prism tryMatch :: Prism' whole cons -> -- The sum-type constructor prism whole cons Prism' src cons -> -- Parse the constructor contents src cons Prism' src whole -> -- Prism to encode the other options src whole Prism' src whole src whole = tryMatch cons parse fallback -> (x ^? parse <&> (cons # )) <|> x ^? fallback) prism' build (\x(xparse(cons))fallback) where = maybe (fallback # x) (parse # ) (x ^? cons) build x(fallbackx) (parse) (xcons) -- Transform a string into tokens tokens :: Iso' String [ String ] = tokens foldr addToken "" ) iso splitTokens (addToken where "" = x addToken x addToken [x] y | Char . generalCategory x == Char . OpenPunctuation = x : y generalCategory x : ys) addToken x (yys) | Char . generalCategory y == Char . ClosePunctuation = x <> (y : ys) generalCategory y(yys) = x <> " " <> y addToken x y = isOp ( `elem` [ Char . MathSymbol , Char . OtherPunctuation ]) . ]) Char . generalCategory generalCategory = ( `elem` "()[]{}" ) isParen "" = [] splitTokens[] : s : xs) | Char . isSpace s = [x] : splitTokens xs splitTokens (xxs)[x]splitTokens xs : xs) | Char . isSpace s = splitTokens xs splitTokens (sxs)splitTokens xs : xs) | isParen x = [x] : splitTokens xs splitTokens (xxs)isParen x[x]splitTokens xs : xs) = splitTokens (xxs) case splitTokens xs of splitTokens xs -> [[x]] [][[x]] : ys) : zs) | not (isParen y) && isOp x == isOp y -> (x : y : ys) : zs ((yys)zs)(isParen y)isOp xisOp y(xys)zs -> [x] : ys ys[x]ys