HAML‘s lovely. As I’ve been working with Turbinado, I’ve been having some issues with HSP. HSP is an insanely impressive piece of software, but its error messages can be a bit unclear. So I started playing around a bit with HAML. Got me wondering “How easy would be to write a HAML parser in Haskell?”

So I tried. Here’s a first-pass, to-be-updated [and somewhat incomplete] HAML parser for Haskell. Not all of the features are implemented, but it’s a start. It generates HTML bits suitable for compilation by GHC.

Input

f = content page = #content .left.column %h2 Welcome to our site! %p = print_information %p = print_inline .right.column = render [abba = ding, ding = abba] dinger

Output

f = (stringToHtml "content") page = ((tag "div"![strAttr "id" "content"]) ((tag "div"![strAttr "class" "left column"]) ((tag "h2") (stringToHtml "Welcome to our site!") ) +++ ((tag "p") (stringToHtml print_information) ) +++ ((tag "p") (stringToHtml print_inline) ) +++ ((tag "div"![strAttr "class" "right column"]) (stringToHtml render) ) +++ ((tag "div"![strAttr "abba" "ding", strAttr "ding" "abba"]) (stringToHtml "dinger") ) ) )

le Code

module Main where import Text.ParserCombinators.Parsec import Text.ParserCombinators.Parsec.Language import Text.ParserCombinators.Parsec.Pos import qualified Text.ParserCombinators.Parsec.Token as T import Data.Char import Data.List import Data.Maybe import System.IO.Unsafe main = do s <- getContents case (parse mainParser "stdin" s) of Left err -> putStrLn "Error: " >> print err Right hs -> putStrLn hs -- Try to parse HAML, otherwise re-output raw lines mainParser = do whiteSpace ls <- many1 (hamlCode <|> tilEOL) return $ unlines ls -- -- * HAML lexer -- hamlLexer = T.makeTokenParser emptyDef whiteSpace= T.whiteSpace hamlLexer lexeme = T.lexeme hamlLexer symbol = T.symbol hamlLexer natural = T.natural hamlLexer parens = T.parens hamlLexer semi = T.semi hamlLexer squares = T.squares hamlLexer stringLiteral= T.stringLiteral hamlLexer identifier= T.identifier hamlLexer reserved = T.reserved hamlLexer reservedOp= T.reservedOp hamlLexer commaSep1 = T.commaSep1 hamlLexer -- -- * Main HAML parsers -- -- hamlCode is just many identifiers (e.g. 'func a b c' followed by '=' followed by a hamlBlock -- func a b c = %somehaml hamlCode = try ( do is <- many1 identifier symbol "=" currentPos <- getPosition x <- manyTill1 (lexeme $ hamlBlock) (notSameIndent currentPos) return $ (concat $ intersperse " " is) ++ " =

" ++ (concat $ (intersperse (indent currentPos ++ "+++

") $ filter (not . null) $ x)) ) -- A Block may start with some whitespace, then has a valid bit of data hamlBlock = do currentPos <- getPosition bs <- manyTill1 (pTag <|> pText) (notSameIndent currentPos) return $ intercalate (indent currentPos ++ "+++

") bs pTag = do currentPos <- getPosition try (do t <- lexeme tagParser ts <- (isInline currentPos >> char '/' >> return []) <|> (hamlBlock) return $ intercalate "

" $ filter (not . null) $ [ (indent currentPos) ++ "((" ++ (if (null ts) then "i" else "") ++ t ++ ")" , if null ts then [] else ts , (indent currentPos) ++ ")

"] ) pText = lexeme stringParser notSameIndent p = (eof >> return []) <|> (do innerPos <- getPosition case (sourceColumn p) == (sourceColumn innerPos) of True -> pzero False -> return [] ) -- -- * Various little parsers -- tagParser :: CharParser () String tagParser = do t <- optionMaybe tagParser' i <- optionMaybe idParser c <- optionMaybe (many1 classParser) a <- optionMaybe attributesParser if (isJust t || isJust i || isJust c || isJust a) then do return $ "tag \"" ++ (fromMaybe "div" t) ++ "\"" ++ (if not (isJust i || isJust c || isJust a) then "" else concat $ [ "![" , intercalate ", " $ filter (not . null) [ (maybe "" (\i' -> "strAttr \"id\" \"" ++ i' ++ "\"") i) , (maybe "" (\c' -> "strAttr \"class\" \"" ++ (intercalate " " c') ++ "\"") c) , (maybe "" (\kv -> intercalate ", " $ map (\(k,v) -> "strAttr \"" ++ k ++ "\" \"" ++ v ++ "\"") kv) a) ] , "]"] ) else pzero tagParser' :: CharParser () String tagParser' = do char '%' many1 termChar idParser :: CharParser () String idParser = do char '#' many1 termChar classParser :: CharParser () String classParser = do char '.' many1 termChar attributesParser :: CharParser () [(String, String)] attributesParser = squares (commaSep1 attributeParser) attributeParser :: CharParser () (String, String) attributeParser = do k <- identifier symbol "=" cs <- many1 identifier return (k, intercalate " " cs) stringParser :: CharParser () String stringParser = do currentPos <- getPosition modifier <- optionMaybe (char '=' <|> char '-') whiteSpace c <- alphaNum cs<- tilEOL case modifier of Just '-' -> return $ (indent currentPos) ++ "-" ++ c:cs Just '=' -> return $ (indent currentPos) ++ "(stringToHtml " ++ c:cs ++ ")" Nothing -> return $ (indent currentPos) ++ "(stringToHtml \"" ++ c:cs ++ "\")" -- -- * Utility functions -- isInline p = do p2 <- getPosition case (sourceLine p ) == (sourceLine p2) of True -> return [] False -> pzero isSameIndent p1 p2 = (sourceColumn p1) == (sourceColumn p2) tilEOL = manyTill1 (noneOf "

") eol eol = newline <|> (eof >> return '

') termChar = satisfy (\c -> (isAlphaNum c) || (c `elem` termPunctuation) ) termPunctuation = "-_" indent p = take (sourceColumn (p) - 1) (repeat ' ') manyTill1 p e = do ms <- manyTill p e case (null ms) of True -> pzero False -> return ms

Golly, but I wish that I’d cleaned up the code, but there it is in all of its raw, un-thought-through glory…