


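module Main where

import Data.Maybe (fromMaybe)
import System.IO (hFlush, stdout)
import Text.HTML.TagSoup

main :: IO ()
main = do
    putStr "Enter a term to search for: "
    hFlush stdout  -- make sure the prompt appears before we block on input
    term <- getLine
    -- The file name did not survive in this copy of the post;
    -- "Dictionary.html" is an assumed path to the saved dictionary page.
    html <- readFile "Dictionary.html"
    let dict = parseDict $ parseTags html
    putStrLn $ fromMaybe "No match found." $ lookup term dict

-- Skip the page header, then turn every <dt>/<dd> pair into a
-- (keyword, description) entry.
parseDict :: [Tag String] -> [(String,String)]
parseDict = map parseItem
          . sections (~== "<dt>")
          . dropWhile (~/= "<div class=glosslist>")

-- Tags before <dd> hold the keyword; the tags from <dd> up to </dd>
-- hold the description, whose whitespace is normalised.
parseItem :: [Tag String] -> (String,String)
parseItem xs = (innerText a, unwords $ words $ innerText b)
    where (a,b) = break (~== "<dd>") (takeWhile (~/= "</dd>") xs)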
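The stages of parseDict can be explored one at a time. The following is a minimal sketch, assuming a tiny hand-written page (the string named page is invented for illustration and is not the real dictionary file); it uses only parseTags, sections, innerText, ~== and ~/= from TagSoup.

```haskell
import Text.HTML.TagSoup

-- Hypothetical miniature of the dictionary page, for illustration only.
page :: String
page = "<h1>gunk</h1><div class=glosslist><dl>"
    ++ "<dt>bit</dt><dd>a  binary\n digit</dd>"
    ++ "<dt>byte</dt><dd>eight bits</dd>"
    ++ "</dl></div>"

-- Stage 1: drop everything before the glossary <div>.
-- Stage 2: start a new section at every <dt>.
items :: [[Tag String]]
items = sections (~== "<dt>")
      $ dropWhile (~/= "<div class=glosslist>")
      $ parseTags page

-- The keyword is the text between <dt> and </dt> at the head of a section.
keywords :: [String]
keywords = map (innerText . takeWhile (~/= "</dt>")) items

main :: IO ()
main = do
    print (length items)   -- one section per <dt>
    mapM_ putStrLn keywords
```

On this input the sketch should report two sections, with the keywords bit and byte. Note that each section runs from its <dt> to the end of the input, which is why parseItem trims it with takeWhile before splitting it.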




I've just read issue 10 of The Monad.Reader. It's a great issue, including a tutorial on using the new GHCi debugger, and an article on how to write an efficient Haskell interpreter in Haskell. The running example for the GHCi debugger is parsing the computer dictionary and extracting descriptions from keywords, using the TagSoup library. The article starts with an initial version of the extraction code, then fixes some mistakes using the debugger present in GHCi. The code was written to teach debugging, not as a demonstration of TagSoup. This post explains how I would have written the program.

The original program is written in a low-level style. To search for a keyword, the program laboriously traverses the file looking for that keyword, much as a program in a modern imperative language might. But Haskell programmers can do better. We can separate the task: first parsing the keyword/description pairs into a list; then searching the list. Lazy evaluation will combine these separate operations to obtain something just as efficient as the original. By separating the concerns, we can express each at a higher level, reducing the search function to a simple lookup. It also gives us more flexibility for the future, allowing us to potentially reuse the parsing functions.

I have fixed a number of other bugs in the code, and my solution is the program at the top of this post.

Instead of searching for a single keyword, I parse all keywords using parseDict. The parseDict function first skips over the gunk at the top of the file, then finds each definition, and parses it. The parseItem function spots where tags begin and end, and takes the text from inside. The unwords $ words expression is a neat trick for normalising the spacing within an arbitrary string.

This revised program is shorter than the original, I find it easier to comprehend, and it provides more functionality with fewer bugs. The TagSoup library provides a robust base to work from, allowing concise expression of HTML/XML extraction programs.
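The parse-then-lookup design and the whitespace trick can both be tried without TagSoup. This is a rough sketch using only the standard library: parseEntry and rawEntries are hypothetical stand-ins for the real parsing stage and input, and the final list element is an error value placed there only to show that lookup never forces entries after the match.

```haskell
import Data.Maybe (fromMaybe)

-- Collapse every run of whitespace (spaces, tabs, newlines) into one
-- space and trim both ends: the same trick as unwords $ words.
normalise :: String -> String
normalise = unwords . words

-- Hypothetical parsing stage: split "keyword:description" at the colon.
parseEntry :: String -> (String, String)
parseEntry raw = (keyword, normalise (drop 1 rest))
  where (keyword, rest) = break (== ':') raw

-- Hypothetical input. The last element would crash if it were ever parsed.
rawEntries :: [String]
rawEntries =
    [ "bit: a  binary\n digit"
    , "byte:eight   bits"
    , error "this entry is never reached"
    ]

-- Parsing is described for the whole list, but runs only on demand.
dict :: [(String, String)]
dict = map parseEntry rawEntries

main :: IO ()
main = putStrLn $ fromMaybe "No match found." $ lookup "byte" dict
```

Because lookup stops at the first matching key, only the first two entries are parsed and the program prints eight bits. Searching for a term that is not present would walk the whole list and hit the deliberate error, which is the same work the single-pass original would have done on a full scan.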