Another point for Haskell

In Round 1 I described the task: find the number of “Titles” in an HTML file. I started with the Python implementation, and wrote this test:

def test_finds_the_correct_number_of_titles(): assert title_count() == 59

Very simple: I had already downloaded the web page and so this function had to do just two things: (1) read in the file, and then (2) parse the html & count the Title sections.

Not so simple in Haskell

Anyone who’s done Haskell might be able to guess at the problem I was about to have: The very same test, written in Haskell, wouldn’t compile!

it "finds the correct number of titles" $ titleCount `shouldBe` 59

There was no simple way to write that function to do both, read the file and parse it, finally returning a simple integer. I realized that the blocker was Haskell’s famous separation between pure and impure code. Using I/O (reading a file) is an impure task, and so anything combined with it becomes impure. My function’s return value would be locked in an I/O wrapper of sorts.

I got frustrated and thought about dumping Haskell. “Just another example of how it’s too difficult for practical work,” I thought. But then I wondered how hard it would be to read in the file as a fixture in the test, and then call the function? I’d just need to pass the html as a parameter. And yep, this worked:

it "finds the correct number of titles" $ do html <- readFile "nrs.html" titleCount html `shouldBe` 59

As I refactored the code to pass this test, I realized that this is much better: Doing I/O and algorithmic work should always be separate. I had been a little sloppy or lazy in setting up my first task. The app with the Haskell-inspired change will be more reliable and easier to test, regardless which language it ends up being written in.

See also: