ePub and Kindle versions of Modern Perl: the Book and Liftoff: Launching Agile Teams & Projects are coming soon. We've been pulling all of the pieces of Pod::PseudoPod::DOM and Pod::PseudoPod::Book together so that any corpus written in PseudoPod can become a well-formatted PDF (of multiple page sizes), an ePub book, or attractive HTML.

Part of that process required an improvement to the indexer. (It's no secret that the Kindle index for the first edition of Modern Perl wasn't up to our standards.)

Part of that process meant writing good tests for indexes.

I've long believed that the best way to test code is to write your test code as realistically as possible. This is a great way to exercise the code as real people will use it, and it gives you immediate feedback on what's tedious and awkward and easy to misuse.

Sometimes I still get the tests wrong, though.

Consider: a PseudoPod document uses X<> tags for indexing. A test of the indexer must parse a document containing several such tags, then verify that the output (or the internal tree which represents the parsed document) contains the appropriate index nodes.

In short, a document containing X<ice cream sandwich> should produce an entry for ice cream sandwiches in the index.
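For example, a document fragment might look something like this (the surrounding text and heading are illustrative; only the X<> tag matters to the indexer):

```pod
=head1 Frozen Desserts

X<ice cream sandwich>

The ice cream sandwich is the king of frozen desserts.
```

A well-behaved indexer should file that entry under the letter I and point back to the location of the tag.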

For my first testing approach, I tried to create the appropriate index nodes and their children to run through the index emitter. That experiment lasted ten minutes, with five of those minutes spent taking a break to rethink things.

Here's a secret about tests that people often don't realize: it doesn't really matter whether you test units as units or the system as a whole if you test everything you care about and your tests run fast enough.

That first approach failed because I cared more about the details of how the indexer works than about what it does. The right way to approach a problem like this is to figure out the characteristics of the code you want to exercise, then figure out the test data, then decide how to test for the results you want.

The basic tests must exercise basic indexed terms, the alphabetization and representation of those terms, subindexed terms, multiple instances of a single term, and the relationship of subindexed terms and top-level terms.

That sounds more complicated than it is, which led me to believe that there was a simple way to represent the data. Then, of course, I felt both a little silly and a lot relieved when I had the epiphany of using the document API to produce the index:

    sub make_index_nodes
    {
        my $doc   = qq|=head0 My Document\n\n|;
        my $count = 0;

        for my $tag (@_)
        {
            $doc .= qq|=head1 Index Element $count\n\n$tag\n\n|;
            $count++;
        }

        my $parser = Pod::PseudoPod::DOM->new(
            formatter_role => 'Pod::PseudoPod::DOM::Role::XHTML',
            filename       => 'dummy_file.html',
        );

        my $dom   = $parser->parse_string_document( $doc )->get_document;
        my $index = Pod::PseudoPod::DOM::Index->new;

        $index->add_entry( $_ ) for $dom->get_index_entries;

        return $index;
    }

That code looks more complex than it is, and it could get simpler soon. (Next post: write only the code you need, and refactor only when you need to refactor.) My tests use it like this:

    sub test_simple_index
    {
        my $index = make_index_nodes( 'X<some entry>' );

        like $index->emit_index, qr!<h2>S</h2>!,
            'index should contain top-level key for all entries';
    }

... such that make_index_nodes() takes a list of tags, constructs a valid document, extracts an index, and lets the test functions do what they will with it. All any test function has to know is how to pass index tags to make_index_nodes() and what to expect from the returned index object. (My chances of getting that wrong in subsequent tests are low.)

If you take any lessons from this, consider three things. First, if your tests are difficult to write, you might not yet fully understand what they need to do. What are you really testing, and why? Why are you testing at the level you're testing? What do you expect to explore, and how?

Second, use your API the real way as much as possible. Don't poke around in private elements or throw mock objects at the problem unless that's really the easiest way to test something tricky. Input goes in. Output comes out. Treat your code as a black box as much as possible.

Finally, reduce duplication in your test code as much as is practical. Simplicity is better than clever abstraction, but a function here and there or data-driven code can reduce the likelihood of bugs in a dramatic fashion. Every time I figure out a way to simplify my tests, they get easier to maintain and extend and my code quality improves.
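To make the data-driven style concrete, here's a minimal sketch. It uses a hypothetical stand-in, emit_fake_index(), which exists only so the example runs on its own; the real tests would call make_index_nodes() and emit_index() instead. Each test case is a row of data, so a loop replaces a pile of near-identical test functions:

```perl
use strict;
use warnings;
use Test::More;

# hypothetical stand-in for the real index emitter, for illustration only:
# group entries under the uppercased first letter, as a book index does
sub emit_fake_index
{
    my %by_letter;
    push @{ $by_letter{ uc substr $_, 0, 1 } }, $_ for @_;
    return join '', map { "<h2>$_</h2>" } sort keys %by_letter;
}

# the data-driven part: each row holds input entries, an expected
# pattern, and a description of the expectation
my @cases =
(
    [ [ 'some entry' ],         qr!<h2>S</h2>!, 'top-level key for S entries' ],
    [ [ 'alligator', 'zebra' ], qr!<h2>A</h2>!, 'key for first entry letter'  ],
    [ [ 'alligator', 'zebra' ], qr!<h2>Z</h2>!, 'key for last entry letter'   ],
);

for my $case (@cases)
{
    my ($entries, $pattern, $description) = @$case;
    like emit_fake_index( @$entries ), $pattern, $description;
}

done_testing();
```

Adding a new expectation means adding a row of data rather than writing (and possibly mistyping) another test function.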

That's the real goal, after all: making great code to solve real problems.