Expect tests are a technique I’ve written about before, but until recently, it’s been a little on the theoretical side. That’s because it’s been hard to take these ideas out for a spin due to lack of tooling outside of Jane Street’s walls.

That’s changed now, since Dune has gotten good support for expect tests. Given that, I thought this would be a nice time to demonstrate how expect tests can be useful in ways you might not expect; in particular, as a way of doing exploratory programming.

Preliminaries

The basic idea of an expect test is simple: expect tests let you generate output that is then captured and included in the source file. To try this out, let’s first create a jbuild file for our little experiment.

(jbuild_version 1)
(library
 ((name foo)
  (libraries (base stdio))
  (inline_tests)
  (preprocess (pps (ppx_jane)))))

Note that you’ll have to opam install base, stdio, and ppx_jane for any of this to work. The inclusion of the (inline_tests) declaration is important here, as is the preprocess line.

Now, we can write a simple .ml file that uses the expect test framework to generate some output.

open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!"

We can then run this test and automatically capture the results by running dune (which, confusingly, is still called jbuilder at the command line).

jbuilder runtest --auto-promote

You’ll now see the file change to the following as soon as the build is complete. (This is all more fun if your editor is set to auto-refresh.)

open! Base
open! Stdio

let%expect_test "simple" =
  print_endline "Hello Expect World!";
  [%expect {| Hello Expect World! |}]
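As an aside, --auto-promote bundles two steps into one. If you run the tests without it, a mismatch fails the build with a diff and leaves a .corrected file next to the source, which you can then accept separately. A sketch of that manual workflow (using the jbuilder-era command names; the later dune binary spells these the same way):

```shell
# Run the inline tests; on a mismatch, the runner prints a diff and
# writes a *.corrected file next to the source file.
jbuilder runtest

# Accept the captured output, copying *.corrected over the source.
jbuilder promote
```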

Smashing some HTML

Now let’s get to the exploratory programming part.

We’ll demonstrate a classic exploratory programming task: munging an HTML file to get some useful data. In particular, let’s say we want to find internal links on the opensource.janestreet.com site. We’re going to use lambdasoup, which is a great library for transforming HTML files.

After installing lambdasoup via opam, we need to update our jbuild file accordingly. We should also install and include support for a library called expect_test_helpers_kernel, which provides some useful tools for building expect tests.

(jbuild_version 1)
(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests)
  (preprocess (pps (ppx_jane)))))

Now, we can write a little function for extracting links from an HTML file, using lambdasoup.

open! Base
open! Stdio
open! Expect_test_helpers_kernel

let get_hrefs soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")

We can test this out by writing an expect test against a little example.

let%expect_test "soup" =
  let example =
    {|<html><body>
       <a href="http://janestreet.com">A link!</a>
      </body></html>|}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)]

Note that we use print_s from expect_test_helpers_kernel to format the s-expression, and the %sexp syntax extension to generate the s-expression to print. If we run jbuilder again, the output will be inserted into the file for us.

let%expect_test "soup" =
  let example =
    {|<html><body>
       <a href="http://janestreet.com">A link!</a>
      </body></html>|}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]

At this point, it might occur to us to wonder what would happen if we had an <a> element with no href. Well, we can just try that out.

let%expect_test "soup" =
  let example =
    {|<html><body>
       <a href="http://janestreet.com">A link!</a>
       <a>A broken link!</a>
      </body></html>|}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]

Rerunning the test demonstrates that our code throws an exception in this case.

let%expect_test "soup" =
  let example =
    {|<html><body>
       <a href="http://janestreet.com">A link!</a>
       <a>A broken link!</a>
      </body></html>|}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| DID NOT REACH THIS PROGRAM POINT |}];
  [%expect {|
    (* expect_test_collector: This test expectation appears to contain a backtrace.
       This is strongly discouraged as backtraces are fragile.
       Please change this test to not include a backtrace. *)
    ("A top-level expression in [let%expect] raised -- consider using [show_raise]"
     (Failure "Soup.R.attribute: None")
     (backtrace (
       "Raised at file \"pervasives.ml\", line 32, characters 17-33"
       "Called from file \"src/list.ml\", line 326, characters 13-17"
       "Called from file \"test.ml\", line 17, characters 14-44"
       "Called from file \"src/expect_test_helpers_kernel.ml\", line 475, characters 6-11"))) |}]

We can fix this easily enough by changing the selector we use to only look for <a> nodes with an href, as follows.

let get_hrefs soup =
  Soup.select "a[href]" soup
  |> Soup.to_list
  |> List.map ~f:(Soup.R.attribute "href")

And now, rerunning jbuilder will show that we get reasonable output once again.

let%expect_test "soup" =
  let example =
    {|<html><body>
       <a href="http://janestreet.com">A link!</a>
       <a>A broken link!</a>
      </body></html>|}
  in
  let hrefs = get_hrefs (Soup.parse example) in
  print_s [%sexp (hrefs : string list)];
  [%expect {| (http://janestreet.com) |}]
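Changing the selector isn’t the only possible fix, incidentally. lambdasoup also exposes Soup.attribute, which returns an option instead of raising like the Soup.R variant does, so an alternative sketch (my suggestion, not what the text above settles on) is to keep the plain "a" selector and drop href-less anchors with List.filter_map:

```ocaml
open! Base

(* Alternative fix (sketch): select every <a>, but use the
   option-returning Soup.attribute rather than the raising Soup.R
   variant, letting List.filter_map discard anchors with no href. *)
let get_hrefs_opt soup =
  Soup.select "a" soup
  |> Soup.to_list
  |> List.filter_map ~f:(Soup.attribute "href")
```

Which style is preferable is a judgment call: "a[href]" pushes the filtering into the selector, while filter_map keeps it visible in the OCaml code.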

Adding some real data

What if we want to apply this to some real data? Let’s grab the current contents of opensource.janestreet.com from the web and save it to a file called opensource.html. If we want our test to be able to read from this file, we need to add it as an explicit dependency, so we’ll adjust the jbuild file accordingly.

(jbuild_version 1)
(library
 ((name foo)
  (libraries (base stdio expect_test_helpers_kernel lambdasoup))
  (inline_tests ((deps (opensource.html))))
  (preprocess (pps (ppx_jane)))))

Now, we can add a new test to see what our function does on opensource.html.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)]

Again, if we run the test, the file will be updated to include the output.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let hrefs = get_hrefs soup in
  print_s [%sexp (hrefs : string list)];
  [%expect {|
    (https://www.janestreet.com/ad-cookie-policy
     https://opensource.janestreet.com/
     https://github.com/janestreet
     https://ocaml.janestreet.com/ocaml-core/latest/doc/index.html
     https://github.com/janestreet
     https://github.com/ocaml/dune
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://www.janestreet.com/technology/
     https://blog.janestreet.com/
     https://opensource.janestreet.com/contribute
     https://janestreet.com/) |}]

Now, we only wanted to extract the links that were actually on opensource.janestreet.com, but we also got a bunch of irrelevant ones. To fix this, we need to analyze the URIs, so we’ll install the uri package from opam, add it to our jbuild, and then change the code as follows.

let%expect_test "opensource" =
  let soup = In_channel.read_all "opensource.html" |> Soup.parse in
  let internal_links =
    get_hrefs soup
    |> List.filter ~f:(fun uri ->
      let uri = Uri.of_string uri in
      match Uri.host uri with
      | None -> false
      | Some host -> String.(=) host "opensource.janestreet.com")
  in
  print_s [%sexp (internal_links : string list)];
  [%expect {|
    (https://opensource.janestreet.com/
     https://opensource.janestreet.com/base
     https://opensource.janestreet.com/core
     https://opensource.janestreet.com/async
     https://opensource.janestreet.com/incremental
     https://opensource.janestreet.com/contribute) |}]

Which gives us what we were looking for.
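One detail worth flagging: the None -> false branch isn’t dead code, since Uri.host returns None for relative links like /base. If you want to see the shape of that check without pulling in the uri package, here is a rough stdlib-only approximation (my own sketch with hypothetical helper names, and deliberately naive: it ignores ports, userinfo, and the other corners that uri parses properly):

```ocaml
(* Naive sketch of the host check using only the OCaml stdlib.
   [host_of] pulls out the text between "://" and the next "/";
   a relative link (no scheme) yields None, mirroring Uri.host. *)
let host_of uri =
  match String.index_opt uri ':' with
  | Some i
    when i + 2 < String.length uri && uri.[i + 1] = '/' && uri.[i + 2] = '/' ->
    let rest = String.sub uri (i + 3) (String.length uri - i - 3) in
    (match String.index_opt rest '/' with
     | Some j -> Some (String.sub rest 0 j)
     | None -> Some rest)
  | _ -> None

let is_internal uri =
  match host_of uri with
  | None -> false
  | Some host -> String.equal host "opensource.janestreet.com"
```

In real code you’d use the uri package as above; the point of the sketch is just to make the relative-versus-absolute distinction concrete.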

What’s nice about this approach is that we’ve been able to do this all in a way that’s both lightweight and repeatable. We can take the code we’ve written, commit it to the repo we’re working on, and anyone else can try to extend our examples. What’s more, once the logic we want is finished, it might make sense to leave in these little experiments as regression tests, which will help make sure that we don’t break things as we start refactoring and reorganizing the code later.