This post is about the Hypothesis testing library. If you’re unfamiliar with it, check it out! Property tests are a fantastic addition to your testing suite. All examples use pytest.

Imagine we’ve written a bubblesort:

def bubblesort (l: List[int]) -> List[int]: # blah blah blah

What are some of the properties we could test? We could check that the input has the same length as the output:

from hypothesis import given from hypothesis.strategies import lists, integers @given (lists(integers())) def test_same_length (l): assert len(bubblesort(l)) == len(l)

Or we could test that the sort is idempotent:

@given (lists(integers())) def test_idempotent (l): assert bubblesort(bubblesort(l)) == bubblesort(l)

But neither test really gets at what we want: confirming the list is actually sorted. Fortunately, though, there’s a simple hypothesis test for that:

@given (lists(integers())) def test_sorts (l): assert bubblesort(l) == sorted(l)

We just need to check that the output of the function matches the output of a sorting function! In this case, sorted is an oracle, something we know is correct and can compare our solution to. This may look like cheating (“if we already have a sorting function, why would we be writing another?”), but this can be a surprisingly powerful testing strategy. Here are some common use-cases:

Refactoring with Oracles

One use case for an oracle is when you refactor an existing function: you want to clean up or optimize something you’re confident is correct and want to make sure your changes don’t break anything. Let’s take a toy example:

def is_prime (x: int) -> bool: if x < 2 : return False for i in range( 2 , x): if x % i == 0 : return False return True

A common optimization we can do is only check up to the square root of the number:

from math import ceil, sqrt def new_is_prime (x: int) -> bool: # blah blah blah for i in range( 2 , ceil(sqrt(x))): # sup

This seems like it works for simple test cases, but if we compare it to our original function, we find an error:

@given (integers()) def test_matches_oracle (x): assert is_prime(x) == new_is_prime(x)

This error comes from how range works: list(range(2, 2)) == [] , so our function fails for 4. We found this by using our original version as an oracle against the refactored version.

Special-Case Oracles

Sometimes we know our function works in specific cases but does not generalize. For example, the following function ostensibly takes a list and returns the highest possible product of three elements:

# We'll assume that all inputs will have more than 3 elements # So no need for a bounds check def buggy_max_triplet_prod (l: List[int]) -> int: a, b, c = sorted(l, reverse = True)[ 0 : 3 ] return a * b * c

This is correct when all of the elements are positive. However, it fails for [-4, -3, 1, 2] , returning -6 instead of 12. We can still use this as an oracle, though. Since we know it’s correct for certain lists, our final function must match its output for those lists.

@given (lists(integers(min_value = 0 ), min_size = 3 )) # only positive numbers def test_matches_on_positive (l): assert max_triplet_prod(l) == buggy_max_triplet_prod(l)

Over our restricted domain, the broken function is correct, and can be used as an oracle.

Inefficient Oracles

Sometimes there’s a solution to a problem that we know for sure works but is too slow/memory-intensive/etc for production code. We can use that solution as an oracle to test a more sophisticated function. If you wanted to write a function to insert into a sorted array, you could use the following as a reference oracle:

def ordered_insert_oracle (new_element: int, ordered_list: List[int]) -> List[int]: return sorted(ordered_list + [new_element]) @given (lists(integers), integers()) def test_ordered_insert (l, x): assert ordered_insert(l, x) == ordered_insert_oracle(l, x)

The oracle will be O(n*log(n)), while a well written ordered insert will be O(n). That might matter in production, but it won’t matter too much in testing, so we can benefit from an oracle here.

Reverse Oracles

Sometimes we can go the other way: instead of writing an oracle that will always get the right answer, we can write a function that takes an answer and finds a question for it. This gets a bit complicated, so let’s walk through an example. We’re writing our own counter.

from random import shuffle @given (lists(tuples(integers(), integers(min_value = 1 , max_value = 10 )), unique_by = lambda x: x[ 0 ])) def test_counter (l): test_list = [] for val, times in l: test_list += [val] * times shuffle(test_list) assert count(test_list) == {val:times for val, times in l}

Here, we’re taking a possible answer and generating a list that matches that. For example, if our test generates l = [(5, 2), (1, 1)] , that means we want our function to return {5: 2, 1: 1} . So we find a list that, if count is written correctly, would match that output. Our implementation is to first create the list [5, 5, 1] and then shuffle it.

This test ends up being more complicated than the others because it’s easier for the test itself to have bugs. Note the unique_by ; that’s because our test will bug if we generate l = [(1, 1), (1, 1)] : we’ll assert {1: 2} == {1: 1} .

Caveats

Most of these caveats apply to property-based tests (PBTs) in general.