At Clover Health, we are heavy users of PostgreSQL and Python 3 as well as strong believers in the value of testing what we build. Today, we’re happy to announce the open source release of two utility packages to improve testing in PostgreSQL/Python environments: pytest-pgsql and pgmock

(for the impatient reader, there are code samples a short scroll down)

Testing against database dependencies

Testing code against external dependencies is always a challenge, especially when those external dependencies are databases. There are a lot of common ways people handle external database dependencies and each has its own benefits and challenges.

Mocking

You can mock out your database calls at the execution layer with something like unittest.mock.

This can work pretty well in some cases and tends to be super fast. As a solution, it’s well suited to situations where the calls are not terribly complicated and you are primarily trying to test code around the database interactions.

As the complexity of the interactions increases, you need to put more and more work into making sure that your mocks are correctly mimicking the behavior of the database that you are mocking.

Additionally, if you get your SQL expressions wrong, mocked calls are not going to help you find the problems.

Using a lightweight alternative

Another option is to substitute a lighter, similar database for your target. This is usually aided by an intermediary ORM or SQL toolkit (like SQLAlchemy). This way you can use an in-memory sqlite database for testing and something else in production.

This has the advantages of actually testing your SQL expressions and letting you run your tests with relatively little additional setup. It’s also pretty quick in terms of spin-up and tear-down.

The problem is that SQL dialects can be pretty different and for everything but the smallest of cases your tests will not cover your target. For example, sqlite, MySQL, and PostgreSQL all subtly (or not so subtly) disagree on the meaning of NOW .

Eventually, the compatibility mismatches often end up being more work to maintain than justified.

Using a test database

There is also the option of standing up a full database and running your tests directly against your target.

This has the advantages of fully testing your SQL expressions using your target database engine.

The major disadvantages are that this usually involves a fairly large degree of test infrastructure and environment setup. It can also require fairly large spin-up and tear-down times for individual tests.

Introducing pytest-pgsql

We prefer the advantages provided by spinning up a full test database and, as engineers, feel there are technical options for mitigating many of the disadvantages.

As a starting point, we use the excellent testing.postgresql to handle most of the environment setup work. We prefer this over requiring engineers to maintain separate test databases and avoids many of the risks of failing to properly clean up at the end of a test run.

We use pytest as our test runner and have coalesced around pytest-style tests as our standard. This let us build a number of fixtures and helper methods around testing.postgresql for common use.

pytest-pgsql bundles these fixtures and helpers for common use.

Enough talk, let’s have some code

Here we are using the transacted_postgresql_db fixture, which provides a database object in the midst of an open transaction; when your test completes the transaction is rolled back in advance of the next test running. We do this because starting and rolling back transactions in PostgreSQL is a very fast operation resulting in minimal per-test overhead. This will only work with tests that do not depend on COMMIT ; for those cases, we also have the postgresql_db fixture which will spin up and tear down an entire database for a given test.

Here we are able to insert data and see the id column values generated by the server-side primary key sequence.

Freezing time

There are a number of useful helper functions in pytest-pgsql and I encourage you to read the docs but one of my personal favorites is pytest_pgsql.freeze_time.

Inspired by freezegun, we patch date lookups in SQL expressions en route to the database with predefined constant values.

Since we are patching at the query level, this will work for SQLAlchemy default values as well (unfortunately not for server_default= values):

Introducing pgmock

One of the challenges of unit testing is defining what a “unit” is.

For languages that support breaking things down into component functions, we can use a single function as the basis for a “unit” or code. In imperative languages we can use metrics like cyclomatic complexity (ex. McCabe complexity in flake8) or other heuristics to control the scale of functions, allowing teams to converge around what a unit of code to test is.

There are quite a few details around inputs, outputs, dependant systems (see above) and those could (and do) form the basis for many a blog.

However, there are languages that don’t make it quite so easy to break things into component parts, for example SQL.

SQL isn’t particularly well suited to composing so one doesn’t tend to write modular SQL. Even if one does, there is a lower bound to the complexity and one often ends up with subqueries nested a few layers deep.

pgmock allows selective mocking of SQL queries at various different scopes. This allows developers to isolate individual parts of a query for testing.

Enough talk, more code

Here we want to test the string concatenation and the result of the call to CHR and we'd rather not deal with the details of table_one and table_two . We mock out the input tables--which could have any number of other, extraneous columns--and we perform our query with inputs and outputs designed to test a single component of the problem.

If it’s not tested, it’s probably broken

When some of our teams and developers found testing sql transforms prohibitively hard, we could have accepted that and let those transforms exist untested but we believe in testing and building things is what what we do. Now, we hope some of the building blocks for our Python/PostgreSQL testing framework, can be of use to others.

PRs, questions, and feedback are always welcome.

We’re hiring

If you’re passionate about using tech to fix healthcare, or just get excited by PostgreSQL, Python, and complicated data domains, check out our careers page: Clover Health | Careers