Here, the functional dependency is implementing a function from entity types to types of selected computed fields (internal to esqueleto ) and the result type (public in our exposed API).

The HasStats class is a wrapper, providing a value of the SomeStats helper data type for the appropriate entity. SomeStats packages up the functions we need in order to compute statistics.

First, it packages a function which turns our entity type into a collection of selected fields. The SqlSelect constraint ensures that we can actually turn these queries into SELECT statements using esqueleto .

SomeStats also packages up a function rep -> result which can turn esqueleto 's representation of this data into an actual result.

These two functions give us everything we need in order to select, filter, and sort by our selected statistics in our API.

Simple Searches

Simple searches were by far our biggest use case, so we decided to focus on that. For now, we are not going to worry about full-text search since we didn’t use it very much or in any crucial ways (even though, arguably, this is the real reason to use ElasticSearch in the first place). Thankfully, Postgres provides functionality for full-text search (what can’t it do?), which we will probably look into if we find we need to restore this functionality.

Our simple search functionality varied from entity to entity, so we created a new type class to capture the mapping from entity types to search functions, which are implemented as esqueleto queries:

The HasSearch type class

This looks complicated, but the complication just arises from the need to support searches which involve joins between multiple tables. The SearchData type family captures all of the entity information we need to pull in order to perform a query against a single row. The createTextQuery applies a search term to that structure to form a predicate which we can pass to the database.

Replacing ElasticSearch Query Strings

In order to replace ElasticSearch’s simple query string parser, we needed to implement our own small search DSL and parser. With Haskell’s parsec library, this was very simple. We designed a small language of search terms:

A type of query expressions

and a parser for these terms using parsec . For the most part, this expression language is self-explanatory, but the SpecificMatch constructor requires some explanation.

We would like to support query strings like firstName:John which matches a specific field. In order to turn this into a typed query which can be passed to esqueleto , we need to actually check that the firstName field has the right type — a string, in this case. Note that this is a type-changing operation — we go from untyped queries to typed queries.

That operation can fail, so we implement it using an error tracking monad. But we need to perform that operation against every SpecificMatch appearing in a Query . By deriving Traversable , we can simply use traverse to perform this operation globally, for free!

Notes on Dynamic Types

When we were using RethinkDB and ElasticSearch, everything was untyped, and we could pass names of fields directly to the database without type checking. If the types didn’t match, we would either not find out, or find out by catching an exception from the database driver. This was obviously not ideal from a debugging point of view, but it certainly made implementation easier.

In order to keep the same search functionality, we had to find a way to use untyped field names with the typed fields exposed by persistent and esqueleto . This problem also shows up when performing other operations such as filtering and sorting rows.

To solve this problem, we implemented a new type class in order to map our entity types to lists of acceptable field names:

The SelectRows type class

Here, selectRows is simply an association list of field names to column data.

What is SomeColumn ?

The SomeColumn data type

In order to produce a value of type SomeColumn entity , we must have a type typ of SQL expressions in mind (and instances for the required constraints on typ ), along with a function of type entity -> E.SqlExpr (E.Value typ) which can pull the appropriate field out of our entity type. In this way, SomeColumn captures both the type of the column, and how it’s read from the database.

The KnownPersistField constraint here is key. It provides a way to check the inferred type of a field against the type of filter provided by the user at runtime:

The KnownPersistField type class

PersistFieldProxy is a value level representative for a column type. It reflects all of the possible types of database columns we care about, at the value level. The KnownPersistField class simply says that we know the column type, and can provide a PersistFieldProxy .

Unlike the type-level representation of column types, we can case on a PersistFieldProxy to implement different behavior for different column types.

Pattern matching a PersistFieldProxy can also bring type information into scope. For example, we can check if a column type is a string, in which case we can bring a SqlString instance into scope. This allows us to use string queries in esqueleto :

The asSqlString function

Combining the SomeColumn data type and the KnownPersistField class, we have enough information to turn runtime strings into typed expressions, and to verify the type of any associated data provided by the user.

This was quite a lot of work in order to recover functionality which we got “for free” from RethinkDB using dynamic types, but I think it is worth it. Now we are able to turn any untyped query from our API into a typed query and we are able to inspect type information at runtime and act on it in various ways. Since we are able to catch type errors before they hit the database, our errors are better and we even catch some errors that the database driver would not report.

JSON Support

RethinkDB allowed us to store arbitrary JSON documents in columns, and perform quite expressive queries against fields in those documents. We were initially concerned that we might lose that functionality in moving to Postgres, and that we might have to store more data in columns.

However, we were pleasantly surprised to discover that the JSON support in Postgres was sufficiently advanced in order to implement all of the functionality we were using in RethinkDB.

We can pull nested data out of JSON documents and filter and sort based on those values. We can even join with an array of data stored in JSON format, and then aggregate those values in various ways.

We don’t store many JSON documents in the database, but for some cases, it is definitely preferable to storing flattened data over many columns. To support these cases, we write custom SQL functions which are then imported into Haskell as typed functions on esqueleto expressions.

One downside is that Postgres does not support JSON schemas, so it is possible to get into invalid states over time. We rely on Haskell to type-check our data on the way in to the database, but we have to be careful that migrations don’t invalidate existing data. We could look into using something like postgres-json-schema to solve this problem eventually.

Conclusion

The migration to Postgres was a success and it improved the performance, correctness, and maintainability of our code. There are still plenty of interesting ways we could improve things, and new Postgres features to explore.

It’s been an interesting challenge to recover all of the functionality and flexibility that we had with RethinkDB and ElasticSearch, but I hope I’ve shown that Postgres handled the job well, and that Haskell turned out to be crucial for such a large refactoring.

If you’re interested in using Haskell to solve real-world problems like these, then get in touch — we’re hiring at Lumi!