I was recently tasked with replacing an existing Postgres-based search with Elasticsearch to increase search speeds over millions of documents. The task required parity between both searches as well as additional future features such as supporting search-as-you-type. Having used Elasticsearch in Rails projects before, I have come to like the quick feedback loop given by writing full integration tests from ETL to search result. This is my approach.

Before I delve into the setup, there is something I wanted to note. By default I have ES running on my machine. I have seen people create test clusters on demand on different ports, but for me this was not necessary. I just create a test index and use that, then get rid of it after the specs have run. Your mileage may vary with this approach.

Step 1. Some setup

First you’ll need the official elasticsearch gem if you don’t have it already. Set it up to connect to your ES instance; by default this should work:
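A minimal sketch of what that setup might look like, assuming ES listens on the default `http://localhost:9200`. The `ES_TEST_URL` variable name and the `es_client_options` helper are hypothetical, introduced only for this example. The actual client creation is shown in a comment because it needs the gem to be installed.

```ruby
# Build the options hash passed to Elasticsearch::Client.new.
# ES_TEST_URL is a hypothetical variable name used for this sketch.
def es_client_options(env = ENV)
  {
    url: env.fetch('ES_TEST_URL', 'http://localhost:9200'),
    log: false # switch to true to print requests while debugging a spec
  }
end

# With the elasticsearch gem installed, the client would be created like this:
#   require 'elasticsearch'
#   client = Elasticsearch::Client.new(es_client_options)
#   client.cluster.health # quick check that the node is reachable
```

Keeping the URL in one place makes it easy to point the same specs at a different node on CI.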

Next you’ll want to have the ability to create new indexes, delete indexes and to flush indexes. For this I have a very basic and handy helper.
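One way such a helper could look is sketched below. The class name `TestIndexHelper` is an assumption made for this example, and the client is passed in so the helper works with any object that exposes the same `indices` API as `Elasticsearch::Client`. The last method is named `refresh` because it calls the refresh API, which is the call documented for making newly indexed documents searchable; it plays the role of the flush step in this post.

```ruby
# Thin wrapper around the indices API of an Elasticsearch::Client-like object.
# TestIndexHelper is a hypothetical name used for this sketch.
class TestIndexHelper
  def initialize(client, index_name)
    @client = client
    @index_name = index_name
  end

  # Create the index; body may carry settings and mappings.
  def create(body = {})
    @client.indices.create(index: @index_name, body: body)
  end

  # Delete the index only if it exists, so the call is safe to repeat.
  def delete
    return unless @client.indices.exists(index: @index_name)

    @client.indices.delete(index: @index_name)
  end

  # Make everything indexed so far visible to search.
  def refresh
    @client.indices.refresh(index: @index_name)
  end
end
```

In a spec this might be used as `helper = TestIndexHelper.new(client, 'cars_test')`, where `cars_test` is again just an example name.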

Step 2. Set up your index, mapping & load test data

First you’ll want to create a test index with your pre-defined mapping. You can read more about mapping in ES here. The mapping is important as it tells ES how to analyze and index your documents, which has a significant effect on your search speed, index size, etc. I like to write my mapping definitions as hashes in Ruby and pass them to the index on index creation. Here is an example of a very simple mapping:
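As an illustration, a simple mapping for a hypothetical `car` document might look like the sketch below. The field names are made up for the example, and the layout assumes ES 5.x, where mappings are nested under a document type; newer major versions drop that level.

```ruby
# A small mapping written as a plain Ruby hash.
# All field names and the :car type are hypothetical.
def car_mapping
  {
    car: { # document type (ES 5.x layout)
      properties: {
        make:      { type: 'keyword' },                   # exact-match filtering
        model:     { type: 'text', analyzer: 'english' }, # analyzed full-text field
        price:     { type: 'integer' },
        listed_at: { type: 'date' }
      }
    }
  }
end
```

A `keyword` field is indexed as a single term, which suits filters and aggregations, while a `text` field is run through an analyzer for full-text matching.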

You can now set up your index with your mapping in one go by running the following:
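A sketch of that single call, under a few assumptions: `create_index_request` and `create_test_index` are hypothetical helper names, the client is any object with the same `indices.create` method as `Elasticsearch::Client`, and the one-shard, zero-replica settings are simply a choice that tends to suit a single local node.

```ruby
# Build the arguments for indices.create: settings and mappings in one body.
def create_index_request(index_name, mappings)
  {
    index: index_name,
    body: {
      # A single local test node needs neither several shards nor replicas.
      settings: { number_of_shards: 1, number_of_replicas: 0 },
      mappings: mappings
    }
  }
end

# Send the request; returns whatever the client returns.
def create_test_index(client, index_name, mappings)
  client.indices.create(create_index_request(index_name, mappings))
end
```

Building the request in its own method keeps the body easy to inspect in a spec without talking to ES.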

Great! Now you have a test index ready to take some data. If, like me, you are creating your own ETL, you want to start by generating some data in your database for the ETL to read. If you aren’t testing an ETL you could simply load data directly into ES by posting to your index, either in batches or individually. I recommend reading more about this in the elasticsearch-api README. Whichever approach you use, index your documents in ES now.
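For the batch route, a minimal sketch using the bulk API might look like this. The helper names, the `car` type and the assumption that every document is a hash with an `:id` key are all hypothetical; the body uses the combined format accepted by elasticsearch-api, where each action carries its document under `data`.

```ruby
# Turn plain document hashes into bulk "index" actions.
# Each document is assumed to carry its own :id.
def bulk_actions(index_name, type, documents)
  documents.map do |doc|
    { index: { _index: index_name, _type: type, _id: doc.fetch(:id), data: doc } }
  end
end

# Send all documents in a single bulk request.
def load_documents(client, index_name, type, documents)
  client.bulk(body: bulk_actions(index_name, type, documents))
end
```

Indexing a single document would go through `client.index` instead, with the document passed as `body`.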

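There is one more extremely important step before you’re ready to test: flush the index! ES indexes data asynchronously, so without this step the documents you just loaded may not be there yet when you search, and you end up with flaky specs. (Strictly speaking, the operation that makes newly indexed documents searchable is a refresh, exposed through the indices refresh API.)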
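A small sketch of that ordering, using the refresh API since that is the call documented for making new documents visible to search. `index_and_refresh` is a hypothetical helper, and the document is again assumed to carry an `:id`.

```ruby
# Index one document, then refresh so the next search can see it.
def index_and_refresh(client, index_name, type, document)
  client.index(index: index_name, type: type, id: document.fetch(:id), body: document)
  # Without this call, a search issued right away may miss the document.
  client.indices.refresh(index: index_name)
end
```

When loading many documents, it is usually enough to refresh once after the last one rather than after each write.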

Step 3. Test your search

At this stage you’ll have data loaded into your index and you’ll want to start covering all sorts of search scenarios, while iterating on your search query. Presumably you will have some class which accepts your search params and generates an ES query out of them. Writing queries as Ruby hashes comes in very handy at this point as you can pass those along to your instance of Elasticsearch::Client and it will handle them nicely. Hashes are much nicer to work with than raw JSON, believe me.
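To make that concrete, here is a sketch of such a class. `CarSearchQuery`, its params (`:q`, `:make`, `:max_price`, `:per_page`) and the field names are all hypothetical; the query shape is a standard `bool` query with `must` and `filter` clauses.

```ruby
# Turns search params into an ES query body expressed as a Ruby hash.
class CarSearchQuery
  def initialize(params)
    @params = params
  end

  def to_h
    {
      query: { bool: { must: must_clauses, filter: filter_clauses } },
      size: @params.fetch(:per_page, 10)
    }
  end

  private

  # Full-text part: match on the model field, or everything when no text is given.
  def must_clauses
    text = @params[:q].to_s.strip
    return [{ match_all: {} }] if text.empty?

    [{ match: { model: text } }]
  end

  # Exact filters narrow the results without affecting the relevance score.
  def filter_clauses
    clauses = []
    clauses << { term: { make: @params[:make] } } if @params[:make]
    clauses << { range: { price: { lte: @params[:max_price] } } } if @params[:max_price]
    clauses
  end
end
```

The resulting hash can be passed straight to the client, for example `client.search(index: 'cars_test', body: CarSearchQuery.new({ q: 'a4' }).to_h)`, and a spec can also assert on the hash itself without hitting ES at all.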

This is where you should start seeing the benefit of setting up tests as you’ll be able to quickly iterate over changes to your query. You’ll have the ability to make adjustments to your mapping and any custom analyzers you might have. You’ll also have the chance to cover multiple edge cases. This is especially useful when your production index has millions of documents and testing changes against your real dataset just takes too long. Indexing millions of documents can take hours, ain’t nobody got time for that!

Step 4. Clean up after yourself

So you’ve tested your query, covered all of your edge cases and your happy paths. Now you just need to clean up after yourself by deleting the test index. That is simply done like so:
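The deletion itself is a single `client.indices.delete(index: ...)` call. The sketch below wraps it in a hypothetical `with_test_index` helper so that the index is removed even when the code in the block raises; the helper name and the existence check are assumptions made for this example.

```ruby
# Create a test index, run the block, and always clean up afterwards.
def with_test_index(client, index_name, body = {})
  client.indices.create(index: index_name, body: body)
  yield
ensure
  # Runs even if the block raises, so a failing spec leaves no stale index behind.
  client.indices.delete(index: index_name) if client.indices.exists(index: index_name)
end
```

Leaving an old test index around can make the next run fail at index creation, which is why the cleanup sits in an `ensure` clause.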

Running ES on CircleCI

We run our projects on CircleCI at carwow. If you do as well and you intend to run your integration tests there, you’ll need to add ES to your circle.yml. Circle supports installing different versions of ES, which you can do by replacing the version in the gist below with the one you’re using. We are using 5.3.0 at the time of writing, so we have added this to the circle.yml:

Final thoughts

For efficiency’s sake I tend to run the index creation only once before all of my specs within a context and delete my index after. This means I have to make sure that my test data covers all of my test needs. This isn’t always possible, so you may need to run the setup in a before :each block. Keep in mind that this adds overhead and your tests will be slower.

Here is an example of all of the above put together:

Questions? Comments? Give me a shout below and I’ll be happy to answer.