ElasticSearch version 0.12 is out today along with some nice new features.

However, the thing I'm most excited about is that ElasticSearch.pm v 0.26 is also out and has support for bulk indexing and pluggable backends, both of which add a significant performance boost.

Pluggable backends

I've factored out the parts which actually talk to the ElasticSearch server into the ElasticSearch::Transport module, which acts as a base class for ElasticSearch::Transport::HTTP (which uses LWP), ::HTTPLite (which uses, not surprisingly, HTTP::Lite) and ::Thrift (which uses the Thrift protocol).
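To make the idea concrete, here is a minimal sketch of the base-class-plus-backends pattern. It is not the module's actual code: the My::Transport and My::Transport::Fake packages, the transport_for() helper and the request parameters are all made up for illustration, and the fake backend just returns a string where a real one would call LWP, HTTP::Lite or Thrift.

```perl
use strict;
use warnings;

# Hypothetical base class: holds the shared logic and leaves the
# wire protocol to its subclasses.
package My::Transport;

sub new {
    my ( $class, %args ) = @_;
    return bless { servers => $args{servers} || ['127.0.0.1:9200'] }, $class;
}

# Shared entry point: every backend is called the same way.
sub request {
    my ( $self, $params ) = @_;
    my $server = $self->{servers}[0];
    return $self->send_request( $server, $params );
}

sub send_request { die "send_request() must be implemented by a subclass\n" }

# Hypothetical backend: a real one would talk to the server here.
package My::Transport::Fake;
our @ISA = ('My::Transport');

sub send_request {
    my ( $self, $server, $params ) = @_;
    return "$params->{method} http://$server$params->{cmd}";
}

package main;

# Map a short name to a backend class, so callers choose a backend by name.
my %backends = ( fake => 'My::Transport::Fake' );

sub transport_for {
    my ( $name, %args ) = @_;
    my $class = $backends{$name} or die "Unknown transport '$name'\n";
    return $class->new(%args);
}

my $t = transport_for( 'fake', servers => ['localhost:9200'] );
print $t->request( { method => 'GET', cmd => '/_cluster/health' } ), "\n";
```

The point of the split is that calling code only ever sees request(), so swapping one backend for another is a one-word change.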

I expected Thrift to be the big winner, but it turns out that the generated code is dog-slow. However, HTTP::Lite is about 20% faster than LWP:

    httplite :  63 seconds, 951 tps
    http     :  79 seconds, 759 tps
    thrift   : 690 seconds,  87 tps

Bulk indexing

Since version 0.11, ElasticSearch has had a bulk operation, which can take a stream of index, create and delete statements in a single request.

For instance, you could do:

    $es->bulk(
        { index  => { index => 'foo', type => 'bar', id => 1, data => { foo => 'bar' } } },
        { create => { index => 'foo', type => 'bar', id => 2, data => { foo => 'bar' } } },
        { delete => { index => 'foo', type => 'bar', id => 1 } }
    );

The number of actions you can pass in depends on how much memory you have, both on the client and the server, and how big your documents are.

I tried tranches of 1,000, 5,000 and 10,000 documents at a time, and the results were very similar.
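Splitting a document list into tranches can be sketched as below. This is only an illustration, not the benchmark script: bulk_actions() and index_in_tranches() are names I've made up, the 'foo' and 'bar' index and type names and the sequential ids are placeholders, and the callback stands in for a real call such as sub { $es->bulk(@_) }, using the same calling convention as the bulk() example above.

```perl
use strict;
use warnings;

# Turn a list of documents into bulk 'index' actions, using the same
# hash layout as the bulk() example. Ids are assigned sequentially.
sub bulk_actions {
    my ( $index, $type, $first_id, @docs ) = @_;
    my $id = $first_id;
    return map {
        +{ index => { index => $index, type => $type, id => $id++, data => $_ } }
    } @docs;
}

# Split the documents into tranches of at most $size and pass each
# tranche of actions to $send. Returns the number of requests made.
sub index_in_tranches {
    my ( $docs, $size, $send ) = @_;
    my @queue    = @$docs;
    my $next_id  = 1;
    my $requests = 0;
    while ( my @tranche = splice( @queue, 0, $size ) ) {
        $send->( bulk_actions( 'foo', 'bar', $next_id, @tranche ) );
        $next_id += @tranche;
        $requests++;
    }
    return $requests;
}

# Stub sender that only records how many actions each request carried.
my @sizes;
my @docs     = map { +{ text => "document $_" } } 1 .. 2500;
my $requests = index_in_tranches( \@docs, 1000, sub { push @sizes, scalar @_ } );
print "$requests requests, sizes: @sizes\n";
```

Keeping the sender as a callback means the tranche size can be tuned, or the transport swapped, without touching the batching logic.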

All tranches and all transports averaged about 7.5 seconds, or roughly 8,000 transactions per second! These are small documents, so I would be surprised to achieve this rate in the real world, but an improvement of around 10x is phenomenal.

(These benchmarks were run on my laptop with a single ElasticSearch node, over 59,950 documents of the form { text => $string }, whose string value averaged 310 characters in length and consisted of real world text, not randomly generated gibberish.)

Example script

(This is now included in the examples directory of ElasticSearch.pm)

Finally, here is a simple example script which downloads all of the issues open against ElasticSearch from GitHub, indexes them, and provides a simple command line interface for searching them: