This blog post is part of the FastMail 2014 Advent Calendar.

The next post on December 2nd is Security - Confidentiality, Integrity and Availability.

Technical level: medium

Our email search system was originally written by Greg Banks, who has moved on to a role at Linked In, so I maintain it now. It's a custom extension to the Cyrus IMAPd mail server.

Fast search was a core required feature for our new web interface. My work account has over half a million emails in it, and even though our interface allows fast scroll, it's still impossible to find anything more than a few weeks old without either knowing exactly when it was sent, or having a powerful search facility.

Greg tried a few different engines, and settled on the Xapian project as the best fit for the one-database-per-user that we wanted.

We tried indexing new emails as they arrived, even directly to fast SSDs, and discovered that the load was just too high. Our servers were overloaded trying to index in time - because adding a single email causes a lot of updates.

Luckily, Xapian supports searching from multiple databases at once, so we came up with the idea of a tiered database structure.

New messages get indexed to tmpfs in a small database. A job runs every hour to see if tmpfs is getting too full (over 50% of the defined size), in which case it compacts immediately, otherwise we automatically compact during the quiet time of the day. Compacted databases are more efficient, but read only.

This allows us to index all email immediately, and return a message that arrived just a second ago in your search results complete with highlighted search terms, yet not overload the servers. It also means that search data can be stored on inexpensive disks, keeping the costs of our accounts down.

Technical level: extreme

Here's some very technical information about how the tiers are implemented, and an example of running the compaction.

We have 4 tiers at FastMail, though we don't actually use the 'meta' one (SSD) at the moment:

temp

meta

data

archive

The temp level is on tmpfs, purely in memory. Meta is on SSD, but we don't use that except during shutdown. Data is the main version, and we re-compact all the data level indexes once per week. Finally archive is never automatically updated, but we build it when users are moved or renamed, or can create it manually.

Both external locking (Xapian isn't always happy with multiple writers on one database) and the compaction logic are managed via a separate file called xapianactive. The xapianactive looks like this:

% cat /mnt/ssd30/sloti30t01/store23/conf/user/b/brong.xapianactive temp:264 archive:2 data:37

The first item in the active file is always the writable index - all the others are read-only.

These map to paths on disk according to the config file:

% grep search /etc/cyrus/imapd-sloti30t01.conf search_engine: xapian search_index_headers: no search_batchsize: 8192 defaultsearchtier: temp tempsearchpartition-default: /var/run/cyrus/search-sloti30t01 metasearchpartition-default: /mnt/ssd30/sloti30t01/store23/search datasearchpartition-default: /mnt/i30search/sloti30t01/store23/search archivesearchpartition-default: /mnt/i30search/sloti30t01/store23/search-archive

(the 'default tier' is to tell the system where to create a new search item)

So based on these paths, we find.

% du -s /var/run/cyrus/search-sloti30t01/b/user/brong/* /mnt/i30search/sloti30t01/store23/search/b/user/brong/* /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/* 3328 /var/run/cyrus/search-sloti30t01/b/user/brong/xapian.264 1520432 /mnt/i30search/sloti30t01/store23/search/b/user/brong/xapian.37 3365336 /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.2

I haven't compacted to archive for a while. Let's watch one of those. I'm selecting all the tiers, and compressing to a single tier. The process is as follows:

take an exclusive lock on the xapianactive file insert a new default tier database on the front (in this example it

will be temp:265) and unlock xapianactive again start compacting all the selected databases to a single database on

the given tier take an exclusive lock on the xapianactive file again if the xapianactive file has changed, discard all our work (we lock

against this, but it's a sanity check) and exit replace all the source databases for the compact with a reference to

the destination database and unlock xapianactive again delete all now-unused databases

Note that the xapianactive file is only locked for two VERY SHORT times. All the rest of the time, the compact runs in parallel, and both searching on the read-only source databases and indexing to the new temp database can continue.

This allows us to only ever have a single thread compacting to disk, so our search drives are mostly idle, and able to serve

customer search requests very quickly.

When holding an exclusive xapianactive lock, it's always safe to delete any databases which aren't mentioned in the file - at worst you will race against another task which is also deleting the same databases, so this system is self-cleaning after any failures.

Here goes:

% time sudo -u cyrus /usr/cyrus/bin/squatter -C /etc/cyrus/imapd-sloti30t01.conf -v -z archive -t temp,meta,data,archive -u brong compressing temp:264,archive:2,data:37 to archive:3 for user.brong (active temp:264,archive:2,data:37) adding new initial search location temp:265 compacting databases Compressing messages for brong done /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.3.NEW renaming tempdir into place finished compact of user.brong (active temp:265,archive:3) real 4m52.285s user 2m29.348s sys 0m13.948s % du -s /var/run/cyrus/search-sloti30t01/b/user/brong/* /mnt/i30search/sloti30t01/store23/search/b/user/brong/* /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/* 368 /var/run/cyrus/search-sloti30t01/b/user/brong/xapian.265 du: cannot access `/mnt/i30search/sloti30t01/store23/search/b/user/brong/*': No such file or directory 4614368 /mnt/i30search/sloti30t01/store23/search-archive/b/user/brong/xapian.3