A year with Notmuch mail

For a little longer than a year now, I have been using Notmuch as my primary means of reading email. Though the experience has not been without some annoyances, I feel that it has been a net improvement and expect to keep using Notmuch for quite some time. Undoubtedly it is not a tool suitable for everyone though; managing email is a task where individual needs and preferences play an important role and different tools will each fill a different niche. So before I discuss what I have learned about Notmuch, it will be necessary to describe my own peculiar preferences and practices.

Notmuch context

I can identify three driving forces in my attitude to email. First there is a desire to be in control. I want my email to be stored primarily on my hardware, preferably in my home. For this reason, various popular hosted email services are of little interest to me. Second is the difficulty I have with throwing things away; I'm sure I'll never want to look at 99.999% of my email a second time, but I don't know which 0.001% will interest me again, and I don't want to risk deleting something I may someday want. Finally, I am somewhat obsessive about categorization. "A place for everything, and everything in its place" is a goal, but rarely a reality, for me. Email is one area where that goal seems achievable, so I have a broad collection of categories for filing incoming mail, possibly more than I need.

My most recent experience before committing to Notmuch was to use Claws Mail as an IMAP client that accessed email using the Dovecot IMAP server running on a machine in my home. This was sufficient for many years, mostly because I work at home with good network connectivity between client and server. On those rare occasions when I traveled to the other side of the world, latency worsened and the upstream bandwidth over my home ADSL connection was not sufficient to provide acceptable service; it was bearable, but that is all. Using procmail to filter my email into different folders met most of my obsessive need to categorize, but when an email sensibly belonged in multiple categories, there wasn't really any good solution.

I think the frustration that finally pushed me to commit to the rather large step of transitioning to Notmuch was the difficulty of searching. Claws doesn't have a very good search interface and Dovecot (at least in the default configuration) doesn't have very good performance. I knew Notmuch could do better, so I made the break.

A close second in the frustration stakes was that I had to use a different editor for composing email than I used for writing code. Prior to choosing Claws, I used the Emacs View Mail mode; it had been difficult giving up that seamless integration between code editor and email editor. Notmuch offered a chance to recover that uniformity.

Notmuch of a mail system

Notmuch has been introduced and reviewed (twice) in these pages previously so only a brief recap will be provided here. Notmuch describes itself as "not much of an email program"; it doesn't aim to provide a friendly user interface, just a back-end engine that indexes and retrieves email messages. Most of the user-interface tasks are left to a separate tool such as an Emacs mode that I use. In this vein of self-deprecation, the web site states that even "for what it does do, that work is provided by an external library, Xapian". This is a little unfair as Notmuch does contain other functionality. It decodes MIME messages in order to index the decoded text with the help of libgmime. It manages a configuration file with the help of g_key_file from GLib. And it will decrypt encrypted messages, using GnuPG. It even has some built-in functionality for managing tags and tracking message threads.

The genius of Notmuch is really the way it combines all these various libraries together into a useful whole that can then be used to build a user interface. That interface can run the Notmuch tool separately, or can link with the libnotmuch library to perform searches and access email messages.

Notmuch need for initial tagging

Notmuch provides powerful functionality but, quite appropriately, does not impose any particular policy for how this functionality should be used. It quickly became clear to me that there is a tension between using tags and using saved searches as the primary means of categorizing incoming email. Tags are simple words, such as "unread", "inbox", "spam", or "list-lkml", that can be associated with individual messages. Saved searches were not natively supported by Notmuch before version 0.23, which was released in early October (and which calls them "named queries"), but are easily supported by user-interface implementations.

Using tags as a primary categorization is the idea behind the "Approaches to initial tagging" section of the Notmuch documentation. This page provides some examples of how a "hook" can be run when new mail arrives to test each message against a number of rules and then to possibly add a selection of tags to that message. The user interface can then be asked to display all messages with a particular tag.

I chose not to pursue this approach, primarily because I want to be able to change the rules and have the new rule apply equally to old emails, which doesn't work when rules are applied at the moment of mail delivery. The alternative is to use fairly complex saved searches. This ran into a problem when I wanted one saved search to make reference to another, as neither the Emacs interface nor the Notmuch backend had a syntax for including one saved search in another search. For example, I have one saved search to identify email from businesses (that I am happy to receive email from) whose mail otherwise looks like spam. So my "spam" saved search is something like:

tag:spam and not saved:commercial

The new "named queries" support should make this easy to handle but, until I upgrade my Notmuch installation, I have a wrapper script around the "notmuch" tool that performs simple substitutions to interpolate saved searches as required.
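A minimal sketch of the kind of substitution such a wrapper might perform is shown here. It is not the actual wrapper script; the saved-search names and their definitions are hypothetical, and it assumes that a "saved:NAME" term can simply be replaced by the parenthesised query it stands for before the query is handed to the real "notmuch" tool.

```python
import re

# Hypothetical saved searches; names and queries are illustrative only.
SAVED = {
    "commercial": "from:shop.example.com or from:bank.example.com",
    "spam": "tag:spam and not saved:commercial",
}

# Matches a "saved:NAME" term inside a query string.
SAVED_REF = re.compile(r"\bsaved:([A-Za-z0-9_-]+)")


def expand(query, saved=SAVED, _active=()):
    """Replace each saved:NAME term with its parenthesised definition."""

    def substitute(match):
        name = match.group(1)
        if name in _active:
            raise ValueError(f"saved search {name!r} refers to itself")
        if name not in saved:
            raise KeyError(f"unknown saved search {name!r}")
        # Expand recursively so saved searches may refer to one another.
        return "(" + expand(saved[name], saved, _active + (name,)) + ")"

    return SAVED_REF.sub(substitute, query)
```

The parentheses keep operator precedence intact when a definition containing "or" is dropped into the middle of a larger query, and the check against the chain of active names stops two saved searches that refer to each other from recursing forever.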

It also causes a minor problem in that I have several saved searches that are intermediaries that I'm not directly interested in, but which still appear in my list of saved searches. Those tend to clutter up the main screen in the Emacs interface.

Unfortunately, the indexing that Notmuch performs is not quite complete, so some information is not directly accessible to saved searches, resulting in the need for some limited handling at mail delivery time. Notmuch does not index all headers; two missed headers that are of interest to me are "X-Bogosity" and "References".

I use bogofilter to detect spam; it adds the "X-Bogosity" header to messages to indicate their status. Further, when someone replies to an email that I sent out, I like that reply to be treated differently from regular email, and particularly to get a free pass through my spam filter. I can detect replies by simple pattern matching on the References or In-Reply-To headers. While Notmuch does include these values in the index so that threads can be tracked, it does not index them in a way that allows pattern matching, so there is no way for Notmuch to directly find replies to my emails.

To address this need, I have a small procmail filter that runs bogofilter and then files email in one of the folders "spam", "reply", or "recv" depending on which headers are found. Notmuch supports "folder:" queries for searches, so that my saved search can now differentiate based on these headers that Notmuch cannot directly see.
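The decision that filter makes can be sketched in a few lines of Python. This is only an illustration of the logic, not the procmail recipe itself: it assumes bogofilter has already added its header, and the Message-ID suffix used to recognize my own outgoing mail is a hypothetical placeholder.

```python
from email import message_from_string

# Hypothetical: the tail of the Message-ID values on mail that I send.
MY_MSGID_SUFFIX = "@home.example.org>"


def choose_folder(raw_message):
    """Pick the delivery folder ("reply", "spam" or "recv") from headers."""
    msg = message_from_string(raw_message)

    # A reply to my own mail cites one of my Message-IDs in these headers.
    cited = msg.get("References", "") + " " + msg.get("In-Reply-To", "")
    if MY_MSGID_SUFFIX in cited:
        return "reply"  # replies bypass the spam verdict entirely

    # bogofilter writes a verdict such as "Spam, tests=bogofilter, ...".
    verdict = msg.get("X-Bogosity", "").strip().lower()
    if verdict.startswith("spam"):
        return "spam"

    return "recv"
```

Testing for a reply before looking at the spam verdict is what gives replies their free pass through the filter.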

I find that tags are still useful, but that use is largely orthogonal to classification based on content. When new mail arrives, it is automatically tagged as both "unread" and "inbox". When I read a message, the "unread" tag is cleared; when I archive it, the "inbox" tag is cleared. I would like an extra tag, "new", which would be cleared as soon as I see the subject in a list of new email, but the Emacs interface I use doesn't yet support that.

There are other uses for tags, such as marking emails that I should review when submitting my tax return or that need to be reported to bogofilter because it guessed wrongly about their spam status, but they all reflect decisions that I consciously make rather than decisions that are made automatically.

Notmuch remote access

Remote access via IMAP can be slow, but that is still faster than not having remote access at all, which is the default situation when the mail store only provides local access. I have two mechanisms for remote access that work well enough for me.

When I am in my home city, I only need occasional remote access; this is easily achieved by logging in remotely with SSH and running "emacsclient -t" in a terminal window. This connects to my running Emacs instance and gives me a new window through which I can access Notmuch almost as easily as on my desktop. A few things don't work transparently, viewing PDF files and other non-text attachments in particular, but as this is only an occasional need, lack of access to non-text content is not a real barrier. Here we see again the genius of Notmuch in making use of existing technology rather than inventing everything itself. Notmuch isn't contributing at all to this remote access but, since it supports Emacs as a user-interface, all the power of Emacs is immediately available.

For times when I am away from home and need more regular and complete remote access, there is muchsync, a tool that synchronizes two Notmuch mail stores. All email messages are stored one per file, so synchronizing those simply requires determining which files have been added or removed since the last synchronization and copying or deleting them. Tags are stored in the Xapian database, so a little more effort is required there but, again, muchsync just looks to see what has changed since the last sync and copies the relevant tags. I don't know yet if muchsync will synchronize the named queries and other configuration that can be stored in the database in the latest Notmuch release. Confirming that is a major prerequisite to upgrading.

Before discovering muchsync, I had used rsync to synchronize mail stores; I was happy to find that muchsync was significantly faster. While rsync is efficient when there are small changes to large files, it is not so efficient when there are small changes to a large list of files. The first step in an rsync transaction is to exchange a complete list of file names, which can be slow when there are tens of thousands of them. Muchsync doesn't waste time on this step as it remembers what is known to be on the replica, so it can deduce changes locally.
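The benefit of remembering the replica's state can be illustrated with a toy example. This is not how muchsync is implemented; it only assumes that each side keeps a record of the file names the other side held after the previous synchronization, so that the work to be done falls out of a purely local comparison.

```python
def plan_sync(local_files, replica_snapshot):
    """Work out what to send, using only locally available information.

    local_files:      names of the message files present here now
    replica_snapshot: names the replica was known to hold after the
                      previous synchronization (remembered locally)
    """
    local = set(local_files)
    known = set(replica_snapshot)
    to_copy = sorted(local - known)    # new here, so copy to the replica
    to_delete = sorted(known - local)  # removed here, so delete there
    return to_copy, to_delete
```

Nothing crosses the network until the plan is known, whereas an rsync-style exchange has to transfer the complete list of names before it can decide anything.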

With muchsync, reading email on my notebook is much like reading email on my desktop. Unfortunately, I cannot yet read email on my phone, though I don't personally find that to be a big cost. There is a web interface for Notmuch written in Haskell, but I have not put enough effort into that to get it working so I don't know if it would be a usable interface for me.

When Notmuch mail is too much

As noted above, I don't like deleting email because I'm never quite sure what I want to keep. Notmuch allows me to simply clear the "inbox" tag; thereafter I'll never see the message again unless I explicitly search for older messages, as my saved searches all include that tag. As a result, I haven't deleted email since I started using Notmuch and have over 600,000 messages at present (528,000 in the last year, over half of that total from the linux-kernel mailing list). The mail store and associated index consume nearly ten gigabytes. I'm hoping that Moore's law will save me from ever having to delete any of this. This large store allows me to see whether a very large amount of email is too much or if, as the program claims, "that's not much mail".

As far as I can tell, the total number of messages has no effect on operations that don't try to access all of those messages, so extracting a message by message ID, listing messages with a particular tag, or adding or clearing a tag, for example, are just as fast in a mail store with 100,000 messages as in one with 100 messages. The times when a lot of mail can seem to be too much are when a search matches thousands of messages or more. There are two particular times when I find this noticeable.

As you might imagine, given my need for categorization, I have quite a few saved searches. The Emacs front end for Notmuch has a "hello" page that helpfully lists all the saved searches together with the number of matching messages. Some of these searches are quite complex and, while the complexity doesn't seem to be a particular problem, the number of matches does. Counting the 217,952 linux-kernel messages still marked as in my inbox takes four to eight seconds, depending on the hardware. It only takes a few saved searches that take more than a couple of seconds for there to be an irritating lag when Emacs wants to update the "hello" page. Similarly, generating the list of matches for a large search can take a couple of seconds just to start producing the list, and much longer to create the whole list.

None of these delays need to be a problem. Having precise up-to-the-moment counts for each search is not really necessary, so updating those counts asynchronously would be perfectly satisfactory and rarely noticeable. Unfortunately, the Notmuch Emacs mode updates them all synchronously and (in the default configuration) does so every time the "hello" window is displayed. This delay can become tiresome.
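To show what asynchronous counting could look like outside of Emacs, here is a rough Python sketch. It runs "notmuch count" for each saved search in a worker thread and delivers each result through a queue as it becomes ready, so the caller is never blocked by the slowest search. The function names and the worker count are my own choices for illustration, not part of Notmuch.

```python
import queue
import subprocess
from concurrent.futures import ThreadPoolExecutor


def notmuch_count(query):
    """Return the number of messages matching query, via `notmuch count`."""
    out = subprocess.run(
        ["notmuch", "count", query],
        check=True, capture_output=True, text=True,
    )
    return int(out.stdout.strip())


def start_counts(searches, counter=notmuch_count, workers=4):
    """Start counting every saved search in the background.

    searches maps a display name to a query string.  The call returns at
    once with a queue that receives one (name, count) pair per search,
    in whatever order the counts finish.
    """
    results = queue.Queue()
    pool = ThreadPoolExecutor(max_workers=workers)
    for name, query in searches.items():
        # Bind name and query now; the lambda runs later in a worker.
        pool.submit(lambda n=name, q=query: results.put((n, counter(q))))
    pool.shutdown(wait=False)  # submitted work still runs to completion
    return results
```

A front end could draw the page immediately with the previous counts and replace each number as its pair arrives on the queue.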

When displaying the summary lines for a saved search, the Emacs interface is not synchronous, so there is no need to wait for the full list to be generated, but one still needs to wait the second or two for the first few entries in a large list to be displayed. If the condition "date:-1month.." is added to a search, only messages that arrived in the last month will be displayed, but they will normally be displayed without any noticeable delay as there are far fewer of them. The user interface could then collect earlier months asynchronously so they can be displayed quickly if the user scrolls down. The Emacs interface doesn't yet support this approach.

Notmuch locking

As a general rule, those Notmuch operations that have the potential to be slow can usually be run asynchronously, thus removing much of the cost of the slowness. Putting this principle into practice causes one to quickly run up against the somewhat interesting approach to locking that Xapian uses for the indexing database.

When Xapian tries to open the database for write access and finds that it is already being written to, its response is to return an error. As I run "notmuch new" periodically in the background to incorporate new mail, attempts to, for example, clear the "inbox" tag sometimes fail because the database cannot be updated, and I have to wait a moment and try again. I'd much rather Notmuch did the waiting for me transparently.

If one process has the database open for read access and another process wants write access, the writer gets the access it wants and the reader will get an error the next time that it tries to retrieve data. This may be an appropriate approach for the original use case for Xapian but seems poorly suited for email access. It was sufficient to drive me to extend my wrapper script to take a lock on a file before calling the real Notmuch program, so that it would never be confronted with unsupported concurrency.
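The locking part of such a wrapper is simple enough to sketch. The following is an illustration in Python rather than the script I actually use; the lock-file location is an arbitrary assumption, and it takes an exclusive lock for every invocation, readers included, which serializes all access.

```python
import fcntl
import os
import subprocess
import sys

# Assumed location for the lock file; any path writable by the user works.
LOCK_PATH = os.path.expanduser("~/.notmuch-wrapper.lock")


def run_locked(argv, lock_path=LOCK_PATH):
    """Run argv while holding an exclusive lock on lock_path.

    flock() blocks until the lock is free, so a second caller simply
    waits instead of seeing an error from the database.  The lock is
    dropped when the file is closed at the end of the "with" block.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        return subprocess.call(argv)
```

A wrapper installed ahead of the real program in the search path would call something like run_locked(["/usr/bin/notmuch"] + sys.argv[1:]) and exit with the returned status; the path to the real binary is, again, an assumption.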

The most recent version of Xapian, the 1.4 series released in recent months, adds support for blocking locks, and Notmuch 0.23 makes use of these to provide a more acceptable experience when running Notmuch asynchronously.

Working with threads

One feature of Notmuch that I cannot quite make my mind up about is the behavior of threads. In a clear contrast to my finding with JMAP, the problem is not that the threads are too simplistic, but that they are rich and I'm not sure how best to tame them.

As I never delete email, every message in a thread remains in the mail store indefinitely. When Notmuch performs a search against the mail store it will normally list all the threads in which any message matches the search criteria. The information about the thread includes the parent/child relationship between messages, flags indicating which messages matched the search query, and what tags each individual message has.

The Emacs interface uses the parent/child information to display a tree structure using indenting. It uses the "matched" flag to de-emphasize the non-matching messages, either greying them out in the message summary list or collapsing them to a single line in the default thread display, which concatenates all messages in a thread into a single text buffer. It uses some of the tags to adjust the color or font, for example to highlight unread messages.

This all makes perfect sense and I cannot logically fault it, yet working with threads sometimes feels a little clumsy and I cannot say why. The most probable answer is that I haven't made the effort to learn all the navigation commands that are available; a rich structure will naturally require more subtle navigation and I'm too lazy to learn more than the basics until they prove insufficient. Maybe a focus on some self-education will go a long way here. Certainly I like the power provided by Notmuch threads; I just don't feel that I personally have tamed that power yet.

Notmuch of a wish list

Though I am sufficiently happy with Notmuch to continue using it, I always seem to want more. The need for sensible locking and for native saved searches should be addressed once I upgrade to the latest release, so I expect to be able to cross them off my wish list soon.

Asynchronous updating of the match counts for saved searches and of the messages in a summary is the wish at the top of my list, but my familiarity with Emacs Lisp is not sufficient to even begin to address that, so I expect to have to live without it for a while yet.

One feature that is close to the sweet spot for being both desirable and achievable is to support an outgoing mail queue. Usually when I send email it is delivered quite promptly, though not instantly, to the server for my email provider. Sometimes it takes longer, possibly due to a network outage, or possibly due to a configuration problem. I would like outgoing email to be immediately stored in the Notmuch database with a tag to say that it is queued. Then some Notmuch hook could periodically try to send any queued messages, and update the tag once the transmission was successful. This would mean that I never have to wait while mail is sent, but can easily see if there is anything in the outgoing queue, and can investigate at my leisure.
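As a rough idea of what the periodic step might look like, here is a Python sketch. Nothing like this exists in my setup yet; the tag names "queued" and "sent" are hypothetical, and the default helpers assume that the "notmuch" tool and a sendmail-compatible program are available on the search path.

```python
import subprocess

QUEUED_TAG = "queued"  # hypothetical tag marking mail awaiting delivery
SENT_TAG = "sent"      # hypothetical tag applied after a successful send


def list_queued():
    """Return one "id:..." search term for each queued message."""
    out = subprocess.run(
        ["notmuch", "search", "--output=messages", "tag:" + QUEUED_TAG],
        check=True, capture_output=True, text=True,
    )
    return out.stdout.splitlines()


def send(message_term):
    """Hand the raw message to a sendmail-compatible program (assumed)."""
    raw = subprocess.run(
        ["notmuch", "show", "--format=raw", message_term],
        check=True, capture_output=True,
    ).stdout
    subprocess.run(["sendmail", "-t"], input=raw, check=True)


def mark_sent(message_term):
    """Swap the queued tag for the sent tag on one message."""
    subprocess.run(
        ["notmuch", "tag", "-" + QUEUED_TAG, "+" + SENT_TAG, "--",
         message_term],
        check=True,
    )


def flush_queue(list_queued=list_queued, send=send, mark_sent=mark_sent):
    """Try to send every queued message; return the ones that went out."""
    delivered = []
    for term in list_queued():
        try:
            send(term)
        except (OSError, subprocess.CalledProcessError):
            continue  # leave the queued tag in place for the next attempt
        mark_sent(term)
        delivered.append(term)
    return delivered
```

Because a failed message keeps its tag, a search for the queued tag always shows exactly what is still waiting to go out.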

There are plenty of other little changes I would like to see in the user interface, but none really interesting enough to discuss here. The important aspect of Notmuch is that the underlying indexing model is sound and efficient and suits my needs. It is a good basis on which to experiment with different possibilities in the user interface.