A while back I realised I had a ton of email archived on Gmail which I would be sad to lose if I lost access to my Google account or couldn't access the internet for some reason. I also wanted a backup in case I decided to migrate away from Gmail to use another service.

The approach I took was to use offlineimap to download the contents of my mail using Gmail's IMAP support. I set it up to download a few days of email at a time so I wouldn't encounter any bandwidth limiting from Google or risk getting my account temporarily suspended for aggressive use.

I chose to use 'Maildir' format for the downloaded mail so I could use notmuch locally to read and search.

The matter of dealing with Gmail folders is a bit tricky. These are exposed as IMAP folders and if you're not careful you can end up downloading emails multiple times for each folder. I didn't really want the folder structure. I just wanted all emails and I'd use the tagging mechanism of notmuch to add tags after the fact.

The secret to ignoring folders is to create a folderfilter entry in the .offlineimaprc file. This is a lambda function that, given a folder name, returns true if offlineimap should download that folder. I use:

folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']
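Since the entry is an ordinary Python expression, it can be tried out in a Python session before running a sync. The following is a minimal sketch; the folder names in sample_folders are made up for illustration and are not taken from my account.

```python
# The same predicate as the folderfilter entry, bound to a name so it can be called.
folderfilter = lambda foldername: foldername in ['[Gmail]/All Mail', '[Gmail]/Sent Mail']

# Hypothetical folder names, roughly what a Gmail account might expose over IMAP.
sample_folders = [
    'INBOX',
    '[Gmail]/All Mail',
    '[Gmail]/Sent Mail',
    '[Gmail]/Spam',
    'Receipts',
]

# Keep only the folders the filter accepts, which is what offlineimap would sync.
kept = [name for name in sample_folders if folderfilter(name)]
print(kept)
```

Only the two whitelisted folders survive the filter; everything else is skipped.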

This downloads "All Mail" and "Sent Mail". This way I get everything in my Gmail without the folder structure.

I chose to add a nametrans entry so that the downloaded folders in the Maildir have more relevant names. nametrans is a lambda function that, given a folder name, returns the name that should be used locally for that folder. Here I translate "All Mail" to "all" and "Sent Mail" to "sent":

nametrans = lambda foldername: re.sub(r'^\[Gmail\]/All Mail$', 'all', re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername))
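One thing worth checking with a pattern like this: in a regular expression, square brackets denote a character class, so a literal "[Gmail]" prefix only matches when the brackets are escaped. The sketch below is a standalone test of that behaviour; the function is written out with def purely for readability, and the 'INBOX' input is just an example of a name that should pass through unchanged.

```python
import re

def nametrans(foldername):
    # Brackets are escaped so that "[Gmail]" is matched literally.
    foldername = re.sub(r'^\[Gmail\]/Sent Mail$', 'sent', foldername)
    return re.sub(r'^\[Gmail\]/All Mail$', 'all', foldername)

# Unescaped, "[Gmail]" is a character class matching ONE of the letters G, m, a, i, l.
unescaped = re.compile('^[Gmail]/All Mail$')

print(nametrans('[Gmail]/All Mail'))    # translated
print(nametrans('[Gmail]/Sent Mail'))   # translated
print(nametrans('INBOX'))               # left as-is
print(unescaped.match('[Gmail]/All Mail'))  # no match for the real folder name
```

With the unescaped pattern the real folder name never matches, so the folder would keep its original name locally.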

To connect to Gmail the following entries are used in the remote repository section:

type = Gmail
remotehost = imap.gmail.com
realdelete = no
maxconnections = 1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
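The cert_fingerprint value is 40 hex digits, which is consistent with a SHA-1 digest of the server certificate. Assuming that is what offlineimap compares against, a value like it could be computed with a sketch along these lines. The function names are my own, and fetch_fingerprint needs network access, so it is defined here but not called.

```python
import hashlib
import ssl

def sha1_fingerprint(pem_cert):
    """Return the SHA-1 hex digest of a PEM certificate's DER encoding."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha1(der).hexdigest()

def fetch_fingerprint(host='imap.gmail.com', port=993):
    """Fetch the server certificate over the network and fingerprint it.

    Assumption: offlineimap expects the SHA-1 of the DER-encoded certificate.
    """
    pem = ssl.get_server_certificate((host, port))
    return sha1_fingerprint(pem)
```

Calling fetch_fingerprint() returns the fingerprint of whatever certificate the server presents at that moment, so it will not necessarily equal the value shown in my config.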

My local repository section is:

type = Maildir
localfolders = ~/.Mail

To avoid having to run offlineimap for a long time on the initial sync I did it over a series of days. I used the maxage setting in the Account section. When it is set, mail older than this number of days is not synced. So I'd set it to 100 days and do a sync. Then I'd increase it by 100 the next day and do another sync. Over a series of days and weeks I ended up with all my email. Once completely synced I removed the entry from the .offlineimaprc file. I'm not sure what the best value is, and maybe it doesn't matter, but this worked for me.
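I made the daily change by hand, but the step could be scripted. As a rough sketch, the hypothetical helper below takes the text of a config file and returns it with the maxage value raised by a fixed step. It only handles a plain "maxage = N" line and does not touch the file on disk.

```python
import re

def bump_maxage(config_text, step=100):
    """Return config_text with the value of its first maxage line increased by step."""
    def add_step(match):
        # group(1) is "maxage = " with its original spacing, group(2) the number.
        return match.group(1) + str(int(match.group(2)) + step)
    pattern = r'^([ \t]*maxage[ \t]*=[ \t]*)(\d+)[ \t]*$'
    return re.sub(pattern, add_step, config_text, count=1, flags=re.MULTILINE)

# A cut-down, made-up Account section to try it on.
sample = "[Account gmail]\nlocalrepository = gmailLocal\nmaxage = 100\n"
print(bump_maxage(sample))
```

Running it once per day before the sync would reproduce the 100, 200, 300, ... progression described above.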

My .offlineimaprc then looks like:

[general]
accounts = gmail
ui = TTY.TTYUI

[Account gmail]
localrepository = gmailLocal
remoterepository = gmailRemote
maxage = 1000

[Repository gmailLocal]
type = Maildir
localfolders = ~/.Mail

[Repository gmailRemote]
type = Gmail
remotehost = imap.gmail.com
realdelete = no
maxconnections = 1
ssl = yes
cert_fingerprint = 6d1b5b5ee0180ab493b71d3b94534b5ab937d042
remoteport = 993
remoteuser = ...
remotepass = ...
nametrans = ...shown above...
folderfilter = ...shown above...
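The account section refers to the two repository sections by name, and a typo there is an easy mistake to make. The sketch below uses Python's standard configparser to check that every referenced repository has a matching section. It is an independent check on a trimmed-down sample, not how offlineimap itself reads the file, and the function name is hypothetical.

```python
import configparser

# A trimmed-down sample with the same section layout as the file above.
SAMPLE = """\
[general]
accounts = gmail

[Account gmail]
localrepository = gmailLocal
remoterepository = gmailRemote

[Repository gmailLocal]
type = Maildir
localfolders = ~/.Mail

[Repository gmailRemote]
type = Gmail
"""

def missing_repositories(config_text):
    """List repository names that accounts refer to but no section defines."""
    # Only "=" is a delimiter, so colons inside lambda values are left alone.
    parser = configparser.ConfigParser(interpolation=None, delimiters=('=',))
    parser.read_string(config_text)
    missing = []
    for account in parser.get('general', 'accounts').split(','):
        section = 'Account ' + account.strip()
        for key in ('localrepository', 'remoterepository'):
            repo = parser.get(section, key)
            if not parser.has_section('Repository ' + repo):
                missing.append(repo)
    return missing

print(missing_repositories(SAMPLE))
```

An empty list means every repository named in an account section is defined.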

I used notmuch to process and search the Maildir locally. By setting synchronize_flags=true in the [maildir] section of my .notmuch-config file I could read the offline email in notmuch, incrementally sync with offlineimap, and have the 'read', 'replied', etc. flags synchronized between them.
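The synchronization works through the flag letters that Maildir stores at the end of each message filename. As an illustration of the mapping described in the notmuch-config documentation (D to draft, F to flagged, P to passed, R to replied, and a missing S meaning unread), here is a small sketch. The filenames are invented and the function is mine, not part of notmuch.

```python
# Maildir flag letter -> notmuch tag, per the notmuch-config documentation.
FLAG_TAGS = {'D': 'draft', 'F': 'flagged', 'P': 'passed', 'R': 'replied'}

def tags_for(filename):
    """Return the set of notmuch tags implied by a Maildir filename's flags."""
    # Flags follow the ":2," marker at the end of the name, e.g. "1234.host:2,RS".
    _, sep, flags = filename.rpartition(':2,')
    if not sep:
        flags = ''  # no info section at all, so no flags are set
    tags = {FLAG_TAGS[f] for f in flags if f in FLAG_TAGS}
    # The "S" (seen) flag is inverted: its absence means the message is unread.
    if 'S' not in flags:
        tags.add('unread')
    return tags

print(tags_for('1234.host:2,RS'))  # replied and seen
print(tags_for('1234.host:2,'))    # no flags, so unread
```

This is why marking a message as read in notmuch shows up on the Gmail side after the next offlineimap run: the tag change renames the file, and offlineimap pushes the new flags.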

To tag with notmuch I run a script after syncing with offlineimap that tags based on certain criteria. Something like: