A few weeks ago we launched our new webmail service for all users at FastMail. Once it was being used by a wider audience, we of course received reports of a few edge cases our testing hadn’t managed to uncover. One of the more interesting issues came from this use case: a user liked to scroll down his inbox, opening each email he wanted to read in a new background tab. Then he would go through the tabs, closing each one as he was done with it. So far, so good. Except that in Chrome, his browser of choice, as soon as about 5 tabs were open, the rest failed to load, and the earlier ones then started having communication errors as well.

A quick bit of research and testing yielded the problem: Chrome limits itself to a maximum of 6 concurrent connections to a single origin across the whole browser. Each tab was loading a full instance of the mail application, which meant it was creating an EventSource object and connection to our push server, to be notified of new deliveries (see this previous post for how that works). Since these connections are permanent (that’s the whole idea!), opening lots of tabs quickly used up all the available connections, with none left to fetch any actual data. To the user, this appeared as "Could not connect to server" error messages.

The solution to this problem was not immediately obvious. Ideally, we would like to maintain a single push connection and share it between the tabs, but there’s no API for getting a reference to other tabs or windows in the browser, even if they’re pointed at the same domain. Then I remembered that setting a property on local storage triggers a "storage" event on the window object of every other open tab with the same origin. This, I realised, could be used to synchronise behaviour across tabs.

The concept is fairly simple. Only one tab keeps a push connection; we call this the master tab. When it receives a push event, it broadcasts it by setting the event as a property on local storage called "broadcast". When a tab receives the storage event for this key, it reads the JSON-encoded event object from local storage and processes it as though it had been received via an EventSource object.
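
A minimal sketch of this broadcast mechanism might look like the following. The helper names (broadcast, readBroadcast) and the wrapper fields (sentAt, nonce) are made up for illustration, and the sketch reads the value from the storage event's newValue property rather than going back to local storage. The storage object is passed in so the functions can be exercised outside a browser; in a page it would be window.localStorage.

```javascript
// "broadcast" is the key named above; everything else here is hypothetical.
const BROADCAST_KEY = 'broadcast';

// Master tab: publish a push event to the other tabs.
function broadcast(storage, event) {
  // Storage events only fire when the stored value changes, so wrap the
  // event with a timestamp and nonce to make repeated events distinct.
  storage.setItem(BROADCAST_KEY, JSON.stringify({
    sentAt: Date.now(),
    nonce: Math.random(),
    event: event,
  }));
}

// Any tab: extract the push event from a storage event,
// or return null if the storage event is unrelated or malformed.
function readBroadcast(storageEvent) {
  if (storageEvent.key !== BROADCAST_KEY || !storageEvent.newValue) {
    return null;
  }
  try {
    return JSON.parse(storageEvent.newValue).event;
  } catch (e) {
    return null; // something else wrote a non-JSON value under this key
  }
}

// Browser wiring (skipped when run outside a browser).
if (typeof window !== 'undefined') {
  window.addEventListener('storage', function (storageEvent) {
    const event = readBroadcast(storageEvent);
    if (event) {
      // Hand the event to whatever would normally process EventSource messages.
      console.log('push event from master tab:', event);
    }
  });
}
```

The master would call broadcast(window.localStorage, event) from its EventSource message handler; every other tab then sees the same event via the storage listener.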

The tricky part is coordinating which tab should be master. The master tab sets a value called "ping" on local storage to the current timestamp roughly every 30 seconds. When a tab first loads, it checks this value; if it is more than 45 seconds old, the tab presumes there is no current master and becomes master itself. Otherwise, it becomes a slave. Whilst it is a slave, it continuously monitors for storage events with a key of "ping", and if it hasn’t heard a ping within a 45-second period, it takes over as master. This switches control to another tab when the master tab closes. On browsers supporting the "unload" event we can make the changeover happen pretty much instantly, by setting the "ping" value to 0 in local storage when the tab is closed.
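
The election rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production code: the Tab class and masterIsGone function are hypothetical names, timers are left out, and the current time is passed in explicitly so the logic is easy to follow and test.

```javascript
// The "ping" key and the 45 s timeout come from the description above;
// the function and class names are hypothetical.
const PING_KEY = 'ping';
const PING_TIMEOUT = 45000; // ms without a ping before the master is presumed gone

// True if the stored ping is missing, zeroed, or more than 45 s old.
function masterIsGone(pingValue, now) {
  const lastPing = parseInt(pingValue, 10) || 0;
  return now - lastPing > PING_TIMEOUT;
}

class Tab {
  constructor(storage) {
    this.storage = storage; // window.localStorage in a browser
    this.isMaster = false;
  }

  // Run on load, when the wait for a ping times out,
  // and (as master) roughly every 30 s to refresh the ping.
  check(now) {
    if (this.isMaster || masterIsGone(this.storage.getItem(PING_KEY), now)) {
      this.isMaster = true;
      this.storage.setItem(PING_KEY, String(now));
    }
    return this.isMaster;
  }

  // Run from an "unload" handler: zeroing the ping tells the
  // other tabs that the master role is free.
  release() {
    if (this.isMaster) {
      this.isMaster = false;
      this.storage.setItem(PING_KEY, '0');
    }
  }
}
```

In a page, a slave would restart its 45-second timer each time it receives a "ping" storage event, and call check(Date.now()) when the timer fires or when the ping value it receives is 0.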

This all works very well, but there’s one problem remaining: race conditions. There is no API for taking out an explicit lock on local storage, so the spec advocates the use of a per-origin mutex, which would be acquired by a script when it first tries to access the storage and released when the script finishes. Not all browsers have adopted this. The Chrome developers, for example, have decided the performance penalty is too great. Therefore, in some browsers, it is possible for scripts in different tabs to interleave such that, for example, each tries to take master at the same time, then each notices another has taken it, so none ends up as master! The solution we have adopted is to add a random component both to the delay between pings and to the time a tab waits for a ping. This makes it unlikely that two tabs will both attempt to take master at the same time. Of course this can still happen, but should it do so, the random variation in when each new master sends out its ping should ensure that one is quickly turned back into a slave. It will be eventually consistent, which is good enough for our purposes.
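
To make the randomisation concrete, here is a small sketch. The function names and the 10-second spread are assumptions chosen for illustration; the post only specifies "roughly every 30 seconds" for pings and a 45-second wait. The stand-down rule relies on storage events being delivered only to tabs other than the one that wrote the value.

```javascript
// Returns `base` ms plus a random extra of up to (but not including) `spread` ms.
// `random` is injectable for testing and defaults to Math.random.
function jitteredDelay(base, spread, random) {
  const rand = random || Math.random;
  return base + Math.floor(rand() * spread);
}

// Hypothetical timings: the 10 s spread is an assumption, not from the post.
function nextPingDelay() {
  return jitteredDelay(25000, 10000); // 25 to 35 s, i.e. roughly every 30 s
}
function takeoverDelay() {
  return jitteredDelay(45000, 10000); // wait at least 45 s for a ping
}

// If a master receives a "ping" storage event, another tab must have
// written it, so there are two masters and this one stands down.
// `tab` is any object with an isMaster flag (hypothetical shape).
function onForeignPing(tab) {
  if (tab.isMaster) {
    tab.isMaster = false;
  }
  return tab.isMaster;
}
```

Because each tab draws its own delays, two tabs that both grabbed master are unlikely to ping at the same instant; whichever hears the other's ping first stands down.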

In case this is of use to anyone else, here’s the code we use (rewritten slightly to use pure JS rather than be based on our library code). It’s also available as a gist on GitHub. You can try it out on this test page; just open the page in several windows or tabs, then close the master and watch control pass to another. You can also broadcast a message from any tab to the other tabs.