Service Workers Break the Browser’s Refresh Button by Default; Here’s Why

Service Workers are like apps. You can’t safely upgrade an app while it’s still running.

Dan Fabulich is a Principal Engineer at Redfin. (We’re hiring!)

tl;dr:

By default, users have to close all tabs to a site in order to update a Service Worker. The Refresh button is not enough.

If you make a mistake here, users will see an outdated version of your site even after refreshing. Users abandon sites that never update.

Service Workers break the Refresh button because they behave like “apps,” refusing to update while the app is still running, in order to maintain code consistency and client-side data consistency.

We can write code to notify users when a new version is available. Getting it right requires deeply understanding the Service Worker lifecycle, the Caches API, the Registration API, and the Clients API.


Cache Invalidation: How hard could it be?

Service Workers are the hot new thing in web APIs. They’re designed to be like the HTML5 Application Cache, but without being objectionable.

The core feature of offline-enabled Service Workers is deceptively simple: we write a “fetch event listener” JavaScript function that the browser calls whenever our website makes a network request.

The browser will pass our listener a FetchEvent, whose Request contains all of the request’s details; our code then replies with a Response. Our Service Worker code can use the Fetch API or the new Cache API (a key/value store for Requests and Responses) to manage the response. For example, we can:

Return a Response from a Cache with caches.match()

Use the Request (or a modified clone of the Request) to fetch() a Response

Construct a Response with new Response()
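As a rough sketch, those three options might be combined in a single handler: try the cache, then the network, then fall back to a Response we build ourselves. The names respond and deps, and the 503 fallback page, are illustrative assumptions rather than part of any API; the dependencies are passed in so the logic can be tried outside a Service Worker.

```javascript
// Sketch only: `respond` and its `deps` parameter are hypothetical names.
async function respond(request, deps) {
  // 1. Return a Response from a Cache, if one matches.
  const cached = await deps.match(request);
  if (cached) return cached;

  try {
    // 2. Use the Request to fetch() a Response from the network.
    return await deps.fetch(request);
  } catch (err) {
    // 3. Construct a Response ourselves (here, a plain-text error page).
    return new Response('Offline and not cached', {
      status: 503,
      headers: { 'Content-Type': 'text/plain' },
    });
  }
}

// Only wire up the listener when actually running inside a Service Worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  addEventListener('fetch', fetchEvent => {
    fetchEvent.respondWith(
      respond(fetchEvent.request, {
        match: request => caches.match(request),
        fetch: request => fetch(request),
      })
    );
  });
}
```

Note that fetch() only rejects on network failure; an HTTP error such as a 404 still resolves with a Response, so step 3 here only covers the offline case.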

Let’s demonstrate with an example.

Naive Service Workers: Let’s practice driving without a seat belt!

Before we start, note that Service Workers require HTTPS or localhost. In this example, we’ll use the Python SimpleHTTPServer so we don’t have to set up HTTPS.

Let’s start with two files. First, index.html:

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>test</title>
</head>
<body>
  test 1
  <script>
    if ('serviceWorker' in navigator) {
      navigator.serviceWorker.register('sw.js');
    }
  </script>
</body>
</html>

Second, sw.js, the script that defines our Service Worker. (It’s written using the new JavaScript async functions and arrow functions, because all of the browsers that support Service Workers also support those JS features.)

addEventListener('fetch', fetchEvent => {
  console.log('fetching', fetchEvent.request);
  fetchEvent.respondWith(caches.match(fetchEvent.request)
    .then(async cachedResponse => {
      // Serve from the cache when we can.
      if (cachedResponse) return cachedResponse;
      const response = await fetch(fetchEvent.request);
      // Store the original in the cache for next time...
      caches.open('whatever').then(cache => cache.put(
        fetchEvent.request, response
      ));
      // ...and hand a clone back to the page.
      return response.clone();
    })
  );
});

Inside the Service Worker, we’ve added a fetch event listener to the global scope; the browser will call us back whenever our website makes any network request … not just “fetches” with the Fetch API, but also JavaScript, CSS, images, and even the initial HTML document itself.

The FetchEvent has a .request property (a Request) and a .respondWith method, which accepts a Promise for a Response. In this example, we’re returning a cached Response if possible. If we don’t have a cached response, we fetch() with the request, return a clone of the response, and put the original response in the Cache for next time. (We use a clone because Response bodies can only be used once. Don’t ask.)
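OK, ask. Here is a small sketch of the “bodies can only be used once” rule, assuming an environment where Response is a global (modern browsers, or Node 18 and later). The function name demonstrateSingleUseBody is made up for illustration.

```javascript
// Assumes the Fetch API's Response is available as a global.
async function demonstrateSingleUseBody() {
  const response = new Response('test 1');
  const copy = response.clone();          // clone BEFORE anything reads the body

  const first = await response.text();    // consumes the original's body
  const usedAfterRead = response.bodyUsed;

  let secondReadFailed = false;
  try {
    await response.text();                // a second read rejects with a TypeError
  } catch (err) {
    secondReadFailed = err instanceof TypeError;
  }

  const fromCopy = await copy.text();     // the clone still has its own body
  return { first, usedAfterRead, secondReadFailed, fromCopy };
}
```

That is why a handler that both caches a Response and returns it to the page needs two copies: one body for the Cache to read, and one for the page.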

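We can test it by launching Python’s SimpleHTTPServer in the directory containing our files. (SimpleHTTPServer is a Python 2 module; in Python 3, the equivalent command is python3 -m http.server 8000.)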

python -m SimpleHTTPServer 8000

Then we can navigate to http://localhost:8000/index.html in a browser tab to see our page. Refresh it a couple of times and we’ll see the “fetching” message appear in our Console. We can press Ctrl-C in our terminal window to shut down the web server, simulating going offline. Refresh the page in the browser; it still loads, even though the site is down.

But wait, how do I turn this thing off?!

What happens when we restart the web server and change the test 1 string in the body of the HTML to test 2? Nothing, of course; the browser has cached the old version of the page, and it will never update!

Uh oh! What if we remove the call to navigator.serviceWorker.register instead?

Oh, silly me, that won’t work. We can’t update index.html at all, so we certainly can’t remove the call to register!

Deleting the Service Worker script on the server won’t help, either. The server will return a 404 error for sw.js each time we refresh, but the Service Worker thinks that means it should keep running.

Holding shift and clicking the refresh button will work, showing test 2, but if we then refresh normally, we’ll revert to the old test 1 version.

We’re getting warmer when we replace sw.js with a blank, empty file, but you’ll find that refreshing the tab still shows the old test 1 version! That’s because the old version of sw.js is still running (“activated”). The only user-visible way to shut it down (without using Dev Tools) is to close the tab and open a fresh new tab.

This surprising behavior is 100% by design. In the next section, we’ll explain why this is the design and how to think about it.

Actually, wait a minute. I want to complain a little more first.

Service Workers are rocket science 🚀 💥

They say that there are only two hard problems in computer science: cache invalidation, naming things, and off-by-one errors. Caching is what Service Workers do. It’s literally the #1 hard thing! … or maybe the #0 thing? Whatever. It’s hard.

As programmers, we sometimes have to work on tricky stuff, but then we say, “Well, it’s not rocket science.” But Service Workers are rocket science.

What is rocket science? “Rocket science” is the feeling you get in math class where the instructor explains a proof to you in the clearest possible terms and you just don’t get it. You have to listen to the explanation multiple times, preferably in a few different ways, and then you have to sleep on it, and then you get it, maybe.

But “rocket science” isn’t just hard to understand. It’s a hard problem where the consequences for failure are catastrophic. When you fail at rocket science, a multi-million dollar rocket explodes.

If a Service Worker fails, it’s possible to break an entire website in ways that we can’t fix on the server side (or at least, not right away). Imagine this: your site appears to be down, but refreshing the page won’t help, because the browser isn’t even talking to your server; it’s just talking to your broken Service Worker.

And if we fail to upgrade our Service Worker correctly, especially if we fail to invalidate the Service Worker’s cache, we may not even be able to fix our Service Worker. Service Workers can refuse to update, because the browser has cached the Service Worker script itself.

When faced with rocket-science technology, there’s only one thing to do: roll up our sleeves and try to understand the whole thing.

Atomic updates and the pursuit of app-iness (or, Why Service Workers are so hard to update)

Service Workers are “sticky.” Previously, we saw that a naive implementation of a Service Worker can be very hard to deactivate. We tried refreshing the page, shift-refreshing the page, deleting the Service Worker script, and replacing the script with a blank script. Only the last approach worked, but even then, it only took effect after completely closing out all tabs.

Service Workers are sticky because they’re trying to play the role of an “app.”

Why do we have to restart apps when we upgrade them? “Code consistency” and “data consistency”

Think about how native apps work in your favorite desktop operating system. Er, no, not that one. Think about your favorite popular desktop operating system.

Specifically, think about the process of installing and upgrading a native app. For example, let’s install a cross-stitch app like WinStitch or MacStitch.

After installing WinStitch, we’ll have an icon for it in our app launcher. An app like WinStitch contains multiple files in a “bundle,” but the operating system UI gives it one icon, which makes it seem as if the app is just one thing, one file.

To upgrade WinStitch, the installer deletes the old bundle completely and replaces it with the new version’s bundle as an indivisible “atomic” upgrade step. WinStitch obviously doesn’t work if we delete its files while it’s running, so we have to shut down the app first.

Why can’t we just update the files one by one while the app is running? For one thing, nobody knows which of those files are tightly coupled to each other, including the developer. Replacing just some of those files and not others would introduce weird bugs that are hard to reproduce. (We could try to support file-by-file upgrades in apps we develop, but we’d have to write the app very defensively and test the app very carefully.) So it’s a lot safer and easier to just delete the whole bundle. Deleting the bundle and installing the whole thing at once ensures “code consistency,” as I call it.

Of course, we have to be careful when we delete the old bundle, because WinStitch also includes important user-data files and documents (like an enormous cross-stitch pattern of Queen Elsa from Frozen) that we don’t want to delete.

This was a lot of work!

From time to time, developers don’t just need to upgrade their code; they need to change the format in which their data is stored. For example, if the app uses a local SQLite database, a new version might change the schema of the data. But even if it’s a flat file, the new version might change the format of the file from XML to JSON, or reorganize the way that data is stored within the file. App developers have to detect data files in the old schema/format and migrate them to the new schema.

But this migration would be a lot harder or even impossible if it happened while the old version of the app was running. The old version of the app may not be able to read and write data in the new format, causing really bad bugs, possibly including data loss. Running only one version of the app at a time ensures what I call “data consistency.”

Web pages in tabs don’t naturally support either code consistency or data consistency

Apps require code consistency and data consistency; for both kinds of consistency, we have to ensure that there’s exactly one version of the app running at a time. Browser tabs don’t naturally support features like this.

If you open a tab to your favorite online web app, leave it open for a few days, and then open another tab to the same web app, it’s possible for each tab to run different versions of the code. If the web app stores data on the client side (in IndexedDB, WebSQL, or localStorage), the second tab could attempt to migrate your data to a new version, confusing the app running in the old tab. The old tab might lose client-side data, or even corrupt the client-side database.

The browser doesn’t even guarantee that all of the files needed for a given web page (JS, CSS, images, etc.) are from the same version. If our HTML document refers to simple URLs like /myscript.js and /mystyles.css, and we deploy a new version of our web app on the server side during page load, it’s possible for the browser to download the old version of the JS and the new version of the CSS. Developers can work around this browser limitation by using versioned URLs, like this: /v1/myscript.js and /v1/mystyles.css. (Alternatively, it can be better to compute a “hash” digest of the file, like /myscript.js?hash=cb2c6d594dca37d8afcfaf16385ea1e7.)
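For the hash approach, a build step might compute the digest along these lines. This is a minimal Node.js sketch, not a recommendation of any particular build tool; hashedUrl is a hypothetical helper name, and MD5 is assumed only because it produces a 32-hex-digit digest like the one in the example URL.

```javascript
// Build-time sketch for Node.js; `hashedUrl` is a hypothetical helper name.
const crypto = require('crypto');

// Returns a URL such as /myscript.js?hash=<32 hex digits>, where the digest
// is the MD5 of the file's contents. The hash is only a cache-busting
// fingerprint here, not a security measure.
function hashedUrl(path, contents) {
  const digest = crypto.createHash('md5').update(contents).digest('hex');
  return `${path}?hash=${digest}`;
}
```

Because the URL changes whenever the file’s contents change, an old HTML document keeps pointing at the old script and a new one points at the new script, as long as the server can still serve both.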

But now let’s suppose the user grabs the newest, latest version of the HTML, which requires updated scripts and styles. Then, just as we finish downloading the HTML, the user goes offline, unable to reach the new scripts and styles. If the browser starts using the new HTML at this point, it won’t work. We’ve discarded the old web app bundle, but we only have a partial web app bundle to replace it with. This breaks the goal of “code consistency” we were striving for.

Service Workers to the rescue: Preserve code consistency with the installation lifecycle

When we call navigator.serviceWorker.register(), we have to wait for the Service Worker script to download, parse, and compile, which is called the “installation” phase of its life cycle.

The Service Worker API allows us to do additional work during installation, or even abort installation in case of error, by adding an install event listener.

addEventListener('install', installEvent => {
  installEvent.waitUntil(
    caches.open('v1').then(cache => cache.addAll([
      '/',
      '/v1/style.css',
      '/v1/app.js',
    ]))
  );
});

During installation, the browser passes us an InstallEvent. InstallEvents don’t have a respondWith method, but they are ExtendableEvents, which means that they have a waitUntil method, which accepts a Promise. If the waitUntil Promise fails during the install event, the browser will drop that failed Service Worker like a hot rock slathered in cheap margarine.

In this example, we use Cache.addAll() to fetch all of the URLs we need for the web app to function and put them in a Cache; if any of the fetches fail, the Promise returned by Cache.addAll() rejects, which atomically fails installation, caching nothing.
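To make the all-or-nothing behavior concrete, here is a toy emulation of it. This is not how the browser implements Cache.addAll(); precacheAll, fetchFn, and the Map standing in for a Cache are all hypothetical names chosen so the sketch can run anywhere.

```javascript
// Toy emulation of Cache.addAll()'s all-or-nothing behavior.
// `precacheAll`, `fetchFn`, and the Map-based `cache` are hypothetical stand-ins.
async function precacheAll(urls, fetchFn, cache) {
  // Fetch everything first; Promise.all rejects as soon as any fetch fails.
  const responses = await Promise.all(urls.map(async url => {
    const response = await fetchFn(url);
    // Like Cache.addAll(), treat a non-OK status (e.g. 404) as a failure.
    if (!response.ok) throw new TypeError(`Bad response for ${url}`);
    return response;
  }));
  // Only write to the cache once every fetch has succeeded.
  urls.forEach((url, i) => cache.set(url, responses[i]));
}
```

If the Promise returned by precacheAll were passed to installEvent.waitUntil(), a single missing file would reject it and the cache would be left untouched.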

That’s exactly how I live my life: complete success or total failure, with nothing between.

Service Workers to the rescue: Preserve data consistency with the activation lifecycle

Once our Service Worker finishes installing, you might reasonably think that it would “activate” and start intercepting network requests, but in order to ensure that only one version of the app runs at a time, updated Service Workers don’t activate right away.

Instead, if there’s another “active” Service Worker on our site—if there’s an open tab under its control—the new Service Worker goes into a “waiting” state. (The Service Worker API calls this state “installed,” not yet “activated.”)
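A page can see which of these states a registration is in by looking at its installing, waiting, and active properties. The sketch below assumes only those three standard ServiceWorkerRegistration properties; describeRegistration is a made-up helper name.

```javascript
// `describeRegistration` is a hypothetical helper. It only reads the three
// standard ServiceWorkerRegistration properties: installing, waiting, active.
function describeRegistration(registration) {
  if (registration.waiting) return 'waiting';      // installed, not yet activated
  if (registration.installing) return 'installing';
  if (registration.active) return 'active';        // nothing newer is pending
  return 'none';
}

// In a page (this branch is skipped outside a browser):
if (typeof navigator !== 'undefined' && 'serviceWorker' in navigator) {
  navigator.serviceWorker.getRegistration().then(registration => {
    if (registration) console.log(describeRegistration(registration));
  });
}
```

A registration can have an active worker and a waiting worker at the same time, which is exactly the situation described above, so the helper checks waiting first.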

How long will it wait? Forever, if need be. The new Service Worker waits until the old Service Worker dies, which only happens when all of the tabs it controls are closed. (The Service Worker API calls tabs “clients.” Each browser tab and each iframe within the tab counts as a separate client; the old Service Worker stops when all of its clients have stopped.)

The browser provides an activate event, which, like the install event, is an ExtendableEvent with a waitUntil method that we can use to perform data migration. During activation, we usually delete Caches belonging to obsolete Service Workers.

const LATEST_CACHE_ID = 'v2';

addEventListener('install', installEvent => {
  installEvent.waitUntil(
    caches.open(LATEST_CACHE_ID).then(cache => cache.addAll([
      '/',
      '/v2/style.css',
      '/v2/app.js',
    ]))
  );
});

addEventListener('activate', activateEvent => {
  activateEvent.waitUntil(
    caches.keys().then(keyList => Promise.all(keyList.map(key => {
      if (key !== LATEST_CACHE_ID) {
        return caches.delete(key);
      }
    })))
  );
});

This will delete the v1 Cache before activating the Service Worker, leaving only the v2 Cache.
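The same cleanup could also be written as a standalone async function, which makes it easy to try against a fake cache store. This is a sketch under assumptions: deleteStaleCaches is a hypothetical name, and cacheStorage is anything with the CacheStorage methods keys() and delete(), such as the global caches object.

```javascript
// `deleteStaleCaches` is a hypothetical name; `cacheStorage` is anything with
// the CacheStorage methods keys() and delete(), such as the global `caches`.
async function deleteStaleCaches(cacheStorage, latestCacheId) {
  const keys = await cacheStorage.keys();
  const stale = keys.filter(key => key !== latestCacheId);
  await Promise.all(stale.map(key => cacheStorage.delete(key)));
  return stale; // names of the caches that were removed
}

// Only wire up the listener when actually running inside a Service Worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined') {
  addEventListener('activate', activateEvent => {
    activateEvent.waitUntil(deleteStaleCaches(caches, 'v2'));
  });
}
```

Either way, be aware that this deletes every Cache on the origin other than the latest one, including any created by unrelated code.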

v1 tabs tightly couple to their v1 Cache; v2 tabs tightly couple to their v2 Cache. This tight coupling makes them “application caches.” The app must be completely shut down (all tabs closed) in order to upgrade atomically to the newest cache of code.

The activation lifecycle explains why the Refresh button doesn’t activate anything

You might think that clicking the Refresh button would activate a waiting Service Worker, but it doesn’t, for two reasons.

First, if we have multiple tabs (“clients”) open, we want to make sure that all tabs are on the same version. We can refresh each tab as often as we like, but the old Service Worker has to keep handling all of them, to maintain consistency.

Second, even if we refresh our web app’s only tab, it turns out that browsers begin loading the refreshed page before the old page dies. As a result, whenever you click the Refresh button, there are two clients simultaneously: the old client, doomed to die as soon as the refresh completes, and the new client which just launched. But as long as there are two clients, both must be handled by the old Service Worker.
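Both reasons can be seen in a toy model of the lifecycle. To be loud about it: none of this is a browser API. ToyRegistration and its methods are invented for illustration, and the model simplifies the real rules down to “a waiting version activates only when zero clients are open.”

```javascript
// Toy model of the lifecycle described above; none of this is a browser API.
class ToyRegistration {
  constructor() {
    this.active = null;       // version currently controlling clients
    this.waiting = null;      // installed version that has not activated yet
    this.clients = new Set(); // open tabs controlled by the active version
    this.nextId = 1;
  }
  // A newly installed version activates only if no client is open.
  install(version) {
    this.waiting = version;
    this.promoteIfIdle();
  }
  openTab() {
    this.promoteIfIdle();
    const id = this.nextId++;
    this.clients.add(id);
    return id;
  }
  closeTab(id) {
    this.clients.delete(id);
    this.promoteIfIdle();
  }
  // Refresh: the new client starts BEFORE the old one goes away.
  refresh(id) {
    const newId = this.openTab();
    this.closeTab(id);
    return newId;
  }
  promoteIfIdle() {
    if (this.waiting !== null && this.clients.size === 0) {
      this.active = this.waiting;
      this.waiting = null;
    }
  }
}
```

In this model, installing v2 while one tab is open leaves v1 active; calling refresh() on that tab still leaves v1 active, because the client count goes from one to two and back to one without ever reaching zero. Only closing the last tab lets v2 take over.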

In my opinion, this single-tab Refresh behavior strongly violates the principle of least surprise, and it doesn’t help us to maintain code consistency or data consistency.

Worst of all, the burden falls on us, the developers of an offline web app, to understand this bug and then implement a correct fix.

And by “us,” I mean “you.”

In my next article, I’ll explain exactly how to implement the fixes. See you there!

P.S. Redfin is hiring.