Service workers are a pretty cool feature of the web, enabling running code in the context of a worker, separately from a web-page, in response to various events (mostly network-related events, although anything could potentially integrate with one).

The emphasis is on separately from a web-page: unlike a dedicated worker, a service worker is also supposed to be able to run even when no page is running at all (for things like handling push notifications).

So, when implementing Service workers in Rust, it can be useful to do so in a way that highlights this “separateness” of the workers from running web pages, and this raises some interesting questions in the light of the multi-process nature of modern Web engines.

Web content these days usually runs in a process that is separate from the “main browser process”.

This separation is mostly aimed at two things:

- Robustness: when your web page crashes, the whole browser doesn’t.
- Spectre mitigation: keeping content from different origins in different address spaces.

So, the spec describes Service workers as “generic, event-driven, time-limited script contexts that run at an origin” (emphasis mine).

Therefore, to answer the question “where do we want to run a Service worker?”, we need to take into account that:

- It might need to run separately from a web page.
- It should be treated as web content, and therefore isolated from the “main browser process”, as well as from web content of other origins.

I’ve recently worked on a change to Servo related to Service Workers, one which went straight to the heart of that question, and which ended up giving it a different answer than the one in place at the time. So it’s a nice way to discuss Service workers and how they fit into the general architecture of web engines.

Let’s start with an introduction first…

Agents tend to cluster, and browsing happens in groups.

To understand the structural hierarchy of content running inside a web engine, let’s start with a bottom-up description of the various entities involved.

The thing that actually runs the web content is an event-loop. Those come in “window”, “worker”, and “worklet” variants.

The spec describes their use as “To coordinate events, user interaction, scripts, rendering, networking, and so forth, user agents must use event loops”.

And by the way, I have written elsewhere about how the concept of an “event-loop” can be useful in modeling concurrency in Rust, even when you’re not implementing the Web.

An event-loop is associated with an agent. Those are actually concepts from the JavaScript spec, used by the HTML one, and I guess we can say that event-loops describe the “how”, while agents are the “what”, of running a web-page.

Again, those come in “similar-origin window”, “worker”, and “worklet” variants, although this time the worker one is further divided into “Dedicated worker agent”, “Shared worker agent”, and “Service worker agent”.

So, as is probably obvious by now, a web-page will run on the event-loop of a given “similar-origin-window-agent”, while a service worker will run on the event-loop of a “Service worker agent”.

Now earlier I wrote that dedicated workers somehow are linked to a running page, while a service worker is separate from it. So if you look at agents only, then you’ll just see that a page, a service worker, and a dedicated worker are all three simply running on their own event-loop/agent combo. So they appear all equally separate from each other, right?

Then this is the right time to move up a notch in the hierarchy, and introduce agent-clusters.

So while a web-page, a dedicated worker, and a service worker indeed all have their own agent, it so happens that a web-page and the dedicated worker(s) it started will be part of the same agent-cluster, while the service worker will have its own.

So there you go: service workers have their own dedicated agent-cluster, while dedicated workers share the cluster of the window agent that created them.

How is this relevant? It’s mainly about shared-memory features of the web, like Atomics and SharedArrayBuffer. The way it works is essentially that, in order to share memory between agents, they need to be part of the same agent-cluster.

What about different web-pages (for example, in the form of different tabs open in a desktop browser)? When they share an origin, will they then always share a “similar-origin-window-agent”, and therefore belong to the same cluster? The answer is “no”, and finding it requires moving up another notch in the hierarchy, introducing browsing-context-groups.

Browsing context groups own an agent cluster map, and the agent clusters they contain are referred to as “browsing context agent clusters”, each containing at most one “similar-origin-window-agent” and any number of “dedicated worker agents” and “worklet agents” related to it.

So, when opening a new browsing context (essentially a web-page), for it to run on the same “similar-origin-window-agent” as an existing one, it needs to have the same (or similar enough) origin, and it needs to be part of the same group.

Finally, there is only one way for a new browsing context to share a group with another: it needs to be an auxiliary browsing context, created via an API like window.open. Such an auxiliary browsing context, when allocated, will be part of the same group; however, if its origin is not “similar enough” to allow it to run on the “similar-origin-window-agent” of an existing agent-cluster in the “agent-cluster map” of the group, then a new agent-cluster, with a new “similar-origin-window-agent”, will be allocated for it.

Simple, right? Well, actually it kinda is.

So let’s go back to, why is this agent-cluster stuff necessary again?

Well, this “can share memory with” is a concept at the HTML spec level, needed to ensure web developers see compatible behavior. An “agent-cluster” is an idealized “process boundary” (one that also precludes IPC-type shared-memory); however, implementations of the HTML spec can still run multiple agent-clusters as part of the same process.

So now we can start thinking about the essential question of this post.

Where should we run Service workers in Servo?

What I really like about Servo is that the implementation matches the spec pretty closely, in my opinion usually more so than in Gecko or Chromium (I haven’t looked much yet at other engines). This is probably because Servo had the luxury of starting late, when the spec had already fleshed out the concepts that I am thinking about.

Anyway, in Servo, the “top-dog” in terms of structural hierarchy is probably that thing prosaically named the “Constellation”. It is the one component aware of pretty much all other components found “just below” it in the hierarchy (so it’s not “aware of everything”; it’s only aware of everything in the layer right below it).

I guess it represents the “User-Agent” as a whole, a.k.a “Ze Browser”.

When the spec writes that “A user agent holds a browsing context group set (a set of browsing context groups).”, in Servo, on the constellation, you can indeed find the below:
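As a rough sketch of that shape (using hypothetical stand-in types, not Servo’s actual definitions):

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; names only approximate Servo's real types.
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct BrowsingContextGroupId(u32);

#[derive(Default)]
struct BrowsingContextGroup; // its contents are covered below

// The constellation owns the "browsing context group set" from the spec.
struct Constellation {
    browsing_context_group_set: HashMap<BrowsingContextGroupId, BrowsingContextGroup>,
}

fn main() {
    let mut constellation = Constellation {
        browsing_context_group_set: HashMap::new(),
    };
    // Opening a new top-level page allocates a new group.
    constellation
        .browsing_context_group_set
        .insert(BrowsingContextGroupId(0), BrowsingContextGroup::default());
    assert_eq!(constellation.browsing_context_group_set.len(), 1);
}
```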

So you see? When I wrote “Servo matches the spec nicely”, I wasn’t kidding.

This time, instead of going bottom-up from an agent to a BC group, let’s go top-down from the constellation to an agent and its running event-loop.

Next, let’s take a look at what a BrowsingContextGroup is.

Once again, the spec wrote it, we implemented it: “A browsing context group holds a browsing context set (a set of top-level browsing contexts).”
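A rough sketch of that shape, again with hypothetical stand-in types rather than Servo’s real ones:

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins; not copied from Servo's source.
#[derive(PartialEq, Eq, Hash, Clone, Copy)]
struct TopLevelBrowsingContextId(u32);

// The "agent cluster key" is roughly a host/origin.
#[derive(PartialEq, Eq, Hash, Clone)]
struct Host(String);

struct EventLoop; // discussed in the next section

// "A browsing context group holds a browsing context set", plus, in
// Servo's current code, a map of hosts to event-loops.
struct BrowsingContextGroup {
    top_level_browsing_context_set: HashSet<TopLevelBrowsingContextId>,
    event_loops: HashMap<Host, EventLoop>,
}

fn main() {
    let mut group = BrowsingContextGroup {
        top_level_browsing_context_set: HashSet::new(),
        event_loops: HashMap::new(),
    };
    group
        .top_level_browsing_context_set
        .insert(TopLevelBrowsingContextId(1));
    group.event_loops.insert(Host("example.com".to_string()), EventLoop);
    assert_eq!(group.event_loops.len(), 1);
}
```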

I haven’t mentioned “top-level BCs” yet; the way to think about them for now is simply as a “web-page”, which can contain further nested BCs that are not top-level, in the form of the infamous iframe.

Next, we see a map of event-loops, and this is where things get a bit hairy. The spec writes: “A browsing context group has an associated agent cluster map (a weak map of agent cluster keys to agent clusters)”, but Servo has this EventLoop concept. That wrong shall be righted, and there is actually an issue for that if you’re interested.

As a side note, this is a good example of “Ze spec” not actually being a spec, but rather a “Living Standard™”. This shows because the whole agent-cluster business is actually relatively new. The spec used to have a very confusing concept entitled “unit of related similar-origin browsing contexts”. That’s basically the excuse for Servo having an EventLoop and not a BrowsingContextAgentCluster .

An “event-loop”, we noted earlier, is the thing associated with an “agent” to run some JS code. Indeed it is, so you’re going to have to mentally picture AgentCluster where you just read EventLoop . Also, the cluster nested in a BC group would be, as noted earlier, one containing a single “similar-origin-window-agent” as well as any number of “dedicated-worker-agents” and “worklet-agents”, which is why an even more appropriate name would be something like BrowsingContextAgentCluster .

Anyway, one thing is clear, no matter the name, there is no sign of a “service-worker-agent” anywhere yet.

So, let’s take a step down in the hierarchy, and next take a look at what exactly this EventLoop is.

To start a process, or to not start a process, that is the question

Well, surprise surprise, an EventLoop is actually not that much. It really is just a wrapper around a channel to something.
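In sketch form (an mpsc Sender stands in for the IPC sender Servo actually uses, and the message type is hypothetical):

```rust
use std::sync::mpsc::{channel, Sender};

// Hypothetical message type; Servo has its own control messages.
struct ScriptControlMsg(&'static str);

// An EventLoop is little more than a handle: a channel to the thing
// that actually runs the JS, which may live in another process.
struct EventLoop {
    script_chan: Sender<ScriptControlMsg>,
}

fn main() {
    let (sender, receiver) = channel();
    let event_loop = EventLoop { script_chan: sender };
    // The constellation talks to the script side only through the channel.
    event_loop
        .script_chan
        .send(ScriptControlMsg("navigate"))
        .unwrap();
    assert_eq!(receiver.recv().unwrap().0, "navigate");
}
```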

That’s actually a good example of how to create some boundaries in your system. Of course the constellation doesn’t have a direct reference to something that is supposed to run separately from it. Instead, the constellation only owns a channel to that thing.

And that “thing” can not only run in its own thread (which can spawn further threads; we’ll get to that), it can also run in its own process.

Remember what I mentioned earlier about “mitigating Spectre” and “not running web-content in the main process of the Browser”?

So this is where we’re getting to it.

Servo can run in two modes:

- Single-process: everything runs in a single process (but multiple threads).
- Multi-process: those multiple running threads are grouped into different processes.

We’ve established earlier that the “agent cluster” nested in a “browsing-context-group”, in Servo currently named an EventLoop , is an agent cluster containing at most one “similar-origin-window-agent”, and numerous potential “dedicated-worker/worklet-agents”. We’ve also established how an “agent-cluster” at the level of the spec establishes an “idealized process boundary”, and how in practice this can actually be implemented with an actual process as the boundary.

Now is the time to spill the beans on how this is done.

So let’s assume you’ve just opened a web-page in Servo, hence a brand new “browsing-context-group” is being allocated, and since its map of EventLoops is empty, that means we’ll allocate a new one too.

What does this look like?
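Roughly, and with invented names, the allocation branches on Servo’s processing mode; a sketch:

```rust
// Hypothetical sketch: the names and the logging are stand-ins, not
// Servo's actual code.
enum Mode {
    SingleProcess,
    MultiProcess,
}

fn new_event_loop(mode: &Mode, log: &mut Vec<&'static str>) {
    match mode {
        // Same process: start the script thread right here, on a new thread.
        Mode::SingleProcess => log.push("start_all in constellation process"),
        // Multi-process: hand the startup data to a freshly spawned process.
        Mode::MultiProcess => log.push("spawn_multiprocess"),
    }
}

fn main() {
    let mut log = Vec::new();
    new_event_loop(&Mode::MultiProcess, &mut log);
    assert_eq!(log, ["spawn_multiprocess"]);
}
```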

Let’s also immediately look at spawn_multiprocess .

So, what this effectively does, with a bit of ceremony, is start a new Servo process, importantly with the --content-process flag set.
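The gist of it can be sketched like this (the helper name is hypothetical, and Servo’s real code also sets up an IPC bootstrap channel and sandboxing):

```rust
use std::process::Command;

// Build the command that re-executes the current binary as a content
// process. The "token" argument is a hypothetical stand-in for the data
// the child uses to connect back to the constellation over IPC.
fn content_process_command(token: &str) -> Command {
    let mut cmd = Command::new(std::env::current_exe().expect("current exe"));
    cmd.arg("--content-process").arg(token);
    cmd
}

fn main() {
    // Inspect the command instead of spawning it, to keep the example safe.
    let cmd = content_process_command("sandbox-token");
    let args: Vec<String> = cmd
        .get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect();
    assert_eq!(args, vec!["--content-process", "sandbox-token"]);
}
```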

That new process, when starting up and noticing the flag is set, will end up calling the below:
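In sketch form (the type name and the logging are stand-ins for illustration; the real entry point deserializes the startup data it receives over IPC):

```rust
// Stand-in for the pipeline startup data sent over IPC.
struct UnprivilegedPipelineContent;

// Stand-in: in Servo, start_all ends up starting the script thread.
fn start_all(_content: UnprivilegedPipelineContent, log: &mut Vec<&'static str>) {
    log.push("script thread started");
}

// The content process entry point: just run start_all, exactly like the
// single-process path does, but in this fresh process.
fn run_content_process(content: UnprivilegedPipelineContent, log: &mut Vec<&'static str>) {
    start_all(content, log);
}

fn main() {
    let mut log = Vec::new();
    run_content_process(UnprivilegedPipelineContent, &mut log);
    assert_eq!(log, ["script thread started"]);
}
```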

As you can see, the call to start_all is the same as that which was done directly in the constellation process, effectively the “main process”, when in single-process mode. The difference is really just whether that call happens in the same, or a new, process.

What does this start_all do? Let’s keep it simple for now and just say that it starts what Servo currently calls a ScriptThread , which essentially matches the spec concepts of a “similar-origin-window-agent” and its associated “event-loop”.

And yes, any “dedicated-worker-agent” started by code running on a ScriptThread will be started on a new thread, in the same process. The constellation will not be aware of these dedicated workers; it is only aware of the EventLoop , which is basically the agent-cluster containing the ScriptThread and associated constructs like dedicated workers and worklets.
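The idea can be sketched with plain threads and channels (this is an illustration of the shape, not Servo’s worker code):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// A "dedicated worker" is just a new thread spawned inside the same
// (content) process, invisible to the constellation.
fn spawn_dedicated_worker(
    on_worker: Receiver<&'static str>,
    to_script: Sender<&'static str>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // The worker's own little event-loop: echo messages back.
        while let Ok(msg) = on_worker.recv() {
            to_script.send(msg).unwrap();
        }
    })
}

fn main() {
    let (to_worker, on_worker) = channel();
    let (to_script, on_script) = channel();
    let worker = spawn_dedicated_worker(on_worker, to_script);

    // The "script thread" posts a message to its worker.
    to_worker.send("postMessage from the page").unwrap();
    assert_eq!(on_script.recv().unwrap(), "postMessage from the page");

    drop(to_worker); // closing the channel lets the worker exit
    worker.join().unwrap();
}
```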

Wasn’t this article supposed to be about Service Workers?

Now we’re getting to it.

Remember run_content_process above? I actually omitted something from it: starting-up of the “service worker manager”, essentially a component meant to manage service workers.

It looks something like:
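A sketch of that shape (the logging is a stand-in; the real function passes IPC senders such as sw_senders around):

```rust
// Hypothetical sketch of the pre-PR shape: every content process started
// both a service worker manager and a script thread.
mod script {
    pub fn init_service_workers(log: &mut Vec<&'static str>) {
        log.push("service worker manager started");
    }
}

fn start_all(log: &mut Vec<&'static str>) {
    log.push("script thread started");
}

fn run_content_process(log: &mut Vec<&'static str>) {
    script::init_service_workers(log);
    start_all(log);
}

fn main() {
    let mut log = Vec::new();
    run_content_process(&mut log);
    assert_eq!(log, ["service worker manager started", "script thread started"]);
}
```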

Notice the previously absent script::init_service_workers(sw_senders); ?

Yep, so prior to the latest PR, whenever a new content process was started, a new “service worker manager” would be started in that process. In single-process mode, this would be done only once, as part of the initialization of the constellation.

So there were two problems with this setup:

1. https://github.com/servo/servo/issues/15217
2. The fact that script::init_service_workers would have to be called in two very different places, based on Servo’s processing mode.

In order to fix 1, I also considered keeping the “service worker manager” in the content process, essentially as a kind of optimization allowing “Service worker agents”, and the agent-cluster they belong to, to run in the same process as the “browsing-context agent-cluster” represented by the ScriptThread and companions.

However, this appeared complicated, and it also didn’t match the structure suggested by the spec. In other words, it looked like an optimization whose benefit (shaving off a process) didn’t seem to outweigh its cost (complicated logic and a loss of structural coherency).

The solution we went for instead is to have the constellation own a map of essentially “origin” -> “service worker agent-cluster”, looking like:
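In sketch form, with stand-in types (the real map uses Servo’s origin type and an IPC sender):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};

// Stand-in for Servo's origin type.
#[derive(PartialEq, Eq, Hash, Clone)]
struct ImmutableOrigin(String);

// Hypothetical message type for the service worker manager.
struct ServiceWorkerMsg(&'static str);

// The constellation owns, per origin, a channel to the process hosting
// that origin's service worker manager.
struct Constellation {
    sw_managers: HashMap<ImmutableOrigin, Sender<ServiceWorkerMsg>>,
}

fn main() {
    let mut constellation = Constellation { sw_managers: HashMap::new() };
    let (sender, receiver) = channel();
    let origin = ImmutableOrigin("https://example.com".to_string());
    // All service workers for this origin go through this one manager.
    constellation.sw_managers.insert(origin.clone(), sender);
    constellation.sw_managers[&origin]
        .send(ServiceWorkerMsg("activate"))
        .unwrap();
    assert_eq!(receiver.recv().unwrap().0, "activate");
}
```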

And how does run_content_process look now? Roughly like the below:
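A sketch of the new dispatch, with a hypothetical enum (the real startup data carries IPC channels and ids):

```rust
// The content process is told at startup which kind of work it hosts.
enum ContentProcessInitialState {
    // Data to start a script thread, i.e. a pipeline / EventLoop.
    Pipeline,
    // Data to start a service worker manager for one origin.
    ServiceWorker,
}

fn run_content_process(state: ContentProcessInitialState) -> &'static str {
    match state {
        ContentProcessInitialState::Pipeline => "started script thread",
        ContentProcessInitialState::ServiceWorker => "started service worker manager",
    }
}

fn main() {
    assert_eq!(
        run_content_process(ContentProcessInitialState::Pipeline),
        "started script thread"
    );
    assert_eq!(
        run_content_process(ContentProcessInitialState::ServiceWorker),
        "started service worker manager"
    );
}
```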

As you can see, when we start a new process, it’s either related to a “pipeline” (which matches the EventLoop I mentioned earlier; sorry, there is no time to go into this further…), or it’s related to a “service worker (manager)”.

The result is that all service workers for a given origin will run in a given process, and also that we’ve moved the service worker concept from being nested in an EventLoop to being its own top-level construct found on the constellation.

Let’s end with a note on the scalability of code bases

When I was doing some research prior to opening https://github.com/servo/servo/pull/26087, I really thought this was going to be quite a painful change. I mean, architecturally, it’s actually a pretty big change. Yet the PR was basically a few days of work, it felt like some sort of “quick PR” really.

How come? This brings me to the topic of code quality, programming in the large, and scalability of code bases as social enterprises.

Servo is a large code base. And beyond LOC, it’s also a complicated code base, with a multi-process structure where each process is also heavily multi-threaded, and which touches upon almost every programming concept, from running code in a VM, to networking, to graphics.

It’s also part of the efforts around engineering the Web, and the Chromium Ecosystem Infra team puts it nicely in their charter:

The web platform is one of the largest software engineering projects ever, consisting of countless standards and implementations, all evolving over time. There’s no single project manager, no single gatekeeper, but we are mostly held together by the standards process. https://docs.google.com/document/d/1MgcisuMnvh3z6QNIjDSvRbt4uoNtmI_cljcQkGXzNQ8/

In other words, this is all pretty big, complicated, and messy.

And yet, working on Servo is essentially a breeze.

Sure, it can be “hard” to implement certain things (actually, you should keep pushing yourself to do stuff that feels hard, otherwise you’re not going to just get better with time), but the “hard” stuff in Servo is mostly essential complexity; it’s not the code that is making it harder than it needs to be.

So it appears that Servo scales pretty well as a project/social organization. You can add contributors, more code, more features, and yet the project doesn’t slow down meaningfully.

I mean, look at that ↑↑. That’s just last week! And I don’t think it’s more than usual, as in “everyone’s stuck at home and there’s nothing else to do”; no, this was “just another week at the office”.

And on a monthly basis ↓↓, it becomes simply baffling. 19,477 files changed? Although a big chunk of that is automatic syncing with the shared test-suite, where a commit seems to change a few hundred files, those are still real changes affecting the project, and if anything it shows how Servo is part of a larger project that is constantly changing.

The risk every project faces is essentially that, after a while, you can’t really grow the team or the codebase anymore: there’s usually a handful of people who actually wrote the code and hopefully still have an idea of how it works, but adding new contributors is hard, and adding features harder still. Essentially, growth slows to a halt, the code rots away because even maintenance is too hard, and eventually the project fails.

So I think Servo’s holding up pretty well, and that’s really speaking to the quality of the project, since it’s basically Godzilla-sized at this point (but compared with Gecko or Chromium, it’s like a baby Godzilla).

For those who made it thus far

Is there something you can contribute to on the topic of Service workers in Servo? Yes there is, here is a collection of relevant issues: