Chrome Networking: DNS Prefetch & TCP Preconnect

When you think about browser performance, the JavaScript VM wars are the first thing that comes to mind. Arguably rightfully so, since we are building far richer and more ambitious client-side apps in the browser. In fact, according to the HTTP Archive, the average page has nearly doubled its amount of JavaScript code in just the past year (up to 194kB).

However, during that same time the size of an average page has grown to 1059kB (over 1MB!) and is now composed of over 80 subresource requests - let that sink in for a minute. Fetching all of this content is anything but free, and as it turns out, the networking stack of a browser like Chrome is by itself an increasingly important component worth understanding when it comes to optimizing web performance.

Chrome & Multi-process resource loading

Each browser tab in Chrome runs as its own isolated process, which provides fault isolation and many security benefits. However, all network communication is handled by the main browser process. Whenever a tab needs to fetch a remote resource, it sends an IPC request to the host process and waits for a response.

This may seem counter-productive at first, but there are many good reasons for this architecture: the browser is able to control the network activity of each tab (security), it can limit the number of connections per host as well as provide connection pooling and re-use, and this also allows us to maintain consistent HTTP cache and session states (cookies and other cached data).
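The per-host connection limiting and re-use mentioned above can be sketched in miniature. The hypothetical `ConnectionPool` below is an illustrative assumption, not Chrome's implementation (the cap of 6 simply mirrors a common browser per-host limit): cap simultaneous connections per host and keep released connections warm for reuse.

```python
from collections import defaultdict

class ConnectionPool:
    """Toy per-host connection pool (hypothetical sketch; Chrome's real
    pool lives in the browser process, shared by all tabs)."""

    def __init__(self, max_per_host=6):  # 6 mirrors a common browser limit
        self.max_per_host = max_per_host
        self.idle = defaultdict(list)    # host -> reusable connections
        self.active = defaultdict(int)   # host -> checked-out count

    def acquire(self, host):
        if self.idle[host]:                       # reuse a warm connection
            conn = self.idle[host].pop()
        elif self.active[host] < self.max_per_host:
            conn = f"conn-to-{host}"              # stand-in for a real socket
        else:
            return None                           # at the cap; caller waits
        self.active[host] += 1
        return conn

    def release(self, host, conn):
        self.active[host] -= 1
        self.idle[host].append(conn)              # keep it warm for reuse
```

Because every tab funnels through the same pool, a released connection can be handed to a different tab's request without paying the handshake cost again.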

Managing all of these interactions is a non-trivial task to begin with, but what is even more interesting is the level of optimization that goes into this layer to hide the networking latency: given a remote URL, we need to resolve it (DNS), perform the TCP handshake, and only then can we send the request. An average DNS lookup takes 60-120ms, followed by a full round-trip (RTT) to perform the TCP handshake - combined, that creates 100-200ms of latency before we can even send the request! So, what could the browser do to help us offset this cost?
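A quick back-of-the-envelope helper makes that arithmetic concrete (a sketch using the ranges quoted above, not measurements):

```python
def pre_request_latency_ms(dns_ms, rtt_ms, prefetched=False):
    """Time spent before the HTTP request can even be sent.

    Cold: DNS lookup plus one full RTT for the TCP three-way handshake.
    Prefetched/preconnected: both costs were already paid ahead of time.
    """
    return 0 if prefetched else dns_ms + rtt_ms

# Using the ranges above: a 60-120ms lookup plus a 40-80ms handshake RTT
low = pre_request_latency_ms(60, 40)    # best case: 100ms
high = pre_request_latency_ms(120, 80)  # worst case: 200ms
```

If the browser guesses right and preconnects ahead of time, that entire window collapses to zero.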

DNS Prefetching & TCP Preconnect

At the core of the Chrome networking stack is a single Predictor object (predictor.h), whose sole responsibility is to anticipate user behavior, as well as the resource requests that each tab may need in the near future. Put differently, Chrome learns the network topology as you use it. If it does its job right, then it can speculatively pre-resolve the hostnames (DNS prefetching), as well as open the connections (TCP preconnect) ahead of time.

To do so, the Predictor needs to optimize against a large number of constraints: speculative prefetching and preconnect should not impact the current loading performance, being too aggressive may fetch unnecessary resources, and we must also guard against overloading the actual network. To manage this process, the predictor relies on historical browsing data, heuristics, and many other hints from the browser to anticipate the requests.

Building the Chrome Predictor

The “browser startup experience” has its own separate cache where Chrome learns the first ten visited URLs across all of your sessions. Whenever you do a fresh boot of your browser, it immediately resolves all of those hostnames - a nice optimization to speed up your morning routine! Next, you focus on the omnibar and begin typing. If the input looks like a search query, then we can preconnect to the default search engine in anticipation of the query. Alternatively, if the input or the suggestion is a high likelihood URL, then we can preconnect directly to the host. If we guess right, the DNS and TCP handshake may complete before we even hit enter!
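As a rough sketch of that omnibox decision, the hypothetical heuristic below preconnects to a host when the input looks like a URL, and to the default search engine otherwise. The function name and the rules are illustrative assumptions; Chrome's real omnibox scoring is far more sophisticated.

```python
def preconnect_target(omnibox_input, search_engine="www.google.com"):
    """Guess where to preconnect while the user is still typing.

    Hypothetical rule of thumb: input with a dot and no spaces looks
    like a hostname; anything else is treated as a search query.
    """
    text = omnibox_input.strip()
    if " " not in text and "." in text:
        host = text.split("://")[-1]   # drop an explicit scheme, if any
        host = host.split("/")[0]      # drop any path component
        return host                    # preconnect straight to the host
    return search_engine               # preconnect to the search engine
```

Either way, the DNS lookup and TCP handshake are racing the user's typing speed.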

Next, we request the URL and the parser begins tokenizing the incoming bytes to incrementally build up the DOM tree. Due to the deterministic concurrency model in the browser many subresources are blocking - the parser must stop and wait for the resource. To help eliminate the network wait time imposed by this model, WebKit uses a speculative PreloadScanner (HTMLPreloadScanner.cpp) which “looks ahead” in the document and queues up remote resources. This allows us to resolve and fetch some of the resources before the parser even sees them.
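The idea behind that look-ahead pass can be sketched in a few lines of Python. This is only a regex stand-in: WebKit's HTMLPreloadScanner runs a real tokenizer pass over the bytes ahead of the parser, not a regex.

```python
import re

# Hypothetical look-ahead scanner: while the real parser is blocked on a
# resource, scan the raw markup ahead for fetchable subresources so their
# DNS/TCP/fetch work can start early.
RESOURCE_RE = re.compile(
    r'<(?:script|img|link)\b[^>]*?(?:src|href)=["\']([^"\']+)["\']',
    re.IGNORECASE)

def scan_for_preloads(html_ahead):
    """Return subresource URLs found ahead of the parser's position."""
    return RESOURCE_RE.findall(html_ahead)
```

Every URL the scanner surfaces early is a fetch that overlaps with, instead of waiting behind, the blocking resource.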

But even that is suboptimal. Ideally, we should be able to learn and anticipate the subresource connections! In fact, that is exactly what Chrome does: it learns the resource domains for each visited hostname, and on repeat visits it can preemptively resolve and preconnect to these resource hosts before the parser even sees the first byte of the document. The image below shows the inferred resource domains, as well as the stats for igvita.com.
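A minimal sketch of that learning step, assuming a simple visit-count confidence model (the class name and threshold are illustrative, not Chrome's):

```python
from collections import defaultdict

class SubresourceLearner:
    """Sketch of a learned page-host -> subresource-host mapping, the kind
    of data chrome://dns displays (names here are illustrative)."""

    def __init__(self, min_confidence=0.5):
        self.min_confidence = min_confidence
        self.visits = defaultdict(int)  # page host -> visit count
        self.uses = defaultdict(lambda: defaultdict(int))  # page -> {sub: n}

    def record_visit(self, page_host, subresource_hosts):
        self.visits[page_host] += 1
        for sub in subresource_hosts:
            self.uses[page_host][sub] += 1

    def hosts_to_preconnect(self, page_host):
        """Subresource hosts seen on enough past visits to gamble on."""
        total = self.visits[page_host]
        if not total:
            return []
        return [sub for sub, n in self.uses[page_host].items()
                if n / total >= self.min_confidence]
```

On a repeat visit, everything this returns can be resolved and preconnected before the first byte of HTML arrives.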

chrome://dns - Chrome learns subresource domains

Finally, as you explore the rendered page, actions such as hovering over a link can also kick off a prefetch. Each of these signals goes into an internal FIFO prefetch queue and gets tagged with a ResolutionMotivation (url_info.h), which allows Chrome to re-order and optimize the resource load order:

```cpp
enum ResolutionMotivation {
  MOUSE_OVER_MOTIVATED,     // Mouse-over link induced resolution.
  PAGE_SCAN_MOTIVATED,      // Scan of rendered page induced resolution.
  LINKED_MAX_MOTIVATED,     // enum demarkation above motivation from links.
  OMNIBOX_MOTIVATED,        // Omni-box suggested resolving this.
  STARTUP_LIST_MOTIVATED,   // Startup list caused this resolution.
  NO_PREFETCH_MOTIVATION,   // Browser navigation info (not prefetch related).
  EARLY_LOAD_MOTIVATED,     // In some cases we use the prefetcher to warm up
                            // the connection in advance of issuing the real
                            // request.
  UNIT_TEST_MOTIVATED,

  // The following involve predictive prefetching, triggered by a navigation.
  // The referring_url_ is also set when these are used.
  STATIC_REFERAL_MOTIVATED,   // External database suggested this resolution.
  LEARNED_REFERAL_MOTIVATED,  // Prior navigation taught us this resolution.
  SELF_REFERAL_MOTIVATED,     // Guess about need for a second connection.

  MAX_MOTIVATED  // Beyond all enums, for use in histogram bounding.
};
```
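To illustrate how such motivation tags could drive re-ordering, here is a toy priority queue. The priority values are invented for the sketch (user-driven signals beating speculative ones); they are not Chrome's actual ordering.

```python
import heapq
from itertools import count

# Illustrative priorities only: lower number = more urgent.
PRIORITY = {
    "OMNIBOX_MOTIVATED": 0,        # user is acting right now
    "MOUSE_OVER_MOTIVATED": 1,     # user is about to click
    "LEARNED_REFERAL_MOTIVATED": 2,
    "PAGE_SCAN_MOTIVATED": 3,
    "STARTUP_LIST_MOTIVATED": 4,   # background warm-up
}

class PrefetchQueue:
    """Motivation-ordered prefetch queue; FIFO within each motivation."""

    def __init__(self):
        self._heap, self._seq = [], count()  # seq preserves FIFO per tier

    def push(self, host, motivation):
        heapq.heappush(self._heap,
                       (PRIORITY[motivation], next(self._seq), host))

    def pop(self):
        return heapq.heappop(self._heap)[2] if self._heap else None
```

A mouse-over hint pushed late still jumps ahead of a pile of speculative page-scan entries.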

Best of all, we can inspect all of these historical and runtime caches right in the browser (copy the chrome:// links below, and open in a new tab):

- chrome://predictors - omnibox predictor stats (tip: check ‘Filter zero confidences’)
- chrome://net-internals/#sockets - current socket pool status
- chrome://net-internals/#dns - Chrome’s in-memory DNS cache
- chrome://histograms/DNS - histograms of your DNS performance
- chrome://dns - startup prefetch list and subresource host cache

DNS resolution in Chrome

DNS resolution in Chrome deserves its own in-depth treatment, but it is worth mentioning that after much deliberation the Chrome team is now experimenting with building its own DNS resolver. Currently, Chrome relies on the OS to perform DNS resolution, and to do so it maintains a pool of 8 threads dedicated to this task. Each getaddrinfo() call is blocking, which also puts a hard cap on the concurrency. Why 8? It is an empirical number based on the least common denominator of hardware - higher numbers can overload some home routers.
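The thread-pool model is easy to sketch: with blocking lookups, the pool size is the concurrency limit. The helper below is illustrative, and its `resolver` parameter is an assumption added only so the sketch can be exercised without touching the network.

```python
from concurrent.futures import ThreadPoolExecutor
import socket

def resolve_all(hostnames, workers=8, resolver=None):
    """Resolve hostnames through a capped thread pool.

    Each blocking getaddrinfo() call ties up one worker thread, so at
    most `workers` lookups are in flight at once - mirroring Chrome's
    empirically chosen pool of 8.
    """
    resolver = resolver or (lambda h: socket.getaddrinfo(h, 80))
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {h: pool.submit(resolver, h) for h in hostnames}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except OSError:
                results[host] = None  # resolution failed
    return results
```

With an async resolver there is no thread per lookup, which is exactly why the hard cap could give way to a dynamic limit.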

With the new async resolver in place, the limit could be dropped in favor of a dynamic one, and Chrome would also be able to manage its own DNS cache and perform more optimizations such as preemptive refresh of popular or expiring hostnames. For more details, read Will Chan’s post on Google+: Host resolution in Chromium.

If you are curious, you can enable the new resolver in chrome://flags (under “Built-in Asynchronous DNS”) and you can also explore the performance of your current DNS stack via the built-in Chrome histograms. In the session below, an average DNS lookup was taking 84.9ms (ouch):

chrome://histograms/DNS.PrefetchResolution

Network latency, mobile and Chrome

If the Chrome predictor does its job right, then some of the cost of the networking latency can be hidden from the user. The above heuristics and algorithms have proven to yield great results, but there is still a lot of work to be done. As most of us know first hand, the mobile experience today is often excruciatingly slow, and this is in large part due to the much higher RTTs (200-1000ms) on wireless networks.

In fact, it is likely that the single best optimization you can make for mobile today is to reduce the number of outbound connections and the total byte size of your pages. Network latency is anything but free. The browser can definitely help, as we saw above, but do check your network waterfall chart - your users will thank you for it.