Over the last decade, web performance optimization has been governed by one indisputable guideline: the best request is no request. It is a humble rule, and easy to interpret. Every network call we eliminate improves performance: every src attribute spared, every link element dropped. But everything has changed now that HTTP/2 is available, hasn’t it? Designed for the modern web, HTTP/2 handles a large number of requests more efficiently than its predecessor. So the question is: does the old rule of reducing requests still hold up?


What has changed with HTTP/2?

To understand how HTTP/2 is different, it helps to know about its predecessors. A brief history follows. HTTP builds on TCP. While TCP is powerful and is capable of transferring lots of data reliably, the way HTTP/1 utilized TCP was inefficient. Every resource requested required a new TCP connection. And every TCP connection required synchronization between the client and server, resulting in an initial delay as the browser established a connection. This was OK in times when the majority of web content consisted of unstyled documents that didn’t load additional resources, such as images or JavaScript files.

Updates in HTTP/1.1 try to overcome this limitation. Clients are able to use one TCP connection for multiple resources, but still have to download them in sequence. This so-called “head of line blocking” makes waterfall charts actually look like waterfalls:

Figure 1. Schematic waterfall of assets loading over one pipelined TCP connection

Also, most browsers started to open multiple TCP connections in parallel, limited to a rather low number per domain. Even with such optimizations, HTTP/1.1 is not well-suited to the considerable number of resources of today’s websites. Hence the saying “The best request is no request.” TCP connections are costly and take time. This is why we use things like concatenation, image sprites, and inlining of resources: avoid new connections, and reuse existing ones.

HTTP/2 is fundamentally different from HTTP/1.1. HTTP/2 uses a single TCP connection and allows more resources to be downloaded in parallel than its predecessor. Think of this single TCP connection as one broad tunnel through which data is sent in frames. On the client, the frames are reassembled into their original resources. Using a couple of link elements to transfer style sheets is now practically as efficient as bundling all of your style sheets into one file.

Figure 2. Schematic waterfall of assets loading over one shared TCP connection

All streams share the same connection, so they also share its bandwidth. Depending on the number of resources, this might mean that individual resources take longer to reach the client on low-bandwidth connections.
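To make the effect of shared bandwidth concrete, here is a minimal sketch of an idealized model in which all transfers start together and split the bandwidth equally. This is an assumption for illustration only: real HTTP/2 scheduling depends on how the server and browser prioritize streams, and the function name, sizes, and bandwidth figure below are hypothetical.

```python
def fair_share_finish_times(sizes_kb, bandwidth_kb_per_s):
    """Finish time in seconds of each transfer under equal bandwidth sharing.

    Idealized model: every transfer starts at t=0 and all active transfers
    receive the same share of the bandwidth until they complete.
    """
    order = sorted(range(len(sizes_kb)), key=lambda i: sizes_kb[i])
    finish = [0.0] * len(sizes_kb)
    clock = 0.0
    sent = 0.0  # amount each still-active transfer has received so far
    remaining = len(sizes_kb)
    for i in order:
        # Time until the smallest remaining transfer completes at the current share.
        clock += (sizes_kb[i] - sent) * remaining / bandwidth_kb_per_s
        sent = sizes_kb[i]
        finish[i] = clock
        remaining -= 1
    return finish


# Hypothetical numbers: eight 1 kB style sheets versus one of them on its own.
print(fair_share_finish_times([1.0] * 8, 50.0))
print(fair_share_finish_times([1.0], 50.0))
```

In this model each of the eight files finishes later than it would alone, but the last one finishes exactly when the total number of bytes has been sent. Splitting files does not add transfer time by itself; what matters is how many bytes are sent in total.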

Sharing one connection also means that resource prioritization is not done as easily as it was with HTTP/1.1, where the order of resources in the document had an impact on when they began to download. With HTTP/2, everything happens at the same time! The HTTP/2 spec contains information on stream prioritization, but at the time of this writing, placing control over prioritization in developers’ hands is still in the distant future.

The best request is no request: cherry-picking

So what can we do to overcome the lack of waterfall resource prioritization? What about not wasting bandwidth? Think back to the first rule of performance optimization: the best request is no request. Let’s reinterpret the rule.

For example, consider a typical webpage (in this case, from Dynatrace). The screenshot below shows a piece of online documentation consisting of different components: main navigation, a footer, breadcrumbs, a sidebar, and the main article.

Figure 3. A typical website split into a few components

On other pages of the same site, we have things like a masthead, social media outlets, galleries, or other components. Each component is defined by its own markup and style sheet.

In HTTP/1.1 environments, we would typically combine all component style sheets into one CSS file. The best request is no request: one TCP connection to transfer all the CSS necessary, even for pages the user hasn’t seen yet. This can result in a huge CSS file.

The problem is compounded when a site uses a library like Bootstrap, which has reached the 300 kB mark, and adds site-specific CSS on top of it. In some cases, the CSS actually required by any given page was less than 10% of the amount loaded:

Figure 4. Code coverage of a random cinema webpage that uses 10% of the bundled 300 kB CSS. This page is built upon Bootstrap.

There are even tools like UnCSS that aim to get rid of unused styles.

The Dynatrace documentation example shown in figure 3 is built with the company’s own style library, which is tailored to the site’s specific needs as opposed to Bootstrap, which is offered as a general purpose solution. All components in the company style library combined add up to 80 kB of CSS. The CSS actually used on the page is divided among eight of those components, totaling 8.1 kB. So even though the library is tailored to the specific needs of the website, the page still uses only around 10% of the CSS it downloads.

HTTP/2 allows us to be much more picky when it comes to the files we want to transmit. The request itself is not as costly as it is in HTTP/1.1, so we can safely use more link elements, pointing directly to the style sheets of the components used on that particular page:

<link rel="stylesheet" href="/css/base.css">
<link rel="stylesheet" href="/css/typography.css">
<link rel="stylesheet" href="/css/layout.css">
<link rel="stylesheet" href="/css/navbar.css">
<link rel="stylesheet" href="/css/article.css">
<link rel="stylesheet" href="/css/footer.css">
<link rel="stylesheet" href="/css/sidebar.css">
<link rel="stylesheet" href="/css/breadcrumbs.css">
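One way to produce such a per-page list is to derive it from the components a page actually renders. The following is a minimal sketch, not a description of how the example site is built: the function name `stylesheet_links`, the `/css` root, and the set of always-included base sheets are assumptions for illustration.

```python
from html import escape

# Hypothetical: style sheets every page needs regardless of its components.
BASE_SHEETS = ["base", "typography", "layout"]


def stylesheet_links(components, css_root="/css"):
    """Return one <link> element per style sheet a page needs, without duplicates."""
    seen = set()
    links = []
    for name in BASE_SHEETS + list(components):
        if name in seen:
            continue  # a component used twice on a page is still one request
        seen.add(name)
        href = escape(f"{css_root}/{name}.css", quote=True)
        links.append(f'<link rel="stylesheet" href="{href}">')
    return links


page_components = ["navbar", "article", "footer", "sidebar", "breadcrumbs"]
print("\n".join(stylesheet_links(page_components)))
```

A page that renders fewer components gets fewer link elements, so it never requests CSS for components it does not show.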

Cherry-picking like this works for sprite maps and JavaScript bundles as well. By transferring only what you actually need, you can greatly reduce the amount of data sent to your users! Compare the download times for bundle and single files shown with Chrome timings below:

Figure 5. Download of the bundle. After the initial connection is established, the bundle takes 583 ms to download on regular 3G.

Figure 6. Split only the files needed, and download them in parallel. The initial connection takes about as long, but the content (one style sheet, in this case) downloads much faster because it is smaller.

The first image shows that, including the time required for the browser to establish the initial connection, the bundle needs about 700 ms to download on regular 3G connections. The second image shows timing values for one CSS file out of the eight that make up the page. The beginning of the response (TTFB) takes just as long, but since the file is a lot smaller (less than 1 kB), the content is downloaded almost immediately.

This might not seem impressive when looking at only one resource. But as shown below, since all eight style sheets are downloaded in parallel, we still can save a great deal of transfer time when compared to the bundle approach.

Figure 7. All style sheets on the split variant load in parallel.

When running the same page through webpagetest.org on regular 3G, we can see a similar pattern. The full bundle ( main.css ) starts to download just after 1.5 s (yellow line) and takes 1.3 s to download; the time to first meaningful paint is around 3.5 seconds (green line):

Figure 8. Full page download of the bundle, regular 3G.

When we split up the CSS bundle, each style sheet starts to download at 1.5 s (yellow line) and takes 315–375 ms to finish. As a result, we can reduce the time to first meaningful paint by more than one second (green line):

Figure 9. Downloading single files instead, regular 3G.

Per our measurements, the difference between bundled and split files has more impact on slow 3G than on regular 3G. On slow 3G, the bundle needs a total of 4.5 s to download, resulting in a time to first meaningful paint of around 7 s:

Figure 10. Bundle, slow 3G.

The same page with split files on slow 3G connections via webpagetest.org results in meaningful paint (green line) occurring 4 s earlier:

Figure 11. Split files, slow 3G.

The interesting thing is that what was considered a performance anti-pattern in HTTP/1.1 (using lots of references to resources) becomes a best practice in the HTTP/2 era. Plus, the rule stays the same! Only its meaning changes slightly.

The best request is no request: drop files and code your users don’t need!

It has to be noted that the success of this approach depends strongly on the number and size of the resources transferred. The example above used 10% of the original style sheet library, which is an enormous reduction in file size. Downloading the whole UI library in split-up files might give different results. For example, Khan Academy found that by splitting up their JavaScript bundles, the overall application size (and thus the transfer time) became drastically worse. There were two main reasons: a huge number of JavaScript files (close to 100), and the often underestimated power of Gzip.

Gzip (and Brotli) yields higher compression ratios when there is repetition in the data it is compressing. This means that a Gzipped bundle typically has a much smaller footprint than Gzipped single files. So if you are going to download a whole set of files anyway, the compression ratio of bundled assets might outperform that of single files downloaded in parallel. Test accordingly.
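The compression effect is easy to check for your own assets. The sketch below uses synthetic CSS as a stand-in for real files, so the component names and rules are assumptions; only the comparison itself matters. It compresses eight files individually and then as one concatenated bundle, using Python's standard `gzip` module.

```python
import gzip


def make_component_css(name, rules=20):
    """Build synthetic CSS for one hypothetical component (stand-in for a real file)."""
    lines = [
        f".{name}__item-{i} {{ margin: 0; padding: {i}px; color: #333; }}"
        for i in range(rules)
    ]
    return "\n".join(lines).encode("utf-8")


components = ["navbar", "article", "footer", "sidebar",
              "breadcrumbs", "masthead", "gallery", "social"]
files = [make_component_css(name) for name in components]

# Compress every file on its own, as a server would for eight separate responses.
split_size = sum(len(gzip.compress(data)) for data in files)

# Compress the concatenation, as a server would for one bundled response.
bundle_size = len(gzip.compress(b"\n".join(files)))

print(f"split:  {split_size} bytes over {len(files)} responses")
print(f"bundle: {bundle_size} bytes in one response")
```

Because the files repeat the same declarations, the bundle compresses better than the sum of its parts. The split variant only wins when a page can skip most of the files entirely.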

Also, be aware of your user base. While HTTP/2 has been widely adopted, some of your users might be limited to HTTP/1.1 connections. They will suffer from split resources.

The best request is no request: caching and versioning

To this point with our example, we’ve seen how to optimize the first visit to a page. The bundle is split up into separate files and the client receives only what it needs to display on a page. This gives us the chance to look into something people tend to neglect when optimizing for performance: subsequent visits.

On subsequent visits we want to avoid re-transferring assets unnecessarily. HTTP headers like Cache-Control (and their implementation in servers like Apache and NGINX) allow us to store files on the user’s disk for a specified amount of time. Some CDN servers default to a few minutes, others to a few hours or even days. The idea is that during a session, users shouldn’t have to download again what they have already downloaded (unless they’ve cleared their cache in the interim). For example, the following Cache-Control header allows the file to be stored in any cache available and treats it as fresh for 600 seconds.

Cache-Control: public, max-age=600

We can leverage Cache-Control to be much more strict. In our first optimization we decided to cherry-pick resources and be choosy about what we transfer to the client, so let’s store these resources on the machine for a long period of time:

Cache-Control: public, max-age=31536000

The number above is one year in seconds. The benefit of setting a high Cache-Control max-age value is that the asset will be stored by the client for a long period of time. The screenshot below shows a waterfall chart of the first visit. Every asset of the HTML file is requested:

Figure 12. First visit: every asset is requested.

With properly set Cache-Control headers, a subsequent visit will result in fewer requests. The screenshot below shows that none of the assets on our test domain trigger a request. Assets from another domain with improperly set Cache-Control headers still trigger a request, as do resources that haven’t been found:

Figure 13. Second visit: only some poorly cached SVGs from a different server are requested again.

When it comes to invalidating the cached asset (which, incidentally, is one of the two hardest things in computer science), we simply use a new asset instead. Let’s see how that would work with our example. The browser cache is keyed by URL, so a new file name triggers a new download. Previously, we split up our code base into reasonable chunks. A version indicator makes sure that each file name stays unique:

<link rel="stylesheet" href="/css/header.v1.css">
<link rel="stylesheet" href="/css/article.v1.css">

After a change to our article styles, we would modify the version number:

<link rel="stylesheet" href="/css/header.v1.css">
<link rel="stylesheet" href="/css/article.v2.css">

An alternative to keeping track of the file’s version is to set a revision hash based on the file’s content with automation tools.
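A content-based revision hash could be generated along these lines. This is a minimal sketch rather than the behavior of any particular build tool: the function name `revisioned_name`, the choice of SHA-256, and the eight-character digest length are assumptions for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def revisioned_name(path, content, digest_length=8):
    """Insert a short content hash before the file extension, e.g. article.<hash>.css."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = revisioned_name("/css/article.css", b"p { margin: 0 }")
new = revisioned_name("/css/article.css", b"p { margin: 1em }")
print(old)
print(new)
```

The file name changes only when the content changes, so an unchanged file keeps its name and stays cached, and nobody has to remember to bump a version number by hand.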

It’s OK to store your assets on the client for a long period of time. However, your HTML should be more transient in most cases. Typically, the HTML file contains the information about which resources to download. Should you want your resources to change (such as loading article.v2.css instead of article.v1.css, as we just saw), you’ll need to update references to them in your HTML. Popular CDN servers cache HTML for no longer than six minutes, but you can decide what’s better suited for your application.

And again, the best request is no request: store files on the client as long as possible, and don’t request them over the wire ever again. Recent Firefox and Edge editions even sport an immutable directive for Cache-Control, targeting this pattern specifically.
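Put together, the caching policy described here might be expressed as a small helper. This is a sketch under stated assumptions: the function name `cache_control_for`, the path-based check for HTML, and the six-minute HTML lifetime (borrowed from the CDN figure mentioned earlier) are illustrative choices, not a recommendation for every site.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31536000, the long max-age used earlier
HTML_MAX_AGE = 6 * 60  # assumption: six minutes for HTML documents


def cache_control_for(path):
    """Pick a Cache-Control value: short-lived for HTML, long-lived for versioned assets."""
    if path.endswith("/") or path.endswith(".html"):
        return f"public, max-age={HTML_MAX_AGE}"
    # A versioned file name never points at different content, so it can be immutable.
    return f"public, max-age={SECONDS_PER_YEAR}, immutable"


for path in ["/docs/index.html", "/css/article.v2.css"]:
    print(path, "->", cache_control_for(path))
```

The long lifetime is only safe for assets whose file name changes with their content. An unversioned file cached for a year could not be updated for returning visitors.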

Bottom line

HTTP/2 has been designed from the ground up to address the inefficiencies of HTTP/1. Triggering a large number of requests in an HTTP/2 environment is no longer inherently bad for performance; transferring unnecessary data is.

To reach the full potential of HTTP/2, we have to look at each case individually. An optimization that might be good for one website can have a negative effect on another. With all the benefits that come with HTTP/2, the golden rule of performance optimization still applies: the best request is no request. Only this time we look at the actual amount of data transferred.

Only transfer what your users actually need. Nothing more, nothing less.