Velocity Farewell

Velocity Origin Story

In 2007, my first book, High Performance Web Sites, was selling very well. That, plus the launch of YSlow, brought more attention to web performance. As a result, Jon Jenkins invited me to give a tech talk at Amazon. Afterward, he, John Rauser, and I, plus a few other performance-minded folks, had a great dinner and a spirited discussion about performance best practices. Jon asked me if there was any place where people got together to have that type of discussion. I couldn’t think of one, so I promised to talk to O’Reilly about the idea of a conference.

I emailed my editor, Andy Oram, and asked if he thought a performance conference was a good idea. He agreed to set up a meeting with Tim O’Reilly at OSCON. That meeting included people I would end up working closely with for the next decade: Tim, Andy, Gina Blaber, Brady Forrest, Jon Jenkins, Artur Bergman, and Jesse Robbins. (There were others there as well, and I apologize for not remembering.) I’m not sure, but I think Jon invited Jesse to the meeting. Both of them worked at Amazon, where Jesse was the “master of disaster” – helping to create a resilient infrastructure for their services.

We discussed the idea of a conference combining performance and operations. It made sense. Both groups (or “tribes” as Jesse called them) focused on optimization. The tools and stacks might have been different, but the mindset was the same.

The first Velocity was held in June 2008. Jesse and I were the co-chairs. I remember being surprised and grateful when Gina called me to ask if I would co-chair. It seems like that was a long time ago. Take a look at the speaker list for that first Velocity – it contains many people who were or would become luminaries in the emerging fields of DevOps and WPO – Web Performance Optimization (aka FEO – Frontend Optimization):

John Allspaw – future Velocity co-chair, current CTO at Etsy

Artur Bergman – founder of Fastly and who would become my future boss

Eric Goldsmith – worked with Pat Meenan at AOL on WebPageTest

Jason Grigsby – mobile guru with me this week at Velocity again!

Adam Jacob – CTO at Chef

Luke Kanies – founder of Puppet Labs

Eric Lawrence – OMG! Internet Explorer, Fiddler, now Google Chrome

John Rauser – OMG! Amazon, Pinterest, Snapchat, also with me this week at Velocity again!

Eric Schurman – Microsoft, now Amazon, and still on the Velocity Program Committee

Stoyan Stefanov – member of the original Yahoo! Exceptional Performance team, creator of the Performance Calendar, now at Facebook

Mandi Walls – keynoted at the very first Velocity, and sitting with me now in the Velocity speaker room again!

Transition

Yesterday, I announced this is my last Velocity as co-chair. I’ve been doing Velocity for a long time: nine years comprising 21 shows (9 in California, 6 in Europe, 4 in New York, and 2 in Beijing). It feels good to open up the opportunity for someone else. Courtney Nash, who owns Velocity at O’Reilly, announced that performance will be merged with Fluent in 2017, to be chaired by Kyle Simpson and Tammy Everts. Also, Inés Sombra is the new co-chair for Velocity. The future looks good.

So many thank yous: O’Reilly Media for starting Velocity (and pushing me to accept that name over my choice, “Web Performance Conference”). Jesse Robbins, John Allspaw, James Turnbull, and Courtney Nash for accepting me (putting up with me) as a Velocity co-chair. All the Program Committee members. The sponsors and vendors who supported Velocity. The amazing speakers over these past nine years. And all the attendees.

We should pat each other on the back – it’s important to do that every once in a while. We have created a strong, smart, caring, supportive community. Out of that have come numerous industries, technologies, and companies. It has been an amazing experience. Thank you!

Critical Metric: Critical Resources

A big change in the World of Performance for 2015 [this post is being cross-posted from the 2015 Performance Calendar] is the shift to metrics that do a better job of measuring the user experience. The performance industry grew up focusing on page load time, but teams with more advanced websites have started replacing PLT with metrics that have more to do with rendering and interactivity. The best examples of these new UX-focused metrics are Start Render and Speed Index.

Start Render and Speed Index

A fast start render time is important for a good user experience because once users request a new page, they’re left staring at the old page or, even worse, a blank screen. This is frustrating because nothing is happening and users don’t know if the site is down, if they should reload the page, or if they should simply wait longer. A fast start render time spares the user this frustration by reassuring her that the site is working and delivering upon her request.

Speed Index, a metric developed by Pat Meenan as part of WebPageTest, is the average time at which visible parts of the page are displayed. Whereas start render time captures when the rendering experience starts, Speed Index reflects how quickly the entire viewport renders. These metrics measure different things, but both focus on how quickly pages render, which is critical for a good user experience.

Critical Resources

The main blockers to fast rendering are stylesheets and synchronous scripts. Stylesheets block all rendering in the page until they finish loading. Synchronous scripts (e.g., <script src="main.js">) block rendering for all following DOM elements; therefore, synchronous scripts in the HEAD of the page block the entire page from rendering until they finish loading. I call stylesheets and synchronous scripts “critical blocking resources” because of their big impact on rendering.
A few months back I decided to start tracking this as a new performance metric as part of SpeedCurve and the HTTP Archive. Most performance services already have metrics for scripts and stylesheets, but a separate metric for critical resources is different in a few ways: It combines stylesheets and synchronous scripts into a single metric, making it easier to track their impact.

It only counts synchronous scripts. Asynchronous scripts don’t block rendering so they’re not included. The HTTP Archive data for the world’s top 500K URLs shows that the median website has 10 synchronous scripts and 2 async scripts, so ignoring those async scripts gives a more accurate measurement of the impact on rendering. (I do this as a WebPageTest custom metric. The code is here.)

Synchronous scripts loaded in iframes are not included because they don’t block rendering of the main page. (I’m still working on code to ignore stylesheets in iframes.)

Critical Metric

I’m confident this new “critical resources” metric will prove to be key for tracking a good user experience in terms of performance. Whether that’s true will be borne out as adoption grows and we gain more experience correlating this to other metrics that reflect a good user experience. In the meantime, I added this metric to the HTTP Archive and measured the correlation to start render time, Speed Index, and page load time. Here are the results for the Dec 1 2015 crawl:

The critical resources metric described in this article is called “CSS & Sync JS” in the charts above. It has the highest correlation to Speed Index and the second highest correlation to start render time. This shows that “critical resources” is a good indicator of rendering performance. It doesn’t show up in the top five variables correlated to load time, which is fine. Most people agree that page load time is no longer a good metric because it doesn’t reflect the user experience.

We all want to create great, enjoyable user experiences. With the complexity of today’s web apps – preloading, lazy-loading, sync & async scripts, dynamic images, etc. – it’s important to have metrics that help us know when our user experience performance is slipping. Tracking critical resources provides an early indicator of how our code might affect the user experience, so we can keep our websites fast and our users happy.
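The counting described in this post can be sketched as a small function. This is an illustration, not the actual WebPageTest custom-metric code the post links to; it takes plain element descriptors so the logic is easy to test, whereas in a browser you would build that list from document.querySelectorAll('script[src], link[rel="stylesheet"]').

```javascript
// Rough sketch of counting "critical blocking resources":
// external stylesheets plus synchronous scripts.
function countCriticalResources(elements) {
  return elements.filter(function (el) {
    if (el.tag === 'link') {
      // Stylesheets block all rendering until they finish loading.
      return el.rel === 'stylesheet';
    }
    if (el.tag === 'script' && el.src) {
      // Only synchronous scripts count; async and deferred scripts
      // don't block rendering, matching the metric described above.
      return !el.async && !el.defer;
    }
    return false;
  }).length;
}
```

For example, a page with one stylesheet, one synchronous script, and one async script scores 2.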

domInteractive: is it? really?

Netflix published a great article, Making Netflix.com Faster, about their move to Node.js. The article mentions how they use performance.timing.domInteractive to measure improvements:

In order to test and understand the impact of our choices, we monitor a metric we call time to interactive (tti). Amount of time spent between first known startup of the application platform and when the UI is interactive regardless of view. Note that this does not require that the UI is done loading, but is the first point at which the customer can interact with the UI using an input device. For applications running inside a web browser, this data is easily retrievable from the Navigation Timing API (where supported).

I keep my eye out for successful tech companies that use something other than window.onload to measure performance. We all know that window.onload is no longer a good way to measure website speed from the user’s perspective. While domInteractive is a much better metric, it also suffers from not being a good proxy for the user experience in some (fairly common) situations.

The focus of Netflix’s redesign is to get interactive content in front of the user as quickly as possible. This is definitely the right goal. The problem is, domInteractive does not necessarily reflect when that happens. The discrepancies are due to scripts, stylesheets, and fonts that block rendering. Scripts can make domInteractive measurements too large when the content is actually visible much sooner. Stylesheets and custom fonts can make domInteractive measurements too small, suggesting that content was interactive sooner than it really was.

To demonstrate this I’ve created three test pages. Each page has some mockup “interactive content” – links to click on and a form to submit. The point of these test pages is to determine when this critical content becomes interactive and compare that to the domInteractive time.
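For reference, the Navigation Timing lookup that Netflix describes can be sketched as follows. The function name is mine, not from the Netflix article; performance.timing values are epoch milliseconds, so subtracting navigationStart yields the elapsed time.

```javascript
// Sketch of deriving a domInteractive-based "time to interactive"
// from the Navigation Timing API. Timestamps in performance.timing
// are epoch milliseconds, so subtract navigationStart to get a
// duration relative to the start of the navigation.
function domInteractiveTime(timing) {
  return timing.domInteractive - timing.navigationStart;
}

// In a browser: domInteractiveTime(performance.timing)
```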
Scripts: Loading the domInteractive JS – too conservative test page, we see that the critical content is visible almost immediately. Below this content is a script that takes 8 seconds to load. The domInteractive time is 8+ seconds, even though the critical content was interactive in about half a second. If you have blocking scripts on the page that occur below the critical content, domInteractive produces measurements that are too conservative – larger than when the content actually became interactive. (WebPageTest results for Chrome, Firefox, IE & Safari.)

Stylesheets: On the other hand, loading the domInteractive CSS – too optimistic test page, the critical content isn’t visible for 8+ seconds. That’s because this content is blocked from rendering until the stylesheet finishes downloading, which takes 8 seconds. However, the domInteractive time is about half a second. If you use stylesheets, domInteractive may produce measurements that are too optimistic – smaller than when the content actually became interactive. (WebPageTest results for Chrome, Firefox, IE & Safari.)

Fonts: The results for the domInteractive Font – too optimistic test page have a domInteractive time of about half a second in all browsers, but the time at which the critical content is visible varies. In Safari, the critical content isn’t visible for 8+ seconds while waiting for the font file to load. In Chrome and Firefox, the critical content is blocked for 3 seconds, at which time it’s rendered in a default font. That’s because Chrome and Firefox use a three-second timeout when waiting for fonts; if the font doesn’t arrive within three seconds, a default font is used. IE11 displays the critical content immediately using a default font. In Chrome, Firefox, and IE11, the content is re-rendered when the font file finishes downloading after 8+ seconds.
If you use custom fonts for any of your critical content, domInteractive may produce measurements that are too optimistic in Chrome, Firefox, and Safari – smaller than when the content actually became interactive. (Unless you load your fonts asynchronously like Typekit does.) (WebPageTest results for Chrome, Firefox, IE & Safari.)

One solution is to avoid these situations: always load scripts, stylesheets, and fonts asynchronously. While this is great, it’s not always possible (I’m looking at you, third party snippets!) and it’s hard to ensure that everything stays async as code changes. The best solution is to use custom metrics. Today’s web pages are so complex and unique that the days of a built-in standard metric that applies to all websites are over. Navigation Timing gives us metrics that are better than window.onload, but teams that are focused on performance need to take the next step and implement custom metrics to accurately measure what their users are experiencing.
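One common way to implement such a custom metric is the User Timing API: set a mark from an inline script placed immediately after the critical content, then read it back when beaconing. A minimal sketch, where the mark name "hero-rendered" is illustrative rather than any standard:

```javascript
// Custom-metric sketch using the User Timing API. In a real page,
// performance.mark() would be called by an inline script placed
// right after the critical content, so its startTime captures when
// that content was parsed and rendered.
performance.mark('hero-rendered');

// Later (e.g., when sending a RUM beacon), read the mark back:
const [heroMark] = performance.getEntriesByName('hero-rendered');
const heroRenderedTime = heroMark.startTime; // ms since navigation start
```

Because the mark is tied to your page's actual critical content, it measures what domInteractive only approximates.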

Joining SpeedCurve

I’m excited to announce that I’m joining SpeedCurve! SpeedCurve provides insight into the interaction between performance and design to help companies deliver fast and engaging user experiences. I’ll be joining Mark Zeman, who launched the company in 2013.

We make a great team. I’ve been working on web performance since 2002. Mark has a strong design background, including running a design agency and lecturing at New Zealand’s best design school. At SpeedCurve, Mark has pioneered the intersection of performance and design. I’m looking forward to working together to increase the visibility developers and designers have into how their pages perform.

During the past decade-plus of evangelizing performance, I’ve been fortunate to work at two of the world’s largest web companies as Chief Performance Yahoo! and Google’s Head Performance Engineer. This work provided me with the opportunity to conduct research, evaluate new technologies, and help the performance community grow. One aspect that I missed was having deeper engagements with individual companies. That was a big reason why I joined Fastly as Chief Performance Officer. Over the past year I’ve been able to sit down with Fastly customers and help them understand how to speed up their websites and applications. These companies were able to succeed not only because Fastly has built a powerful CDN, but also because they have an inspiring team. I will continue to be a close friend and advisor to Fastly.

During these engagements, I’ve seen that many of these companies don’t have the tools they need to identify how performance is impacting (hurting) the user experience on their websites. There is even less information about ways to improve performance. The standard performance metric is page load time, but there’s often no correlation between page load time and the user’s experience.
We need to shift from network-based metrics to user experience metrics that focus on rendering and when content becomes available. That’s exactly what Mark is doing at SpeedCurve, and why I’m excited to join him. The shift toward focusing on both performance and design is an emerging trend highlighted by Lara Hogan’s book Designing for Performance and Mark’s Velocity speaking appearances. SpeedCurve is at the forefront of this movement. We (mostly Mark) just launched a redesign of SpeedCurve that includes many new, powerful features such as responsive design analysis, performance budgets, and APIs for continuous deployment. Check it out and let us know what you think.

SERIOUS CONFUSION with Resource Timing, or “Duration includes Blocking”

Resource Timing is a great way to measure how quickly resources download. Unfortunately, almost everyone I’ve spoken with does this using the “duration” attribute and is not aware that “duration” includes blocking time. As a result, “duration” time values are (much) greater than the actual download time, giving developers unexpected results. This issue is especially bad for cross-origin resources, where “duration” is the only metric available. In this post I describe the problem and a proposed solution.

Resource Timing review

The Resource Timing specification defines APIs for gathering timing metrics for each resource in a web page. It’s currently available in Chrome, Chrome for Android, IE 10-11, and Opera. You can gather a list of PerformanceEntry objects using getEntries(), getEntriesByType(), and getEntriesByName(). A PerformanceEntry has these properties:

name – the URL

entryType – typically “resource”

startTime – time that the resource started getting processed (in milliseconds relative to page navigation)

duration – total time to process the resource (in milliseconds)

The properties above are available for all resources – both same-origin and cross-origin. However, same-origin resources have additional properties available as defined by the PerformanceResourceTiming interface. They’re self-explanatory and occur pretty much in this chronological order:

redirectStart

redirectEnd

fetchStart

domainLookupStart

domainLookupEnd

connectStart

connectEnd

secureConnectionStart

requestStart

responseStart

responseEnd

Here’s the canonical processing model graphic that shows the different phases. Note that “duration” is equal to (responseEnd – startTime). Take a look at my post Resource Timing Practical Tips for more information on how to use Resource Timing.

Unexpected blocking bloat in “duration”

The detailed PerformanceResourceTiming properties are restricted to same-origin resources for privacy reasons. (Note that any resource can be made “same-origin” by using the Timing-Allow-Origin response header.) About half of the resources on today’s websites are cross-origin, so “duration” is the only way to measure their load time. And even for same-origin resources, “duration” is the only delta provided, presumably because it’s the most important phase to measure. As a result, all of the Resource Timing implementations I’ve seen use “duration” as the primary performance metric.

Unfortunately, “duration” is more than download time. It also includes “blocking time” – the delay between when the browser realizes it needs to download a resource and when it actually starts downloading it. Blocking can occur in several situations. The most typical is when there are more resources than TCP connections. Most browsers only open 6 TCP connections per hostname, the exceptions being IE10 (8 connections) and IE11 (12 connections).

This Resource Timing blocking test page has 16 images, so some images incur blocking time no matter which browser is used. Each of the images is programmed on the server to have a 1 second delay. The “startTime” and “duration” are displayed for each of the 16 images. Here are WebPagetest results for this test page being loaded in Chrome, IE10, and IE11. You can look at the screenshots to read the timing results. Note how “startTime” is approximately the same for all images. That’s because this is the time that the browser parsed the IMG tag and realized it needed to download the resource.
But the “duration” values increase in steps of ~1 second for the images that occur later in the page. This is because they are blocked from downloading by the earlier images. In Chrome, for example, the images are downloaded in three sets – because Chrome only downloads 6 resources at a time. The first six images have a “duration” of ~1.3 seconds (the 1 second backend delay plus some time for establishing the TCP connections and downloading the response body). The next six images have a “duration” of ~2.5 seconds. The last four images have a “duration” of ~3.7 seconds. The second set is blocked for ~1 second waiting for the first set to finish. The third set is blocked for ~2 seconds waiting for sets 1 & 2 to finish. Even though the “duration” values increase from 1 to 2 to 3 seconds, the actual download time for all images is ~1 second, as shown by the WebPagetest waterfall chart.

The results are similar for IE10 and IE11. IE10 starts with six parallel TCP connections but then ramps up to eight connections. IE11 also starts with six parallel TCP connections but then ramps up to twelve. Both IE10 and IE11 exhibit the same problem – even though the load time for every image is ~1 second, “duration” shows values ranging from 1-3 seconds.

proposal: “networkDuration”

Clearly, “duration” is not an accurate way to measure resource load times because it may include blocking time. (It also includes redirect time, but that occurs much less frequently.) Unfortunately, “duration” is the only metric available for cross-origin resources. Therefore, I’ve submitted a proposal to the W3C Web Performance mailing list to add “networkDuration” to Resource Timing. This would be available for both same-origin and cross-origin resources. (I’m flexible about the name; other candidates include “networkTime”, “loadTime”, etc.) The calculation for “networkDuration” is as follows. (Assume “r” is a PerformanceResourceTiming object.)
dns = r.domainLookupEnd - r.domainLookupStart;
tcp = r.connectEnd - r.connectStart; // includes ssl negotiation
waiting = r.responseStart - r.requestStart;
content = r.responseEnd - r.responseStart;
networkDuration = dns + tcp + waiting + content;

Developers working with same-origin resources can do the same calculations as shown above to derive “networkDuration”. However, providing the result as a new attribute simplifies the process. It also avoids possible errors, as it’s likely that companies and teams will compare these values, so it’s important to ensure an apples-to-apples comparison. But the primary need for “networkDuration” is for cross-origin resources. Right now, “duration” is the only metric available for cross-origin resources. I’ve found several teams that were tracking “duration” assuming it meant download time. They were surprised when I explained that it also includes blocking time, and agreed it was not the metric they wanted; instead they wanted the equivalent of “networkDuration”.

I mentioned previously that the detailed time values (domainLookupStart, connectStart, etc.) are restricted to same-origin resources for privacy reasons. The proposal to add “networkDuration” is likely to raise privacy concerns; specifically, that by removing blocking time, “networkDuration” would enable malicious third party JavaScript to determine whether a resource was read from cache. However, it’s possible to remove blocking time today using “duration” by loading a resource when there’s no blocking contention (e.g., after window.onload). Even when blocking time is removed, it’s ambiguous whether a resource was read from cache or loaded over the network. Even a cache read will have non-zero load times. The problem that “networkDuration” solves is finding the load time for more typical resources that are loaded during page creation and might therefore incur blocking time.
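The same arithmetic can be packaged as helper functions. This is a sketch: blockingTime is my label for the leftover portion of “duration” (it also absorbs any redirect time), and the detailed timestamps are zero for cross-origin resources without a Timing-Allow-Origin header, so the result is only meaningful where those values are populated.

```javascript
// The networkDuration calculation as a reusable function, plus the
// blocking time it implies (duration minus the network phases).
function networkDuration(r) {
  const dns = r.domainLookupEnd - r.domainLookupStart;
  const tcp = r.connectEnd - r.connectStart; // includes ssl negotiation
  const waiting = r.responseStart - r.requestStart;
  const content = r.responseEnd - r.responseStart;
  return dns + tcp + waiting + content;
}

function blockingTime(r) {
  // "duration" = responseEnd - startTime, so what's left over after
  // the network phases is blocking (and, rarely, redirect) time.
  return r.duration - networkDuration(r);
}

// In a browser, apply it across all same-origin resources:
// performance.getEntriesByType('resource').forEach(function (r) {
//   console.log(r.name, networkDuration(r), blockingTime(r));
// });
```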
Takeaway

It’s not possible today to use Resource Timing to measure load time for cross-origin resources. Companies that want to measure load time and blocking time can use “duration”, but all the companies I’ve spoken with want to measure the actual load time (without blocking time). To provide better performance metrics, I encourage the addition of “networkDuration” to the Resource Timing specification. If you agree, please voice your support in a reply to my “networkDuration” proposal on the W3C Web Performance mailing list.