Front-End Performance 2021: Build Optimizations

Let’s make 2021… fast! An annual front-end performance checklist with everything you need to know to create fast experiences on the web today, from metrics to tooling and front-end techniques. Updated since 2016.

Table Of Contents

  1. Getting Ready: Planning And Metrics
  2. Setting Realistic Goals
  3. Defining The Environment
  4. Assets Optimizations
  5. Build Optimizations
  6. Delivery Optimizations
  7. Networking, HTTP/2, HTTP/3
  8. Testing And Monitoring
  9. Quick Wins
  10. Everything on one page
  11. Download The Checklist (PDF, Apple Pages, MS Word)
  12. Subscribe to our email newsletter to not miss the next guides.

Build Optimizations

  1. Have we defined our priorities?
    It’s a good idea to know what you are dealing with first. Run an inventory of all of your assets (JavaScript, images, fonts, third-party scripts and "expensive" modules on the page, such as carousels, complex infographics and multimedia content), and break them down into groups.

    Set up a spreadsheet. Define the basic core experience for legacy browsers (i.e. fully accessible core content), the enhanced experience for capable browsers (i.e. an enriched, full experience) and the extras (assets that aren’t absolutely required and can be lazy-loaded, such as web fonts, unnecessary styles, carousel scripts, video players, social media widgets, large images). Years ago, we published an article on "Improving Smashing Magazine’s Performance," which describes this approach in detail.

    When optimizing for performance we need to reflect our priorities. Load the core experience immediately, then enhancements, and then the extras.

  2. Do you use native JavaScript modules in production?
    Remember the good ol' cutting-the-mustard technique to send the core experience to legacy browsers and an enhanced experience to modern browsers? An updated variant of the technique could use ES2017+ <script type="module">, also known as module/nomodule pattern (also introduced by Jeremy Wagner as differential serving).

    The idea is to compile and serve two separate JavaScript bundles: the “regular” build with Babel transforms and polyfills, served only to legacy browsers that actually need them, and another bundle (same functionality) that has no transforms or polyfills.

    As a result, we help reduce blocking of the main thread by reducing the amount of scripts the browser needs to process. Jeremy Wagner has published a comprehensive article on differential serving and how to set it up in your build pipeline, from setting up Babel, to what tweaks you’ll need to make in Webpack, as well as the benefits of doing all this work.
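
    A minimal sketch of the pattern (the file names are placeholders):

    <!-- Modern browsers load the lean, untranspiled ES2017+ bundle and ignore the nomodule one -->
    <script type="module" src="/js/app.modern.js"></script>

    <!-- Legacy browsers don't understand type="module", so they skip it and load the transpiled bundle instead -->
    <script nomodule src="/js/app.legacy.js" defer></script>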

    Native JavaScript module scripts are deferred by default, so while HTML parsing is happening, the browser will download the main module.

    Native JavaScript modules are deferred by default. Pretty much everything about native JavaScript modules.

    One note of warning though: the module/nomodule pattern can backfire on some clients, so you might want to consider a workaround: Jeremy’s less risky differential serving pattern. It does, however, sidestep the preload scanner, which could affect performance in ways one might not anticipate (thanks, Jeremy!).

    In fact, Rollup supports modules as an output format, so we can both bundle code and deploy modules in production. Parcel has module support in Parcel 2. For Webpack, module-nomodule-plugin automates the generation of module/nomodule scripts.

    Note: It’s worth stating that feature detection alone isn’t enough to make an informed decision about the payload to ship to that browser. On its own, we can’t deduce device capability from browser version. For example, cheap Android phones in developing countries mostly run Chrome and will cut the mustard despite their limited memory and CPU capabilities.

    Eventually, using the Device Memory Client Hints Header, we’ll be able to target low-end devices more reliably. At the moment of writing, the header is supported only in Blink (it goes for client hints in general). Since Device Memory also has a JavaScript API which is available in Chrome, one option could be to feature detect based on the API, and fall back to module/nomodule pattern if it’s not supported (thanks, Yoav!).
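
    As a rough sketch of that idea — the 4 GB threshold and bundle URLs below are illustrative assumptions, not part of any spec:

    // Hypothetical example: pick a payload based on navigator.deviceMemory (Blink only),
    // and fall back to the module/nomodule markup everywhere else.
    function loadScript(src) {
      const script = document.createElement('script');
      script.src = src;
      script.defer = true;
      document.head.appendChild(script);
    }

    if ('deviceMemory' in navigator) {
      // deviceMemory reports an approximation in GiB: 0.25, 0.5, 1, 2, 4 or 8
      loadScript(navigator.deviceMemory >= 4 ? '/js/app.full.js' : '/js/app.lite.js');
    }
    // Otherwise, the <script type="module"> / nomodule pair in the HTML does the job.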

  3. Are you using tree-shaking, scope hoisting and code-splitting?
    Tree-shaking is a way to clean up your build process by only including code that is actually used in production and eliminating unused imports in Webpack. With Webpack and Rollup, we also have scope hoisting that allows both tools to detect where import chaining can be flattened and converted into one inlined function without compromising the code. With Webpack, we can also use JSON Tree Shaking.

    Code-splitting is another Webpack feature that splits your codebase into "chunks" that are loaded on demand. Not all of the JavaScript has to be downloaded, parsed and compiled right away. Once you define split points in your code, Webpack can take care of the dependencies and outputted files. It enables you to keep the initial download small and to request code on demand as the application requires it. Alexander Kondrov has a fantastic introduction to code-splitting with Webpack and React.

    Consider using preload-webpack-plugin, which takes the routes you code-split and prompts the browser to preload them using <link rel="preload"> or <link rel="prefetch">. Webpack inline directives also give some control over preload/prefetch. (Watch out for prioritization issues though.)
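
    For instance, a split point with Webpack’s inline directives could look like this (the chunk name, module path and trigger are placeholders):

    // The carousel ends up in its own chunk, fetched only when needed;
    // webpackPrefetch asks the browser to fetch it during idle time via <link rel="prefetch">.
    const button = document.querySelector('.js-open-carousel'); // hypothetical trigger

    button.addEventListener('click', () => {
      import(/* webpackChunkName: "carousel", webpackPrefetch: true */ './carousel.js')
        .then(({ initCarousel }) => initCarousel())
        .catch((error) => console.error('Could not load the carousel chunk', error));
    });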

    Where to define split points? By tracking which chunks of CSS/JavaScript are used, and which aren’t. Umar Hansa explains how you can use Code Coverage from DevTools to achieve it.

    When dealing with single-page applications, we need some time to initialize the app before we can render the page. Your setup will require its own custom solution, but you could watch out for modules and techniques to speed up the initial rendering time. For example, here’s how to debug React performance and eliminate common React performance issues, and here’s how to improve performance in Angular. In general, most performance issues come from the initial time to bootstrap the app.

    So, what’s the best way to code-split aggressively, but not too aggressively? According to Phil Walton, "in addition to code-splitting via dynamic imports, [we could] also use code-splitting at the package level, where each imported node module gets put into a chunk based on its package’s name." Phil provides a tutorial on how to build it as well.

  4. Can we improve Webpack's output?
    As Webpack is often considered to be mysterious, there are plenty of Webpack plugins that may come in handy to further reduce Webpack's output. Below are some of the more obscure ones that might need a bit more attention.

    One of the interesting ones comes from Ivan Akulov's thread. Imagine that you have a function that you call once, store its result in a variable, and then don’t use that variable. Tree-shaking will remove the variable, but not the function, because it might be used otherwise. However, if the function isn't used anywhere, you might want to remove it. To do so, prepend the function call with /*#__PURE__*/ which is supported by Uglify and Terser — done!
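
    A tiny illustration (compute() and input are hypothetical, and compute() is assumed to be side-effect-free):

    // The annotation tells the minifier the call has no side effects,
    // so if `result` is never used, both the variable and the call can be dropped.
    const result = /*#__PURE__*/ compute(input);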

    To remove such a function when its result is not used, prepend the function call with /*#__PURE__*/. Via Ivan Akulov.

    Here are some of the other tools that Ivan recommends:

One way to speed up your images is to serve smaller pictures on smaller screens, with responsive-loader. Via Ivan Akulov.
  5. Can you offload JavaScript into a Web Worker?
    To reduce the negative impact on Time-to-Interactive, it might be a good idea to look into offloading heavy JavaScript into a Web Worker.

    As the code base keeps growing, UI performance bottlenecks will show up, slowing down the user’s experience. That’s because DOM operations run alongside your JavaScript on the main thread. With web workers, we can move these expensive operations to a background process that’s running on a different thread. Typical use cases for web workers are prefetching data and Progressive Web Apps that load and store some data in advance so that you can use it later when needed. And you could use Comlink to streamline the communication between the main page and the worker. Still some work to do, but we are getting there.

    There are a few interesting case studies around web workers which show different approaches of moving framework and app logic to web workers. The conclusion: in general, there are still some challenges, but there are some good use cases already (thanks, Ivan Akulov!).

    Starting from Chrome 80, a new mode for web workers with the performance benefits of JavaScript modules has shipped: module workers. We can change script loading and execution to match script type="module", plus we can also use dynamic imports for lazy-loading code without blocking execution of the worker.
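
    A hedged sketch of a module worker (the file names, data and functions are placeholders):

    // main.js — move the heavy work off the main thread
    const worker = new Worker('/js/worker.js', { type: 'module' });
    worker.postMessage({ items: largeDataset }); // hypothetical data
    worker.onmessage = (event) => renderResults(event.data); // hypothetical render function

    // worker.js — dynamic import() lazy-loads code inside the worker without blocking it
    self.onmessage = async (event) => {
      const { expensiveSort } = await import('./expensive-sort.js');
      self.postMessage(expensiveSort(event.data.items));
    };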

    How to get started? Here are a few resources that are worth looking into:

    Use web workers when code blocks for a long time, but avoid them when you rely on the DOM, handle input response and need minimal delay. (via Addy Osmani)

    Note that Web Workers don’t have access to the DOM because the DOM is not "thread-safe", and the code that they execute needs to be contained in a separate file.

  6. Can you offload "hot paths" to WebAssembly?
    We could offload computationally heavy tasks off to WebAssembly (WASM), a binary instruction format, designed as a portable target for compilation of high-level languages like C/C++/Rust. Its browser support is remarkable, and it has recently become viable as function calls between JavaScript and WASM are getting faster. Plus, it’s even supported on Fastly’s edge cloud.

    Of course, WebAssembly isn’t supposed to replace JavaScript, but it can complement it in cases when you notice CPU hogs. For most web apps, JavaScript is a better fit, and WebAssembly is best used for computationally intensive web apps, such as games.
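
    Calling into a WASM module from JavaScript can be as simple as the sketch below (the file name and export are made up for illustration):

    async function runHotPath() {
      // Compile the binary while it's still streaming in, then call an exported function.
      const { instance } = await WebAssembly.instantiateStreaming(
        fetch('/wasm/hot-path.wasm'),
        {} // imports, if the module expects any
      );
      return instance.exports.heavyComputation(42); // hypothetical export
    }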

    If you’d like to learn more about WebAssembly:

    Still not sure about when to use Web Workers, WebAssembly, streams, or perhaps the WebGL JavaScript API to access the GPU? Accelerating JavaScript is a short but helpful guide that explains when to use what, and why — also with a handy flowchart and plenty of useful resources.

Milica Mihajlija provides a general overview of how WebAssembly works and why it’s useful.
  7. Do we serve legacy code only to legacy browsers?
    With ES2017 being remarkably well supported in modern browsers, we can use babelEsmPlugin to only transpile ES2017+ features unsupported by the modern browsers you are targeting.

    Houssein Djirdeh and Jason Miller have recently published a comprehensive guide on how to transpile and serve modern and legacy JavaScript, going into details of making it work with Webpack and Rollup, and the tooling needed. You can also estimate how much JavaScript you can shave off on your site or app bundles.

    JavaScript modules are supported in all major browsers, so use script type="module" to let browsers with ES module support load the file, while older browsers could load legacy builds with script nomodule.

    These days we can write module-based JavaScript that runs natively in the browser, without transpilers or bundlers. <link rel="modulepreload"> provides a way to initiate early (and high-priority) loading of module scripts. Basically, it’s a nifty way to help maximize bandwidth usage, by telling the browser about what it needs to fetch so that it’s not left without anything to do during those long roundtrips. Also, Jake Archibald has published a detailed article with gotchas and things to keep in mind with ES Modules that’s worth reading.

Jake Archibald has published a detailed article with gotchas and things to keep in mind with ES Modules, e.g. inline scripts are deferred until blocking external scripts and inline scripts are executed.
  8. Identify and rewrite legacy code with incremental decoupling.
    Long-living projects have a tendency to gather dust and dated code. Revisit your dependencies and assess how much time would be required to refactor or rewrite legacy code that has been causing trouble lately. Of course, it’s always a big undertaking, but once you know the impact of the legacy code, you could start with incremental decoupling.

    First, set up metrics that track whether the ratio of legacy code calls stays constant or goes down, not up. Publicly discourage the team from using the library and make sure that your CI alerts developers if it’s used in pull requests. Polyfills could help transition from legacy code to a rewritten codebase that uses standard browser features.

  9. Identify and remove unused CSS/JS.
    CSS and JavaScript code coverage in Chrome allows you to learn which code has been executed/applied and which hasn’t. You can start recording the coverage, perform actions on a page, and then explore the code coverage results. Once you’ve detected unused code, find those modules and lazy load them with import() (see the entire thread). Then repeat the coverage profile and validate that it’s now shipping less code on initial load.

    You can use Puppeteer to programmatically collect code coverage. Chrome allows you to export code coverage results, too. As Andy Davies noted, you might want to collect code coverage for both modern and legacy browsers though.
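
    A sketch of how that could look with Puppeteer (the URL is a placeholder):

    const puppeteer = require('puppeteer');

    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();

      await Promise.all([page.coverage.startJSCoverage(), page.coverage.startCSSCoverage()]);
      await page.goto('https://example.com');
      const [jsCoverage, cssCoverage] = await Promise.all([
        page.coverage.stopJSCoverage(),
        page.coverage.stopCSSCoverage(),
      ]);

      // Report how many bytes of each file were actually used
      for (const entry of [...jsCoverage, ...cssCoverage]) {
        const usedBytes = entry.ranges.reduce((sum, range) => sum + range.end - range.start, 0);
        console.log(`${entry.url}: ${Math.round((usedBytes / entry.text.length) * 100)}% used`);
      }

      await browser.close();
    })();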

    There are many other use cases and tools for Puppeteer that might need a bit more exposure:

    We can use Puppeteer Recorder and Puppeteer Sandbox to record browser interaction and generate Puppeteer and Playwright scripts.

    Furthermore, purgecss, UnCSS and Helium can help you remove unused styles from CSS. And if you aren’t certain if a suspicious piece of code is used somewhere, you can follow Harry Roberts' advice: create a 1×1px transparent GIF for a particular class and drop it into a dead/ directory, e.g. /assets/img/dead/comments.gif.

    After that, you set that specific image as a background on the corresponding selector in your CSS, sit back and wait for a few months to see if the file shows up in your logs. If there are no entries, nobody had that legacy component rendered on their screen: you can probably go ahead and delete it all.

    For the I-feel-adventurous department, you could even automate gathering unused CSS across a set of pages by monitoring DevTools programmatically, e.g. with Puppeteer.

In his article, Benedikt Rötsch showed that a switch from Moment.js to date-fns could shave around 300ms for First Paint on 3G and a low-end mobile phone.
  10. Trim the size of your JavaScript bundles.
    As Addy Osmani noted, there’s a high chance you’re shipping full JavaScript libraries when you only need a fraction, along with dated polyfills for browsers that don’t need them, or just duplicate code. To avoid the overhead, consider using webpack-libs-optimizations, a collection of optimizations that remove unused methods and polyfills during the build process.

    Check and review the polyfills that you are sending to legacy browsers and to modern browsers, and be more strategic about them. Take a look at polyfill.io which is a service that accepts a request for a set of browser features and returns only the polyfills that are needed by the requesting browser.

    Add bundle auditing into your regular workflow as well. There might be some lightweight alternatives to heavy libraries you’ve added years ago, e.g. Moment.js (now discontinued) could be replaced with a smaller alternative such as date-fns.

    Benedikt Rötsch’s research showed that a switch from Moment.js to date-fns could shave around 300ms for First paint on 3G and a low-end mobile phone.

    For bundle auditing, Bundlephobia could help find the cost of adding an npm package to your bundle. size-limit extends basic bundle size checks with details on JavaScript execution time. You can even integrate these costs with a Lighthouse Custom Audit. This goes for frameworks, too. By removing or trimming the Vue MDC Adapter (Material Components for Vue), styles drop from 194KB to 10KB.

    There are many further tools to help you make an informed decision about the impact of your dependencies and viable alternatives:

    As an alternative to shipping the entire framework, you could trim your framework and compile it into a raw JavaScript bundle that does not require additional code. Svelte does it, and so does the Rawact Babel plugin which transpiles React.js components to native DOM operations at build-time. Why? Well, as the maintainers explain, "react-dom includes code for every possible component/HTMLElement that can be rendered, including code for incremental rendering, scheduling, event handling, etc. But there are applications that do not need all these features (at initial page load). For such applications, it might make sense to use native DOM operations to build the interactive user interface."

size-limit provides a basic bundle size check with details on JavaScript execution time as well.
  11. Do we use partial hydration?
    With the amount of JavaScript used in applications, we need to figure out ways to send as little as possible to the client. One way of doing so — and we briefly covered it already — is with partial hydration. The idea is quite simple: instead of doing SSR and then sending the entire app to the client, only small pieces of the app's JavaScript would be sent to the client and then hydrated. We can think of it as multiple tiny React apps with multiple render roots on an otherwise static website.
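
    As a rough sketch of the idea with React 17-era APIs (the component, selector and lazy trigger are illustrative): the server renders static HTML everywhere, and only a small interactive island gets hydrated, lazily, once it scrolls into view.

    import React from 'react';
    import ReactDOM from 'react-dom';
    import Comments from './Comments'; // hypothetical interactive island

    // Everything else on the page stays server-rendered, static HTML;
    // only this root gets hydrated, and only when it becomes visible.
    const root = document.querySelector('#comments-root');

    const observer = new IntersectionObserver(([entry], obs) => {
      if (entry.isIntersecting) {
        ReactDOM.hydrate(<Comments />, root);
        obs.disconnect();
      }
    });
    observer.observe(root);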

    In the article "The case of partial hydration (with Next and Preact)", Lukas Bombach explains how the team behind Welt.de, one of the news outlets in Germany, has achieved better performance with partial hydration. You can also check next-super-performance GitHub repo with explanations and code snippets.

    You could also consider alternative options:

    Jason Miller has published working demos on how progressive hydration could be implemented with React, so you can use them right away: demo 1, demo 2, demo 3 (also available on GitHub). Plus, you can look into the react-prerendered-component library.

    Import-on-interaction for first-party code should only be done if you’re unable to prefetch resources prior to interaction.
  12. Have we optimized the strategy for React/SPA?
    Struggling with performance in your single-page application? Jeremy Wagner has explored the impact of client-side framework performance on a variety of devices, highlighting some of the implications and guidelines we might want to be aware of when using one.

    As a result, here’s the SPA strategy that Jeremy suggests for React (it shouldn’t change significantly for other frameworks):

    • Refactor stateful components as stateless components whenever possible.
    • Prerender stateless components when possible to minimize server response time. Render only on the server.
    • For stateful components with simple interactivity, consider prerendering or server-rendering that component, and replace its interactivity with framework-independent event listeners.
    • If you must hydrate stateful components on the client, use lazy hydration on visibility or interaction.
    • For lazily-hydrated components, schedule their hydration during main thread idle time with requestIdleCallback.

    There are a few other strategies you might want to pursue or review:

  13. Are you using predictive prefetching for JavaScript chunks?
    We could use heuristics to decide when to preload JavaScript chunks. Guess.js is a set of tools and libraries that use Google Analytics data to determine which page a user is most likely to visit next from a given page. Based on user navigation patterns collected from Google Analytics or other sources, Guess.js builds a machine-learning model to predict and prefetch JavaScript that will be required on each subsequent page.

    Hence, every interactive element receives a probability score for engagement, and based on that score, a client-side script decides to prefetch a resource ahead of time. You can integrate the technique into your Next.js application, Angular and React, and there is a Webpack plugin which automates the setup process as well.

    Obviously, you might be prompting the browser to consume unneeded data and prefetch undesirable pages, so it’s a good idea to be quite conservative in the number of prefetched requests. A good use case would be prefetching validation scripts required in the checkout, or speculative prefetch when a critical call-to-action comes into the viewport.
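
    As an illustration of the latter, here’s a conservative sketch (the selector and URL are placeholders) that also backs off on Save-Data and slow connections:

    // Prefetch the checkout validation bundle only when the call-to-action enters the viewport,
    // and skip it entirely on Save-Data or 2G connections.
    const connection = navigator.connection || {};
    const constrained = connection.saveData || /2g/.test(connection.effectiveType || '');
    const cta = document.querySelector('.js-checkout-cta');

    if (cta && !constrained && 'IntersectionObserver' in window) {
      new IntersectionObserver(([entry], observer) => {
        if (entry.isIntersecting) {
          const link = document.createElement('link');
          link.rel = 'prefetch';
          link.href = '/js/checkout-validation.js';
          document.head.appendChild(link);
          observer.disconnect();
        }
      }).observe(cta);
    }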

    Need something less sophisticated? DNStradamus does DNS prefetching for outbound links as they appear in the viewport. Quicklink, InstantClick and Instant.page are small libraries that automatically prefetch links in the viewport during idle time in an attempt to make next-page navigations load faster. Quicklink allows you to prefetch React Router routes and JavaScript; plus it’s data-considerate, so it doesn’t prefetch on 2G or if Data-Saver is on. So is Instant.page if the mode is set to use viewport prefetching (which is the default).

    If you want to look into the science of predictive prefetching in full detail, Divya Tagtachian has a great talk on The Art of Predictive Prefetch, covering all the options from start to finish.

  14. Take advantage of optimizations for your target JavaScript engine.
    Study what JavaScript engines dominate in your user base, then explore ways of optimizing for them. For example, when optimizing for V8, which is used in Blink-based browsers, the Node.js runtime and Electron, make use of script streaming for monolithic scripts.

    Script streaming allows async or defer scripts to be parsed on a separate background thread once downloading begins, hence in some cases improving page loading times by up to 10%. Practically, use <script defer> in the <head>, so that the browsers can discover the resource early and then parse it on the background thread.

    Caveat: Opera Mini doesn’t support script deferment, so if you are developing for India or Africa, defer will be ignored, resulting in blocking rendering until the script has been evaluated (thanks Jeremy!).

    You could also hook into V8’s code caching by splitting out libraries from the code using them, or the other way around, merging libraries and their uses into a single script, grouping small files together and avoiding inline scripts. Or perhaps even use v8-compile-cache.

    When it comes to JavaScript in general, there are also some practices that are worth keeping in mind:

isInputPending() is a new browser API that attempts to bridge the gap between loading and responsiveness.
The request map for CNN.com showing the chain of each request to different domains, all the way to eighth-party scripts.
  15. Always prefer to self-host third-party assets.
    Yet again, self-host your static assets by default. It’s common to assume that if many sites use the same public CDN and the same version of a JavaScript library or a web font, then the visitors would land on our site with the scripts and fonts already cached in their browser, speeding up their experience considerably. However, it’s very unlikely to happen.

    For security reasons, to avoid fingerprinting, browsers have been implementing partitioned caching that was introduced in Safari back in 2013, and in Chrome last year. So if two sites point to the exact same third-party resource URL, the code is downloaded once per domain, and the cache is "sandboxed" to that domain due to privacy implications (thanks, David Calhoun!). Hence, using a public CDN will not automatically lead to better performance.

    Furthermore, it’s worth noting that resources don’t live in the browser’s cache as long as we might expect, and first-party assets are more likely to stay in the cache than third-party assets. Therefore, self-hosting is usually more reliable and secure, and better for performance, too.

  16. Constrain the impact of third-party scripts.
    With all performance optimizations in place, often we can’t control third-party scripts coming from business requirements. Third-party script metrics aren’t influenced by end-user experience, so too often one single script ends up calling a long tail of obnoxious third-party scripts, hence ruining a dedicated performance effort. To contain and mitigate the performance penalties that these scripts bring along, it’s not enough to just defer their loading and execution and warm up connections via resource hints, i.e. dns-prefetch or preconnect.

    Currently, 57% of all JavaScript code execution time is spent on third-party code. The median mobile site accesses 12 third-party domains, with a median of 37 different requests (or about 3 requests made to each third party).

    Furthermore, these third-parties often invite fourth-party scripts to join in, ending up with a huge performance bottleneck, sometimes going as far as eighth-party scripts on a page. So regularly auditing your dependencies and tag managers can reveal costly surprises.

    Another problem, as Yoav Weiss explained in his talk on third-party scripts, is that in many cases these scripts download resources that are dynamic. The resources change between page loads, so we don’t necessarily know which hosts the resources will be downloaded from and what resources they would be.

    Deferring, as shown above, might be just a start though as third-party scripts also steal bandwidth and CPU time from your app. We could be a bit more aggressive and load them only when our app has initialized.

    /* Before */
    const App = () => {
      return <div>
        <script>
          window.dataLayer = window.dataLayer || [];
          function gtag(){...}
          gtag('js', new Date());
        </script>
      </div>
    }

    /* After */
    const App = () => {
      const [isRendered, setRendered] = useState(false);

      useEffect(() => setRendered(true), []);

      return <div>
      {isRendered ?
        <script>
          window.dataLayer = window.dataLayer || [];
          function gtag(){...}
          gtag('js', new Date());
        </script>
      : null}
      </div>
    }
    

    In a fantastic post on "Reducing the Site-Speed Impact of Third-Party Tags", Andy Davies explores a strategy of minimizing the footprint of third-parties — from identifying their costs towards reducing their impact.

    According to Andy, there are two ways tags impact site-speed — they compete for network bandwidth and processing time on visitors’ devices, and depending on how they’re implemented, they can delay HTML parsing as well. So the first step is to identify the impact that third-parties have, by testing the site with and without scripts using WebPageTest. With Simon Hearne’s Request Map, we can also visualize third-parties on a page along with details on their size, type and what triggered their load.

    Preferably self-host and use a single hostname, but also use a request map to expose fourth-party calls and detect when the scripts change. You can use Harry Roberts' approach for auditing third parties and produce spreadsheets like this one (also check Harry's auditing workflow).

    Afterwards, we can explore lightweight alternatives to existing scripts and slowly replace duplicates and main culprits with lighter options. Perhaps some of the scripts could be replaced with their fallback tracking pixel instead of the full tag.

    Loading YouTube with facades, e.g. lite-youtube-embed that’s significantly smaller than an actual YouTube player.

    If it’s not viable, we can at least lazy load third-party resources with facades, i.e. a static element which looks similar to the actual embedded third-party, but is not functional and therefore much less taxing on the page load. The trick, then, is to load the actual embed only on interaction.
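
    A generic sketch of a facade (the class name and data attribute are placeholders):

    // Swap a lightweight static placeholder for the real third-party iframe only on interaction.
    document.querySelectorAll('.js-embed-facade').forEach((facade) => {
      facade.addEventListener('click', () => {
        const iframe = document.createElement('iframe');
        iframe.src = facade.dataset.embedUrl; // the real player/widget URL
        iframe.setAttribute('allow', 'autoplay; encrypted-media');
        iframe.setAttribute('allowfullscreen', '');
        facade.replaceWith(iframe);
      }, { once: true });
    });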

    For example, we can use:

    One of the reasons why tag managers are usually large in size is the many experiments running at the same time, along with many user segments, page URLs, sites etc., so, according to Andy, reducing them can reduce both the download size and the time it takes to execute the script in the browser.

    And then there are anti-flicker snippets. Third-parties such as Google Optimize, Visual Website Optimizer (VWO) and others are unanimous in using them. These snippets are usually injected along with running A/B tests: to avoid flickering between the different test scenarios, they hide the body of the document with opacity: 0, then add a function that gets called after a few seconds to bring the opacity back. This often results in massive delays in rendering due to massive client-side execution costs.

    With A/B testing in use, customers would often see flickering like this one. Anti-Flicker snippets prevent that, but they also cost in performance. Via Andy Davies.

    Therefore keep track of how often the anti-flicker timeout is triggered and reduce the timeout. The default blocks the display of your page for up to 4s, which will ruin conversion rates. According to Tim Kadlec, "Friends don’t let friends do client side A/B testing". Server-side A/B testing on CDNs (e.g. Edge Computing, or Edge Slice Rerendering) is always a more performant option.

    If you have to deal with almighty Google Tag Manager, Barry Pollard provides some guidelines to contain the impact of Google Tag Manager. Also, Christian Schaefer explores strategies for loading ads.

    Watch out: some third-party widgets hide themselves from auditing tools, so they might be more difficult to spot and measure. To stress-test third parties, examine bottom-up summaries in the Performance panel in DevTools, and test what happens if a request is blocked or has timed out — for the latter, you can use WebPageTest’s Blackhole server blackhole.webpagetest.org that you can point specific domains to in your hosts file.

    What options do we have then? Consider using service workers by racing the resource download with a timeout: if the resource hasn’t responded within a certain time, return an empty response to tell the browser to carry on with parsing of the page. You can also log or block third-party requests that aren’t successful or don’t fulfill certain criteria. If you can, load third-party scripts from your own server rather than from the vendor’s server, and lazy load them.
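
    A sketch of that racing idea in a service worker (the hostname and the 4-second timeout are illustrative):

    // In the service worker: race slow third-party requests against a timeout,
    // and hand back an empty response so the page isn't held up.
    self.addEventListener('fetch', (event) => {
      const url = new URL(event.request.url);
      if (url.hostname === 'slow-third-party.example.com') {
        const fallback = new Promise((resolve) =>
          setTimeout(() => resolve(new Response('', { status: 200 })), 4000)
        );
        event.respondWith(
          Promise.race([fetch(event.request), fallback])
            .catch(() => new Response('', { status: 200 }))
        );
      }
    });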

    Another option is to establish a Content Security Policy (CSP) to restrict the impact of third-party scripts, e.g. disallowing the download of audio or video. The best option is to embed scripts via <iframe> so that the scripts are running in the context of the iframe and hence don’t have access to the DOM of the page, and can’t run arbitrary code on your domain. Iframes can be further constrained using the sandbox attribute, so you can restrict what the iframe is allowed to do, e.g. prevent scripts from running, block alerts, form submission, plugins, access to the top navigation, and so on.

    You could also keep third-parties in check via in-browser performance linting with feature policies, a relatively new feature that lets you opt-in or out of certain browser features on your site. (As a sidenote, it could also be used to avoid oversized and unoptimized images, unsized media, sync scripts and others). Currently supported in Blink-based browsers.

    /* Via Tim Kadlec. https://timkadlec.com/remembers/2020-02-20-in-browser-performance-linting-with-feature-policies/ */
    /* Block the use of the Geolocation API with a Feature-Policy header. */
    Feature-Policy: geolocation 'none'
    

    As many third-party scripts are running in iframes, you probably need to be thorough in restricting their allowances. Sandboxed iframes are always a good idea, and each of the limitations can be lifted via a number of allow values on the sandbox attribute. Sandboxing is supported almost everywhere, so constrain third-party scripts to the bare minimum of what they should be allowed to do.
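
    For example (the widget URL is a placeholder):

    <!-- Everything is disallowed by default; add only the capabilities the widget really needs -->
    <iframe src="https://widget.example.com/embed"
            sandbox="allow-scripts allow-popups"
            title="Third-party widget"></iframe>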

    ThirdPartyWeb.Today groups all third-party scripts by category (analytics, social, advertising, hosting, tag manager etc.) and visualizes how long the entity’s scripts take to execute (on average).

    Consider using an Intersection Observer; that would enable ads to be iframed while still dispatching events or getting the information that they need from the DOM (e.g. ad visibility). Watch out for new policies such as Feature policy, resource size limits and CPU/Bandwidth priority to limit harmful web features and scripts that would slow down the browser, e.g. synchronous scripts, synchronous XHR requests, document.write and outdated implementations.

    Finally, when choosing a third-party service, consider checking Patrick Hulce's ThirdPartyWeb.Today, a service that groups all third-party scripts by category (analytics, social, advertising, hosting, tag manager etc.) and visualizes how long the entity’s scripts take to execute (on average). Obviously, the largest entities have the worst performance impact on the pages they’re on. Just by skimming the page, you’ll get an idea of the performance footprint you should be expecting.

    Ah, and don't forget about the usual suspects: instead of third-party widgets for sharing, we can use static social sharing buttons (such as by SSBG) and static links to interactive maps instead of embedded interactive maps.

Casper.com published a detailed case study on how they managed to shave 1.7 seconds off the site by self-hosting Optimizely. It might be worth it.
  17. Set HTTP cache headers properly.
    Caching seems to be such an obvious thing to do, yet it might be quite tough to get right. We need to double-check that expires, max-age, cache-control, and other HTTP cache headers have been set properly. Without proper HTTP cache headers, browsers will set them automatically at 10% of elapsed time since last-modified, ending up with potential under- and over-caching.

    In general, resources should be cacheable either for a very short time (if they are likely to change) or indefinitely (if they are static) — you can just change their version in the URL when needed. You can call it a Cache-Forever strategy, in which we could relay Cache-Control and Expires headers to the browser to only allow assets to expire in a year. Hence, the browser wouldn’t even make a request for the asset if it has it in the cache.

    The exception is API responses (e.g. /api/user). To prevent caching, we can use private, no-store, and not max-age=0, no-store:

    Cache-Control: private, no-store

    Use Cache-Control: immutable to avoid revalidation of long explicit cache lifetimes when users hit the reload button. For the reload case, immutable saves HTTP requests and improves the load time of the dynamic HTML as it no longer competes with the multitude of 304 responses.

    A typical example where we want to use immutable are CSS/JavaScript assets with a hash in their name. For them, we probably want to cache as long as possible, and ensure they never get re-validated:

    Cache-Control: max-age=31556952, immutable

    According to Colin Bendell’s research, immutable reduces 304 responses by around 50% as, even with max-age in use, clients still revalidate and block upon refresh. It’s supported in Firefox, Edge and Safari, while Chrome is still debating the issue.

    According to Web Almanac, "its usage has grown to 3.5%, and it’s widely used in Facebook and Google third-party responses."

    Cache-Control: immutable reduces 304s by around 50%, according to Colin Bendell’s research at Cloudinary.

    Do you remember the good ol' stale-while-revalidate? When we specify the caching time with the Cache-Control response header (e.g. Cache-Control: max-age=604800), after max-age expires, the browser will re-fetch the requested content, causing the page to load slower. This slowdown can be avoided with stale-while-revalidate; it basically defines an extra window of time during which a cache can use a stale asset as long as it revalidates it async in the background. Thus, it "hides" latency (both in the network and on the server) from clients.
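
    For example, with the (illustrative) values below a response can be served from cache for a week, and for one extra day a stale copy may still be used while it revalidates in the background:

    Cache-Control: max-age=604800, stale-while-revalidate=86400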

    In June–July 2019, Chrome and Firefox launched support of stale-while-revalidate in HTTP Cache-Control header, so as a result, it should improve subsequent page load latencies as stale assets are no longer in the critical path. Result: zero RTT for repeat views.

    Be wary of the vary header, especially in relation to CDNs, and watch out for the HTTP Representation Variants which help avoid an additional round trip for validation whenever a new request differs slightly (but not significantly) from prior requests (thanks, Guy and Mark!).

    Also, double-check that you aren’t sending unnecessary headers (e.g. x-powered-by, pragma, x-ua-compatible, expires, X-XSS-Protection and others) and that you include useful security and performance headers (such as Content-Security-Policy, X-Content-Type-Options and others). Finally, keep in mind the performance cost of CORS requests in single-page applications.

    Note: We often assume that cached assets are retrieved instantly, but research shows that retrieving an object from cache can take hundreds of milliseconds. In fact, according to Simon Hearne, "sometimes network might be faster than cache, and retrieving assets from cache can be costly with a large number of cached assets (not file size) and the user’s devices. For example: Chrome OS average cache retrieval doubles from ~50ms with 5 cached resources up to ~100ms with 25 resources".

    Besides, we often assume that bundle size isn’t a huge issue and users will download it once and then use the cached version. At the same time, with CI/CD we push code to production multiple times a day, cache gets invalidated every time, so being strategic about caching matters.

    When it comes to caching, there are plenty of resources that are worth reading:

We assume that browser caches are near-instantaneous, but data shows that retrieving an object from cache can take hundreds of milliseconds! From Simon Hearne’s research on When Network Is Faster Than Cache.
