Bust That Cache Through A Content Hash

During a recent discussion with fellow developers, we talked about how to get around caching issues with static assets and shared our approaches.

Every web developer knows the issue. You want your static assets to be cached by the visitor’s browser with a very long expiry date. This speeds up the site on the visitor’s side, reduces the load and bandwidth usage on your server, and gives you good PageSpeed scores (= much SEO!).

However, when you start making changes to your site, you have to force a hard refresh in your browser (or even worse, reset your static caching subsystem) to get your browser to actually download that changed version of the asset.

That’s why you want to include a “cache busting” system: one that signals to the browser that a changed static asset is actually a different file, which should not be retrieved from the cache but downloaded afresh.

There are different approaches to achieve this, and I’ll briefly touch upon the most common one before describing the method I use on my sites.

Using Query Strings

The most common approach within the WordPress sphere is to add a query string with the version or with the file modification time to the URL of the asset in question. Using the version in a query string is also what WordPress does out of the box to help you get around caches.

Default WordPress behaviour is to append the version as a query string.
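
As a rough sketch of what this approach produces, the snippet below appends a version to an asset URL. It is written in JavaScript (the language of the build tooling used later in this article) rather than PHP. The helper name withVersionQuery and the example URL are made up for illustration, while ver is the parameter name WordPress uses.

```javascript
// Hypothetical helper: append a version (or a modification time) to an
// asset URL as a query string, similar to what WordPress does with "ver".
function withVersionQuery(assetUrl, version) {
  const url = new URL(assetUrl);
  url.searchParams.set('ver', String(version));
  return url.toString();
}

const busted = withVersionQuery('https://example.com/assets/js/core.js', '1.2.3');
console.log(busted); // https://example.com/assets/js/core.js?ver=1.2.3
```

Note that the path of the asset stays the same; only the query string changes between versions.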

The problem with this approach is that depending on what servers the visitor’s connection is passing through, the assets will not be cached at all. The query string appended to the end of the URL makes some proxy servers assume that the content of that file is dynamic in nature, and dependent on the current value of the query string.

An example of why this is a reasonable assumption is the case of your search box. If the visitor enters something into that box and submits this search query, the URL will have a query string with the actual search terms appended to it: https://example.com/?s=my+search+terms . You would not want to have one cached search result page be shared amongst all your visitors, right?

This is also the reason why some page speed analysis tools will flag your assets if they contain such a query string.

Both of the values suggested for the query string above also come with their own specific disadvantages.

Using the version string means that you’ll have to remember to manually bump the version every time you make a change. This is boring and error-prone work, and will probably lead to instances where you forgot to bump the version.

Using the file modification time seems to avoid this, but comes with a different disadvantage. If you are using a task runner with an automated build pipeline, chances are that your assets will get completely rebuilt each time the task runner launches a new build. This means that all of your file modification times will change with every single build, even if the source change did not affect the contents of a given asset at all. This is wasteful, as the caches get invalidated much more often than needed.
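
To make the difference concrete, here is a small sketch that simulates two builds producing byte-identical output. The file name, the timestamps, and the choice of SHA-256 truncated to 10 characters are assumptions made for the example, not what any particular tool does.

```javascript
const crypto = require('crypto');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Short hash of a file's contents (algorithm and length are arbitrary here).
function contentHash(filePath) {
  return crypto
    .createHash('sha256')
    .update(fs.readFileSync(filePath))
    .digest('hex')
    .slice(0, 10);
}

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'rebuild-'));
const file = path.join(dir, 'style.css');

// First "build": write the asset and pin its modification time.
fs.writeFileSync(file, 'body { color: red; }');
fs.utimesSync(file, new Date(1000 * 1000), new Date(1000 * 1000));
const before = { mtime: fs.statSync(file).mtimeMs, hash: contentHash(file) };

// Second "build": identical output, written at a later point in time.
fs.writeFileSync(file, 'body { color: red; }');
fs.utimesSync(file, new Date(2000 * 1000), new Date(2000 * 1000));
const after = { mtime: fs.statSync(file).mtimeMs, hash: contentHash(file) };

console.log(before.mtime !== after.mtime); // true: the modification time changed
console.log(before.hash === after.hash);   // true: the content hash did not
```

A cache key based on the modification time would be invalidated by the second build, even though the browser would download exactly the same bytes again.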

Also, using the file modification time means that every page request will access every static asset on disk to collect these modification times. While these accesses are probably cached by the underlying filesystem, they might nevertheless have a negative performance impact.

A Better Approach

To improve upon the commonly used methods, the requirements are:

  1. No query strings, only a static URL.
  2. Cache is only busted if this is actually needed.

To meet the first requirement, we put whatever information we need to bust the cache inside of the filename itself. Easy, right?

Well, turns out, this is not that easy at all. Having a changing component within the filename makes it actually much more difficult to determine one of the data elements you need to generate the enqueueing URL: …the exact filename!

This is why we will need some kind of mapping information that lets us deduce the current dynamically generated filename from a static filename or identifier.

To meet the second requirement, we make the information we use to bust the cache depend on the actual contents of the file. So, even if the file gets rebuilt through our task runner, it will not cause a cache bust unless the contents of that specific file actually changed.

The usual way of achieving this is to generate a hash of the file contents. For files with the same content, the generated hash will always be the same.
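
A minimal sketch of that idea, assuming SHA-256 truncated to 10 hexadecimal characters (the function name hashedName and the hash parameters are illustrative choices, not those of a specific tool):

```javascript
const crypto = require('crypto');
const path = require('path');

// Build a file name that embeds a short hash of the file's contents,
// e.g. "styles/atomic.css" becomes "styles/atomic-<hash>.css".
function hashedName(filename, contents) {
  const hash = crypto.createHash('sha256').update(contents).digest('hex').slice(0, 10);
  const ext = path.extname(filename);
  const base = filename.slice(0, filename.length - ext.length);
  return `${base}-${hash}${ext}`;
}

const a = hashedName('styles/atomic.css', 'body { color: red; }');
const b = hashedName('styles/atomic.css', 'body { color: red; }');
const c = hashedName('styles/atomic.css', 'body { color: blue; }');

console.log(a === b); // true: same contents, same name
console.log(a === c); // false: changed contents, new name
```

Because the name only depends on the contents, rebuilding an unchanged file yields the same URL, and the browser keeps using its cached copy.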

My Current Implementation

My static assets are run through a gulp pipeline. gulp is a task runner, and it allows me to automate my entire build process. It can also watch my files, and relaunch the build process every time I save a change to one of the watched files.

JavaScript files, for example, are all split up into individual modules. They are run through browserify, where all the modules are concatenated, all dependencies pulled in, and everything is optimized and minimized. This results in one single JavaScript file that needs to be enqueued at the frontend.

At the end of the gulp task for each static asset type, the files are run through gulp-rev (https://github.com/sindresorhus/gulp-rev). gulp-rev reads the content, builds a 10-char hash out of it, and appends this to the filename: unicorn.css → unicorn-d41d8cd98f.css.

It will also create a rev-manifest.json file, that contains the mappings from the original name to the cache-busting name.

I consolidate the rev-manifest.json files from each of the asset type pipelines into a single file and save it in the root of my assets folder.
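
Conceptually, this consolidation is just a merge of several small mappings into one. The following sketch assumes the per-pipeline manifests are already loaded as plain objects; the function name mergeManifests and the two sample manifests are hypothetical and do not show how my gulp tasks actually do it.

```javascript
// Merge any number of manifest objects into one.
// If the same key appears more than once, the later manifest wins.
function mergeManifests(...manifests) {
  return Object.assign({}, ...manifests);
}

// Hypothetical manifests, one per asset type pipeline.
const scripts = { 'js/core.js': 'js/core-7ab954de.js' };
const styles = { 'styles/atomic.css': 'styles/atomic-8f3b28e1.min.css' };

const merged = mergeManifests(scripts, styles);
console.log(JSON.stringify(merged, null, 2));
```

The merged object can then be written out as a single JSON file for the backend to read.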

Here’s a real example of what such a rev-manifest.json file will look like (in use on one of my sites):

{
  "js/core.js": "js/core-7ab954de.js",
  "styles/atomic-editor.css": "styles/atomic-editor-ff7ef446.min.css",
  "styles/atomic.css": "styles/atomic-8f3b28e1.min.css",
  "js/frontend.js": "js/frontend-ad0cbc15.js",
  "js/backend.js": "js/backend-9b0842cd.js",
  "images/apple-touch-icon-114x114.png": "images/apple-touch-icon-114x114-13efabdd.png",
  "images/apple-touch-icon-120x120.png": "images/apple-touch-icon-120x120-bc7f8566.png",
  "images/apple-touch-icon-144x144.png": "images/apple-touch-icon-144x144-97485e89.png",
  "images/apple-touch-icon-152x152.png": "images/apple-touch-icon-152x152-98ce2b7e.png",
  "images/apple-touch-icon-180x180.png": "images/apple-touch-icon-180x180-0bab6263.png",
  "images/apple-touch-icon-57x57.png": "images/apple-touch-icon-57x57-52911982.png",
  "images/apple-touch-icon-60x60.png": "images/apple-touch-icon-60x60-ed03722d.png",
  "images/apple-touch-icon-72x72.png": "images/apple-touch-icon-72x72-a6a33372.png",
  "images/apple-touch-icon-76x76.png": "images/apple-touch-icon-76x76-8d957963.png",
  "images/apple-touch-icon-precomposed.png": "images/apple-touch-icon-precomposed-8425b83a.png",
  "images/apple-touch-icon.png": "images/apple-touch-icon-a647c234.png",
  "images/favicon-160x160.png": "images/favicon-160x160-8a1a5695.png",
  "images/favicon-16x16.png": "images/favicon-16x16-b2e8ef41.png",
  "images/favicon-192x192.png": "images/favicon-192x192-d09408e5.png",
  "images/favicon-32x32.png": "images/favicon-32x32-44ee3da6.png",
  "images/favicon-96x96.png": "images/favicon-96x96-d35b3d6e.png",
  "images/mstile-144x144.png": "images/mstile-144x144-3da555ec.png",
  "images/mstile-150x150.png": "images/mstile-150x150-5a8bfc6f.png",
  "images/mstile-310x150.png": "images/mstile-310x150-93c9b526.png",
  "images/mstile-310x310.png": "images/mstile-310x310-42ed38f1.png",
  "images/mstile-70x70.png": "images/mstile-70x70-121de451.png",
  "images/frontpage-hero.svg": "images/frontpage-hero-b25d5717.svg",
  "images/frontpage-service-01.svg": "images/frontpage-service-01-02fc2962.svg",
  "images/frontpage-service-02.svg": "images/frontpage-service-02-3a9f4bd1.svg",
  "images/frontpage-service-03.svg": "images/frontpage-service-03-49e4e765.svg",
  "images/logo.svg": "images/logo-bb31c16a.svg",
  "images/favicon.ico": "images/favicon-31234638.ico"
}

In my PHP backend code, I have a function that lets me retrieve the cache-busting name from the standard name. It is basically just a lookup in the above mappings file.

Here’s what such a function looks like:

/**
 * Get cache-busting hashed filename from rev-manifest.json.
 *
 * @param  string $filename Original name of the file.
 * @return string Current cache-busting hashed name of the file.
 */
function get_asset_path( $filename ) {

	// Cache the decoded manifest so that we only read it in once.
	static $manifest = null;
	if ( null === $manifest ) {
		$manifest_path = get_stylesheet_directory() . '/assets/rev-manifest.json';
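		$decoded = file_exists( $manifest_path )
			? json_decode( file_get_contents( $manifest_path ), true )
			: null;
		// Fall back to an empty map if the file is missing or not valid JSON.
		$manifest = is_array( $decoded ) ? $decoded : [];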
	}

	// If the manifest contains the requested file, return the hashed name.
	if ( array_key_exists( $filename, $manifest ) ) {
		return '/assets/' . $manifest[ $filename ];
	}

	// Assume the file has not been hashed when it was not found within the
	// manifest, and return the unhashed path in the same '/assets/' form.
	return '/assets/' . $filename;
}

This allows me to just refer to the normal filename without the hash within my source code. As an example, here’s how to enqueue the frontend scripts:

wp_enqueue_script(
	'corescripts',
	get_stylesheet_directory_uri() . get_asset_path( 'js/core.js' ),
	array( 'jquery' ),
	null,
	true
);

Using This Approach With Browser Injection

I want to note that it is not possible to combine a cache-busting approach that changes the filename with a live-reloading mechanism that uses browser injection (like Browsersync). Due to the nature of how browser injection works, it needs the file names to remain unchanged.

So, you might want to plan on adding an option in your build pipeline to deactivate the cache busting during development time, so that you can have fast live reloads when working on your stylesheets.
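
One way to picture such a switch is sketched below, at the point where asset names are resolved. The function name resolveAsset and the use of NODE_ENV as the flag are assumptions for this example; in a real gulp setup the same flag could instead decide whether the revisioning step runs at all.

```javascript
// Resolve a logical asset name to the name that should be served.
// With cacheBust disabled, the stable, unhashed name is returned so that
// browser injection keeps working during development.
function resolveAsset(name, manifest, cacheBust) {
  if (cacheBust && Object.prototype.hasOwnProperty.call(manifest, name)) {
    return manifest[name];
  }
  return name;
}

const manifest = { 'styles/atomic.css': 'styles/atomic-8f3b28e1.min.css' };

// Assumption: cache busting is only enabled for production builds.
const cacheBust = process.env.NODE_ENV === 'production';
console.log(resolveAsset('styles/atomic.css', manifest, cacheBust));
```

During development the stylesheet keeps its plain name, and the hashed name only appears once the flag is switched on.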

Conclusion

The above approach might be a bit more difficult to set up than going with the default WordPress behavior, but it provides obvious benefits. For sites where the right caching mechanisms become critical, a pipeline like the above should be considered a requirement.

What about you, dear( $reader );? Do you use an approach that differs from the ones I mentioned above? I’d love to know about it!

1 Comment

  1. InLogic Web Design on October 2, 2017 at 8:09 am

    Some proxies will cache assets with a query string, but will ignore the query string. In other words you get the benefits of caching but no way to cache bust.

    I think it’s mad to go back to query strings. Changing the filename is a proven technique that works.