Bust That Cache Through A Content Hash

By Alain Schlesser | July 10, 2016

During a recent discussion with fellow developers, we talked about how to get around caching issues with static assets and shared our approaches.

Every web developer knows the issue. You want your static assets to be cached by the visitor’s browser with a very long expiry date. This speeds up the site on the visitor’s side, reduces the load and bandwidth usage on your server, and gives you good PageSpeed scores (= much SEO!).

However, when you start making changes to your site, you have to force a hard refresh in your browser (or, even worse, reset your static caching subsystem) to get your browser to actually download the changed version of the asset.

That’s why you want a “cache busting” system: one that tells the browser, whenever you have changed a static asset, that the new file is actually different and should be freshly downloaded instead of being retrieved from the cache.

There are different approaches to achieve this, and I’ll briefly touch upon the most common one before describing the method I use on my sites.

Using Query Strings

The most common approach within the WordPress sphere is to add a query string with the version or with the file modification time to the URL of the asset in question. Using the version in a query string is also what WordPress does out of the box to help you get around caches.

Default WordPress behaviour is to append the version as a query string.

The problem with this approach is that, depending on which servers the visitor’s connection passes through, the assets might not be cached at all. Some proxy servers assume that a URL ending in a query string points to dynamic content that depends on the current value of that query string.

An example of why this is a reasonable assumption is your search box. If the visitor enters something into that box and submits the search query, the URL will have a query string with the actual search terms appended to it: https://example.com/?s=my+search+terms. You would not want one cached search results page to be shared amongst all your visitors, right?

This is also the reason why some page speed analysis tools will flag your assets if they contain such a query string.

Both of the values suggested above for the query string also come with their own specific disadvantages.

Using the version string means that you have to remember to manually bump the version every time you make a change. This is boring and error-prone work, and will probably lead to instances where you forget to bump the version.

Using the file modification time seems to avoid this, but comes with a different disadvantage. If you are using a task runner with an automated build pipeline, chances are that your assets get completely rebuilt each time the task runner launches a new build. This means that all of your file modification times change with every single build, even if the change that triggered it did not touch a given asset at all. This is wasteful, as the caches get invalidated much more often than needed.

Also, using the file modification time means that every page request has to access every static asset on disk to collect these modification times. While these accesses are probably cached by the underlying filesystem, they might nevertheless have a negative performance impact.
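For comparison, here is a minimal sketch of the file-modification-time variant. The helper name `mtime_busted_url()` and its parameters are hypothetical, not a WordPress API; in a theme you might pass values such as `get_stylesheet_directory()` and `get_stylesheet_directory_uri()` as the two base values. Note that it touches the filesystem for every asset each time it is called, i.e. on every page request.

```php
<?php
/**
 * Hypothetical helper: build an asset URL that is cache-busted through a
 * query string holding the file's modification time.
 *
 * @param string $base_dir      Filesystem directory the asset lives in.
 * @param string $base_url      Public URL corresponding to $base_dir.
 * @param string $relative_path Asset path relative to both bases.
 * @return string URL such as https://example.com/theme/style.css?ver=1000000000
 */
function mtime_busted_url( $base_dir, $base_url, $relative_path ) {
	$path = $base_dir . '/' . $relative_path;
	$url  = $base_url . '/' . $relative_path;

	// Filesystem access for this asset, every time the function runs.
	if ( ! is_file( $path ) ) {
		return $url; // Missing file: return the bare URL.
	}

	return $url . '?ver=' . filemtime( $path );
}
```

A build that merely rewrites an unchanged file still changes its modification time, and with it the `?ver=` value, which is exactly the unnecessary cache invalidation described above.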

A Better Approach

To improve upon the commonly used methods, the requirements are:

No query strings, only a static URL.
Cache is only busted if this is actually needed.

To meet the first requirement, we put whatever information we need to bust the cache inside of the filename itself. Easy, right?

Well, it turns out this is not that easy at all. Having a changing component within the filename makes it much more difficult to determine one of the pieces of data you need to generate the enqueueing URL: …the exact filename!

This is why we will need some kind of mapping information that lets us deduce the current dynamically generated filename from a static filename or identifier.

To meet the second requirement, we make the information we use to bust the cache depend on the actual contents of the file. So, even if the file gets rebuilt by our task runner, it will not cause a cache bust unless that specific file actually changed.

The usual way of achieving this is to generate a hash of the file contents. For files with the same content, the generated hash will always be the same.

My Current Implementation

My static assets are run through a gulp pipeline. gulp is a task runner, and it allows me to automate my entire build process. It can also watch my files, and relaunch the build process every time I save a change to one of the watched files.

JavaScript files, for example, are all split up into individual modules. They are run through browserify, where all the modules are concatenated, all dependencies are pulled in, and everything is optimized and minified. This results in one single JavaScript file that needs to be enqueued on the frontend.

At the end of the gulp tasks for each of the static asset types, the files are run through gulp-rev (https://github.com/sindresorhus/gulp-rev). gulp-rev reads the content, builds a 10-char hash out of it, and appends this hash to the filename: unicorn.css → unicorn-d41d8cd98f.css.
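To make the renaming step concrete, here is a small PHP sketch of the same idea. It is not gulp-rev’s code (gulp-rev is a Node.js plugin), the helper name `hashed_filename()` is hypothetical, and hashing with MD5 truncated to 10 hex characters is an assumption on our side. It is, however, consistent with the `unicorn-d41d8cd98f.css` example, since `d41d8cd98f` is the start of the MD5 hash of an empty file.

```php
<?php
/**
 * Hypothetical helper: insert a short content hash before the extension,
 * e.g. "js/core.js" with some contents => "js/core-<hash>.js".
 *
 * @param string $filename Original (relative) filename.
 * @param string $contents File contents to hash.
 * @param int    $length   Number of hex characters of the hash to keep.
 * @return string Filename with the content hash appended to its base name.
 */
function hashed_filename( $filename, $contents, $length = 10 ) {
	// Identical contents always give an identical hash, so rebuilding an
	// unchanged file yields the same name and does not bust the cache.
	$hash = substr( md5( $contents ), 0, $length );

	$info = pathinfo( $filename );
	$dir  = ( isset( $info['dirname'] ) && '.' !== $info['dirname'] ) ? $info['dirname'] . '/' : '';
	$ext  = isset( $info['extension'] ) ? '.' . $info['extension'] : '';

	return $dir . $info['filename'] . '-' . $hash . $ext;
}

echo hashed_filename( 'unicorn.css', '' ), "\n"; // unicorn-d41d8cd98f.css
```

The `$length` parameter is only there because hash lengths differ between tools and versions; the real manifest shown further down, for instance, uses 8 characters.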

It will also create a rev-manifest.json file that contains the mappings from the original names to the cache-busting names.

I consolidate the rev-manifest.json files from each of the asset-type pipelines into a single file and save it into the root of my assets folder.
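This consolidation happens inside the gulp build. Purely to illustrate what it amounts to, here is a PHP sketch; the function name `merge_manifests()` and the per-pipeline file layout are hypothetical. Each per-pipeline manifest is a flat JSON object mapping original paths to hashed paths, so consolidating them is a key-wise merge.

```php
<?php
/**
 * Hypothetical helper: merge several per-pipeline rev-manifest.json files
 * into one consolidated manifest file.
 *
 * @param string[] $manifest_paths Paths of the per-pipeline manifests.
 * @param string   $target_path    Where to write the consolidated manifest.
 * @return array The merged original-name => hashed-name mappings.
 */
function merge_manifests( array $manifest_paths, $target_path ) {
	$merged = [];
	foreach ( $manifest_paths as $path ) {
		if ( ! is_readable( $path ) ) {
			continue; // Skip pipelines that produced no manifest.
		}
		$entries = json_decode( file_get_contents( $path ), true );
		if ( is_array( $entries ) ) {
			// array_replace() keeps string keys intact; later manifests win
			// if two pipelines claim the same original name.
			$merged = array_replace( $merged, $entries );
		}
	}

	// Sort for a stable, diff-friendly output file.
	ksort( $merged );
	file_put_contents(
		$target_path,
		json_encode( $merged, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES )
	);

	return $merged;
}
```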

Here’s a real example of what such a rev-manifest.json file will look like (in use on one of my sites):

{
  "js/core.js": "js/core-7ab954de.js",
  "styles/atomic-editor.css": "styles/atomic-editor-ff7ef446.min.css",
  "styles/atomic.css": "styles/atomic-8f3b28e1.min.css",
  "js/frontend.js": "js/frontend-ad0cbc15.js",
  "js/backend.js": "js/backend-9b0842cd.js",
  "images/apple-touch-icon-114x114.png": "images/apple-touch-icon-114x114-13efabdd.png",
  "images/apple-touch-icon-120x120.png": "images/apple-touch-icon-120x120-bc7f8566.png",
  "images/apple-touch-icon-144x144.png": "images/apple-touch-icon-144x144-97485e89.png",
  "images/apple-touch-icon-152x152.png": "images/apple-touch-icon-152x152-98ce2b7e.png",
  "images/apple-touch-icon-180x180.png": "images/apple-touch-icon-180x180-0bab6263.png",
  "images/apple-touch-icon-57x57.png": "images/apple-touch-icon-57x57-52911982.png",
  "images/apple-touch-icon-60x60.png": "images/apple-touch-icon-60x60-ed03722d.png",
  "images/apple-touch-icon-72x72.png": "images/apple-touch-icon-72x72-a6a33372.png",
  "images/apple-touch-icon-76x76.png": "images/apple-touch-icon-76x76-8d957963.png",
  "images/apple-touch-icon-precomposed.png": "images/apple-touch-icon-precomposed-8425b83a.png",
  "images/apple-touch-icon.png": "images/apple-touch-icon-a647c234.png",
  "images/favicon-160x160.png": "images/favicon-160x160-8a1a5695.png",
  "images/favicon-16x16.png": "images/favicon-16x16-b2e8ef41.png",
  "images/favicon-192x192.png": "images/favicon-192x192-d09408e5.png",
  "images/favicon-32x32.png": "images/favicon-32x32-44ee3da6.png",
  "images/favicon-96x96.png": "images/favicon-96x96-d35b3d6e.png",
  "images/mstile-144x144.png": "images/mstile-144x144-3da555ec.png",
  "images/mstile-150x150.png": "images/mstile-150x150-5a8bfc6f.png",
  "images/mstile-310x150.png": "images/mstile-310x150-93c9b526.png",
  "images/mstile-310x310.png": "images/mstile-310x310-42ed38f1.png",
  "images/mstile-70x70.png": "images/mstile-70x70-121de451.png",
  "images/frontpage-hero.svg": "images/frontpage-hero-b25d5717.svg",
  "images/frontpage-service-01.svg": "images/frontpage-service-01-02fc2962.svg",
  "images/frontpage-service-02.svg": "images/frontpage-service-02-3a9f4bd1.svg",
  "images/frontpage-service-03.svg": "images/frontpage-service-03-49e4e765.svg",
  "images/logo.svg": "images/logo-bb31c16a.svg",
  "images/favicon.ico": "images/favicon-31234638.ico"
}

In my PHP backend code, I have a function that retrieves the cache-busting name for a given standard name. It is basically just a lookup in the mappings file above.

Here’s what such a function looks like:

/**
 * Get cache-busting hashed filename from rev-manifest.json.
 *
 * @param  string $filename Original name of the file, relative to the assets folder.
 * @return string Path of the (hashed) file, relative to the theme folder.
 */
function get_asset_path( $filename ) {

	// Cache the decoded manifest so that we only read it once per request.
	static $manifest = null;
	if ( null === $manifest ) {
		$manifest_path = get_stylesheet_directory() . '/assets/rev-manifest.json';
		$manifest      = [];
		if ( is_readable( $manifest_path ) ) {
			$decoded = json_decode( file_get_contents( $manifest_path ), true );
			// Guard against an empty or malformed manifest.
			if ( is_array( $decoded ) ) {
				$manifest = $decoded;
			}
		}
	}

	// If the manifest contains the requested file, return the hashed name.
	if ( array_key_exists( $filename, $manifest ) ) {
		return '/assets/' . $manifest[ $filename ];
	}

	// Assume the file has not been hashed when it was not found within the
	// manifest, and return its unhashed path inside the same assets folder.
	return '/assets/' . $filename;
}

This allows me to just refer to the normal filename without the hash within my source code. As an example, here’s how to enqueue the frontend scripts:

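wp_enqueue_script(
	'corescripts',
	get_stylesheet_directory_uri() . get_asset_path( 'js/core.js' ),
	array( 'jquery' ),
	null, // No version, so WordPress does not append a ?ver= query string.
	true  // Load the script in the footer.
);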

Using This Approach With Browser Injection

I want to note that it is not possible to combine a cache-busting approach that changes the filename with a live-reloading mechanism that uses browser injection (like browsersync). Due to the nature of how browser injection works, it needs the file names to remain unchanged.

So, you might want to add an option to your build pipeline that deactivates cache busting during development, so that you get fast live reloads when working on your stylesheets.

Conclusion

The above approach might be a bit more difficult to set up than going with the default WordPress behavior, but it provides obvious benefits. For sites where proper caching is critical, a pipeline like the one above should be considered a requirement.

What about you, dear( $reader );? Do you use an approach that differs from the ones I mentioned above? I’d love to know about it!

Posted in Software Development and tagged Asset Management, Cache Invalidation, gulp, PHP

13 Comments

InLogic Web Design on October 2, 2017 at 8:09 am

Some proxies will cache assets with a query string, but will ignore the query string. In other words you get the benefits of caching but no way to cache bust.

I think it’s mad to go back to query strings. Changing the filename is a proven technique that works.

Pablo López on November 7, 2018 at 5:32 am

Congrats Alain, very very good article. I have had problems with query strings and file name solutions (filemtime and rewrite rules in .htaccess) with CDNs.

I think this article explains the real and only solution for asset cache busting. Thanks a lot!

Now I have only one open question: I will investigate how to remove old files generated by gulp-rev and keep my assets folder clean.

Rasmus Schultz on December 17, 2018 at 1:02 pm

What about source-maps?

Looks like we’ll need to rewrite sourceMappingURL annotations in JS and CSS files at deployment, but it’s starting to feel rather brittle

Also, we’re using require.js to dynamically load modules – this breaks that as well, since you can’t just change the filenames or it won’t know where to find a dependency.

Alternatively, I guess we could enforce running every deployed asset through webpack to make sure that imports and filenames are processed uniformly, but… this would suck during development – having to run a build script to change a single character in a script somewhere.

I’d almost rather have the client revalidate every asset with every request and just hope that most clients have HTTP 2 enabled so it doesn’t impact performance too terribly, but, ugh… I guess there’s no simple, clean, performant way to do this. These problems are more complex than they may seem on the surface…

- Alain Schlesser on April 10, 2019 at 11:29 am
  
  Hi Rasmus,
  
  The article is pretty dated already. By now, most front-end development has switched to Webpack, and Webpack allows for the same basic principle as described in the article, but in a more automated fashion. It should already support adapting the source maps OOTB, if you configure it correctly.
  
Paul on January 7, 2019 at 6:13 am

As a ruby developer who’s inherited a monster of a php project this is a great solution to a problem I always considered ‘solved’. Thanks for sharing this solution Alain and introducing me to gulp!

Anyone else having trouble using the rev package and retaining the folder structure?

For instance I have:

assets/img
assets/css
assets/js

that I want to compile to

public/img
public/css
public/js

but my manifest.json is only showing the file names, not the img/ css/ and js/ directories.

- Alain Schlesser on April 10, 2019 at 11:36 am
  
  You are probably setting a wrong base path when globbing for your files. The base path is what gulp uses to deduce a proper relative filepath from an absolute filepath.
  
  This StackOverflow entry has a more thorough explanation: https://stackoverflow.com/questions/35845039/how-base-option-affects-gulp-src-gulp-dest
  
Suleiman AbdulMajeed on February 12, 2019 at 12:41 pm

Hi, this looks like a great tutorial, but I don’t know how to use it because I don’t know how to run gulp with PHP.
How do I run gulp with PHP? How do I use this with a WordPress site?

- Alain Schlesser on April 10, 2019 at 11:38 am
  
  You cannot use gulp in PHP, it is a node.js project.
  
  But gulp is just a general “task runner” you can use to automate all sorts of tasks for your project. If you want to use a task runner in PHP, I recommend Robo, which you can find at https://robo.li/.
  
Miriam Speert Crowley on August 29, 2019 at 5:47 pm

If you’re requesting the human-readable filename from the front end, won’t that still create the caching issue? Wouldn’t it be necessary to use the hashed version from the front end?

- Alain Schlesser on August 29, 2019 at 5:50 pm
  
  We’re passing the human-readable filename through the get_asset_path() function when we request the asset. This function maps the human-readable file we requested to the actual hashed version we need. So the frontend does load the hashed version, but we used a convenience helper function instead of hard-coding these hashed filenames.
  
Andrew on January 28, 2020 at 3:00 pm

You note that “Also, using the file modification time means that every page request will make a file access on every static asset to collect these modification times.”

But isn’t your strategy effectively doing the same thing by looking up values in your mapping file, i.e. a file access on every static asset?

- Alain Schlesser on January 28, 2020 at 4:46 pm
  
  Hey Andrew,
  
  No, that’s not the case. The file modification time request is always forced to be uncached (otherwise the information would be meaningless). Also, we are only reading the manifest file once per request. So, in the worst-case scenario, we trade multiple uncached filesystem reads per request for one cached filesystem read per request.
  
  However, in most cases, the filesystem will not even be touched for the manifest file, as it is purely static and can be cached in server memory. So, we will not even have that one filesystem read per request. It is read once in a while into memory and that’s it.
  
George on September 14, 2020 at 3:03 am

Hey Alain,

Interesting article! Facing a dilemma after reading this and would love to have your input:

Details:
– Also using WP with build tools (so your implementation should be possible)
– Only using 1 CSS file and 1 JS file
– Files are cached by Cloudflare

In this situation:
– Would you suggest just going with query strings + file modification time due to it only being 2 files?
OR
– Would you still advise using your alternative, and if so what is current best practice? As you mentioned yourself the article is quite dated. Would Webpack still be ‘the’ way to go at this moment?