Cache Busting Front-end Resources: Is File Name Revving Still Necessary?


This just in: Web developers are easily swayed by tentacled sea creatures. But wait, I’m getting ahead of myself. I’ll get back to that in a moment.

Caching and cache-busting front-end resources have been common for a number of years now. When dealing with front-end resources, you want to be able to accomplish two things:

  1. Cache front-end resources for better performance
  2. Refresh those resources (i.e. “cache bust”) as soon as they’ve been updated, so users aren’t fed out-of-date files

The problem is that those two things are in opposition to each other. You want resources to be cached more or less indefinitely. But if you update one of them (e.g. make a few changes to a stylesheet), you don’t want the user to get the old cached version; you want the updated version to take effect on the first visit.

Solving the Cache Paradox

To solve the first problem, many developers set far-future Expires headers, typically in an Apache .htaccess file. This tells the browser it may reuse its cached copy for a specified period (often a year), and it will generally do so until the user clears the browser cache.

Here’s part of what you might include in your .htaccess file to do this, as found in HTML5 Boilerplate:

<IfModule mod_expires.c>
  ExpiresActive on
  ExpiresDefault                  "access plus 1 month"

# CSS
  ExpiresByType text/css          "access plus 1 year"

# rest of stuff here...
</IfModule>
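If you aren’t on Apache, the same idea can be sketched in a few lines of Node. This is only an illustration of what the excerpt above asks for, not a drop-in replacement: `cachingHeaders` is a hypothetical helper, and the lifetimes assume a 30-day month and a 365-day year.

```javascript
// Lifetimes mirroring the .htaccess excerpt above
// (assumption: a month is 30 days and a year is 365 days).
const ONE_MONTH = 60 * 60 * 24 * 30;
const ONE_YEAR = 60 * 60 * 24 * 365;

// Per-type overrides; anything not listed falls back to one month.
const LIFETIMES = { 'text/css': ONE_YEAR };

// Hypothetical helper: build far-future caching headers for a content type.
function cachingHeaders(contentType, now = new Date()) {
  const maxAge = LIFETIMES[contentType] || ONE_MONTH;
  return {
    'Cache-Control': 'max-age=' + maxAge,
    Expires: new Date(now.getTime() + maxAge * 1000).toUTCString(),
  };
}
```

You would attach the returned object to whatever response your server sends for that file type.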

To solve the second problem, the first thing I can remember developers trying was adding a query string value to a resource’s URL. You can see this implemented in a really early version of HTML5 Boilerplate.

 <link rel="stylesheet" href="css/style.css?v=2">

In fact, shortly after seeing this in H5BP, I wrote a quick tip describing how to use this technique to force a browser to update your stylesheet. To this day, I still get decent search traffic on that page from people looking for ways to do this.

Generally speaking, this works. Browsers key their cache on the full URL, query string included, so changing the value makes the browser treat the resource as a different one even though the file name is the same. Instead of showing the user the cached version, it makes a new request and retrieves the freshly updated resource.
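To make the mechanics concrete, here’s a minimal sketch of query string revving as a function. The name `addVersion` and the `v` parameter are just for illustration (the `v` matches the H5BP example above); the function appends the version to any query string already present and keeps a trailing `#fragment` in place.

```javascript
// Hypothetical helper: append a version query parameter to a resource URL.
function addVersion(url, version) {
  // Split off any #fragment so the query string lands before it.
  const hashIndex = url.indexOf('#');
  const base = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const fragment = hashIndex === -1 ? '' : url.slice(hashIndex);

  // Use "&" if the URL already has a query string, "?" otherwise.
  const separator = base.includes('?') ? '&' : '?';
  return base + separator + 'v=' + encodeURIComponent(version) + fragment;
}

console.log(addVersion('css/style.css', 2)); // css/style.css?v=2
```

Bumping the version from 2 to 3 produces a URL the browser has never cached, which is the whole trick.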

Along Came A Souders

One of the comments in that quick tip of mine mentioned that it’s not recommended to use a query string value to update a resource. The user pointed to a Steve Souders article from 2008 that discusses why it’s best to change the file name itself rather than append a query string.

The gist of the Souders piece is that certain web proxies will not cache resources that use a query string in the URL. This means that if users are viewing your content through such a proxy, you’ll get the benefit of immediately-updated resources but you won’t get the benefit of caching.

It’s notable that the only proxy that Souders mentions is Squid. It would have been nice if others had been discussed, but I suppose if one major proxy has this problem, then that would be enough to warrant changing from query string revving to some other method.

A Very Brief History of Cache Busting

As a result, based on the default behavior of a cephalopodically-named piece of proxy software, developers have for years forced browsers to update resources using “revved” (or revisioned) file names instead of query strings (who knew squids could be so persuasive?):

<link rel="stylesheet" href="css/styles.74638454784.min.css">

You can do this manually if you’re a masochist, or you can achieve this through a build workflow, so it’s done automatically when you build and deploy. Chris Coyier covered a lot of this ground in his 2015 post on cache busting CSS.
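For a sense of what a build step does here, this is a rough sketch of deriving a revved name from a file’s contents. `revFileName` is a hypothetical helper, not the API of any particular plugin, and the choice of a 10-character MD5 prefix is an assumption. Note that it places the hash before the final extension, so the result looks like `styles.min.<hash>.css` rather than the exact pattern shown above.

```javascript
const crypto = require('crypto');
const path = require('path');

// Hypothetical helper: insert a short content hash before the file extension.
function revFileName(filePath, contents) {
  // Same contents always give the same hash, so the name only changes
  // when the file itself changes.
  const hash = crypto
    .createHash('md5')
    .update(contents)
    .digest('hex')
    .slice(0, 10);

  const parsed = path.posix.parse(filePath);
  return path.posix.join(parsed.dir, parsed.name + '.' + hash + parsed.ext);
}
```

Renaming the file is the easy half; the build also has to rewrite every reference to it in your HTML, which is where the extra complexity comes from.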

But a comment by Joseph Scott on that article pointed out that it’s probably not necessary to use file name revving over query strings just because of Squid proxy:

You mentioned that this was something that Steve Souders discovered. It had to do with the default configuration shipping with Squid at the time. That default was changed in Squid 2.7, which [was] released 7 years ago.

After some digging around the Squid release notes, I found the page that describes the change in Squid 2.7:

The default rules to not cache dynamic content from cgi-bin and query URLs have been altered. Previously, the “cache” ACL was used to mark requests as non-cachable – this is enforced even on dynamic content which returns cachability information. This has changed in Squid-2.7 to use the default refresh pattern. Dynamic content is now cached if it is marked as cachable.

The default was changed sometime between 2008 and 2010, and the final version of Squid 2.7 was released in March of 2010, before HTML5 Boilerplate was released. The old configuration is discussed on this page of the Squid wiki.

The commenter Joseph Scott even asked Steve Souders about this, and Souders said it would be a difficult thing to test definitively. I’m guessing the problem with testing is that you’d have to test all similar proxies, which would be pretty tedious work.

For some further validation on this, I asked about it on Twitter while I was researching this article, and got a similar response from Ilya Grigorik, a Google employee:

Going Back to Query String Revisioning

I’ve done the file name revving thing before, but until I see further evidence that the proxy issue still exists, or that file name revving has other benefits, I’ve decided to go back to cache busting resources using query string values.

I recently revamped the content of one of my side projects, the CSS3 Click Chart. In addition to updating the different CSS features in the content, I decided I would also update the JavaScript and the build process I was using to spit out the final code. I implemented a Gulp workflow to minify, concatenate, create critical CSS, etc.

When looking for a Gulp-based cache-busting solution, I originally tried to find a decent option that uses file name revving, but I couldn’t find a solution I was comfortable with for such a simple project. After doing some research on whether or not file name revving is necessary (see above), I decided to go with gulp-cache-bust, which revs using the query string. So I have something like this in my gulpfile.js:

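var gulp = require('gulp');
var cachebust = require('gulp-cache-bust');

// Append a timestamp query string to asset references in the built page
gulp.task('cache-bust', function() {
  // Return the stream so gulp knows when the task has finished
  return gulp.src('dist/index.php')
    .pipe(cachebust({
      type: 'timestamp'
    }))
    .pipe(gulp.dest('dist/'));
});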

And my resources, after cache busting, look like this:

<script src="js/general.min.js?t=1505782709943"></script>

Your process might be different, but as you can see in the gulpfile, I’m cache busting directly in the dist folder, after the other optimizations (concat, minify, etc.) have been run on the files from the src folder.
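If you’re curious what a step like this boils down to, here’s a simplified sketch of timestamp busting done by hand. To be clear, this is not how gulp-cache-bust is implemented; `bustReferences` is a hypothetical function, and the regular expression only handles double-quoted `src` and `href` values ending in .js or .css that don’t already carry a query string.

```javascript
// Hypothetical helper: add ?t=<timestamp> to script and stylesheet references.
function bustReferences(html, timestamp = Date.now()) {
  return html.replace(
    // Match src="..." or href="..." pointing at a .js or .css file
    // with no existing query string or fragment.
    /\b(src|href)="([^"?#]+\.(?:js|css))"/g,
    (match, attr, url) => `${attr}="${url}?t=${timestamp}"`
  );
}

console.log(bustReferences('<script src="js/general.min.js"></script>', 1505782709943));
// <script src="js/general.min.js?t=1505782709943"></script>
```

Because the timestamp changes on every build, every build busts the cache for every file, changed or not. For a small project that trade-off seems acceptable to me.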

Conclusion

Is it right to go back to cache busting via the query string, ignoring the 2008 advice from Souders? I can’t say for sure if this is the right choice for all projects. The Google Developers documentation on performance optimization (written by Ilya Grigorik) uses file name revving as an example of how to cache bust front-end resources:

How do you get the best of both worlds: client-side caching and quick updates? You change the URL of the resource and force the user to download the new response whenever its content changes. Typically, you do this by embedding a fingerprint of the file, or a version number, in its filename—for example, style.x234dff.css.

I’m guessing that advice is a holdover from the Souders-era optimizations. I think it would be helpful to see some definite benefits to doing this besides the remote possibility that a user is visiting your page through an eight-year-old proxy. And the Twitter conversation quoted above seems to suggest file name revving is not necessary. That said, if you’re interested in a file name revving technique, you can try Alain Schlesser’s solution, which I found a little too complex for my simple project.

So for now, I’ll stick with query string versioning until someone establishes a better way that’s easy to implement and has definite, testable benefits over the query string method.

10 Responses

  1. Šime Vidas:

    > When looking for a Gulp-based cache-busting solution, I originally tried to find a decent option that uses file name revving, but I couldn’t find a solution I was comfortable with for such a simple project.

    I’ve switched to Webpack which takes care of all of this via plugins, but from what I can see, there are a bunch of Gulp plugins for adding a hash to a file name. The problem is that these tools don’t provide a way to insert these names into your HTML document, which means that you need to find a different plugin for that, which further increases complexity.

    • Yep, that’s exactly the problem. It’s not hard at all to find a plugin to change the name. But I want one that will take care of everything, like you said.

      At some point I’ll have to look into webpack, though. From what I recall when I tried to look into it, it looked a little overly complex for my tastes.

    • Scott:

      The `gulp-rev` plugin can create a manifest file with `rev.manifest()` which is a simple JSON file. You can then input the filename easily using any language, e.g. `json_decode()` in PHP.

      • Ash:

        Hi Scott,

        I’m a noobie at using a manifest file created by gulp-rev, how would you get php to read this and call it on to the page?

  2. Scott:

    Great article by the way, kudos for doing the research. Can we also encourage Google et al to update their tools? One thing that annoys me about modern web devs is the blind deference to tools without any critical thinking. I’ve seen so many people in forums asking how to solve the “problem” of using query string version numbers.

  3. mike:

    Some proxies will cache assets with a query string, but will ignore the query string. In other words you get the benefits of caching but no way to cache bust.

    I think it’s mad to go back to query strings. Changing the filename is a proven technique that works.

  4. The above approach might be a bit more difficult to set up than going with the default WordPress behavior, but it provides obvious benefits. For sites where the right caching mechanisms become critical, a pipeline like the above should be considered a requirement.
