When you’re dealing with large images, Flash movies or large JavaScript files, it’s generally a good idea to have the client’s browser cache them.
A very simple way to achieve this is by using Apache’s mod_expires. For instance, if you add the following example, taken from the manual, to your .htaccess file – assuming of course that mod_expires is properly installed and configured – it will tell the browser to cache all the files for a month.
ExpiresDefault "access plus 4 weeks"
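To make the directive a bit more concrete, here is a rough Python sketch of the arithmetic behind "access plus 4 weeks". mod_expires answers with an Expires header and a Cache-Control max-age value; the function name expiry_headers and the sample access time are made up for illustration, and this only models the calculation, it is not how Apache is implemented.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time, weeks=4):
    """Model the headers produced by 'access plus N weeks' (illustrative only)."""
    lifetime = timedelta(weeks=weeks)
    return {
        # Absolute expiry date, formatted as an HTTP date in GMT.
        "Expires": format_datetime(access_time + lifetime, usegmt=True),
        # The same lifetime expressed in seconds.
        "Cache-Control": "max-age=%d" % int(lifetime.total_seconds()),
    }


# Hypothetical access time; must be timezone-aware UTC for usegmt=True.
headers = expiry_headers(datetime(2009, 9, 21, 12, 0, 0, tzinfo=timezone.utc))
for name, value in headers.items():
    print("%s: %s" % (name, value))
```

Four weeks works out to 2419200 seconds, which is the number the browser counts down before it asks for the file again.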
So, every time the client returns to the site during the following four weeks, the browser won’t download all the static content again, but will load it locally from the cache, thus minimising the loading time. Of course, some issues usually appear after updates, particularly after updates of the cached files.
For instance, if you add new JavaScript functionality to the site or make some changes in the CSS or SWF files, and the user doesn’t hit CTRL+F5 to fully refresh the page and bypass the cache, then they will see the old version of the site. Of course, one can take the short road to LamerVille and post a message on the site, asking the user to refresh the page. But that’s a little too lame to be taken into consideration, especially when dealing with respectable sites.
But there’s another, much more elegant way. First of all, place all the static, cacheable files in a separate folder. The browser caches files based on their URL, so if you want the browser to reload all the static data after every new release, you need to change the URLs with every new release.
Let’s say, the site resides at www.example.com and that all the static information will be served from www.example.com/static/. Now, a good idea is to make the links look like this:
http://www.example.com/static/(release-number)/css/style.css
http://www.example.com/static/(release-number)/js/cool-ajax-app.js
Here, release-number is a number that increments with every new release. This way, the URLs will be different after each release, thus forcing the browsers to fetch the new files. You don’t need to go through lots of files and increment the release number by hand; you can just use the following Python script:
#!/usr/bin/env python3
"""
Read more about what this script actually does on
http://blog.motane.lu/2009/09/21/caching-problems-with-mod_expires/
Usage:
    python increment_release.py start_directory static_prefix
Author:
    Tudor Barbu http://blog.motane.lu
"""
import os
import re
import sys

REGEX = None
CACHED_DIR = ''


def main():
    global REGEX, CACHED_DIR
    if len(sys.argv) < 3:
        print('Read the comments in the source code')
        sys.exit(1)
    start_folder = sys.argv[1]
    CACHED_DIR = sys.argv[2]
    # Matches ="<prefix>/<path>" or ="<prefix>/<number>/<path>", in either
    # quote style. Group 3 is the release number, if there is one.
    REGEX = re.compile(
        r'=(\'|")' + re.escape(CACHED_DIR) + r'/((\d+)/|)([^\'"]+)(\'|")'
    )
    parse_files(start_folder)


def parse_files(directory):
    # Walk the tree recursively and rewrite every regular file found.
    for item in os.listdir(directory):
        path = os.path.join(directory, item)
        if os.path.isfile(path):
            perform_replace(path)
        elif os.path.isdir(path):
            parse_files(path)


def perform_replace(path):
    try:
        with open(path, 'r') as f:
            contents = f.read()
    except UnicodeDecodeError:
        # Most likely a binary file (image, swf); nothing to rewrite in it.
        return
    # Only touch the files that actually reference the static prefix.
    if REGEX.search(contents):
        with open(path, 'w') as f:
            f.write(REGEX.sub(handle_match, contents))


def handle_match(match):
    # Bump an existing release number, or start at 1 if there is none yet.
    if match.group(3) is not None:
        revision_number = int(match.group(3)) + 1
    else:
        revision_number = 1
    return '=%s%s/%s/%s%s' % (
        match.group(1), CACHED_DIR, revision_number,
        match.group(4), match.group(5),
    )


if __name__ == '__main__':
    main()
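If you want to see what the substitution does before letting it loose on your files, here is a small self-contained sketch that applies the same idea to a single string. The bump_release function and the sample markup are made up for this illustration; the pattern is a slightly tidied variant of the one in the script, not a drop-in replacement for it.

```python
import re


def bump_release(markup, prefix):
    """Insert or increment the release number after `prefix` in quoted URLs."""
    pattern = re.compile(
        r'=(["\'])' + re.escape(prefix) + r'/(?:(\d+)/)?([^"\']+)(["\'])'
    )

    def replace(match):
        # No release number yet: start at 1. Otherwise add one to it.
        release = int(match.group(2)) + 1 if match.group(2) else 1
        return '=%s%s/%s/%s%s' % (
            match.group(1), prefix, release, match.group(3), match.group(4)
        )

    return pattern.sub(replace, markup)


# Hypothetical markup: one URL without a release number, one at release 7.
html = (
    '<link href="/static/css/style.css">'
    '<script src="/static/7/js/cool-ajax-app.js"></script>'
)
result = bump_release(html, "/static")
print(result)
```

The first URL gets release 1 and the second moves from 7 to 8. Keep in mind that any purely numeric folder right after the prefix is treated as a release number.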
…and, of course, there’s no need to create lots of directories either. A simple .htaccess rewrite rule will do. Just redirect all the URLs like /static/(number)/css/style.css to point to /static/css/style.css, by adding these 2 lines in the /static/.htaccess file:
RewriteEngine On
RewriteRule ^\/static\/(\d+)\/(.*)$ $2 [NC,L]
This should solve all your caching-related problems. If you want to look savvy, you can use the version number of the head revision from Subversion, or whatever versioning system you might be using, instead of a simple incremental number.
Yes, I know that the script is a little bit buggy but it works for me. If you have an improved version, post a comment below. Credits will be given.
An alternative would be to add ?version=123 to the end of files, instead of faking a directory.
I have some files like scriptaculous.js?load=effects or file.swf?load=some_param and it was simpler for me to fake some directories. If you don’t use such constructs, perhaps the ?version=123 parameter will be easier to use.
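For URLs that already carry a query string, the version parameter has to be merged with what is there rather than blindly appended. A minimal sketch with Python's urllib.parse, assuming the parameter is called version as in the comment above; the add_version helper and the sample paths are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_version(url, version):
    """Return `url` with a version=<version> query parameter set."""
    parts = urlsplit(url)
    # Keep the existing parameters, dropping any stale version value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "version"
    ]
    query.append(("version", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_version("/static/js/scriptaculous.js?load=effects", 123))
print(add_version("/static/css/style.css", 123))
```

Parameters such as load=effects survive, and calling the helper again with a newer number replaces the old version value instead of stacking a second one.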
I like this directory-based approach more than using a query param, but I think there are a few problems with both of them (assuming that I understood correctly what you said).
At work I’m using Capistrano for deployment, and Capistrano creates a file called REVISION in which it writes the current revision hash (taken from Git).
If I used this revision number as a directory name under which all the static components virtually reside, then on every deployment all the static components would have to be re-fetched by the browser (and we do a lot of deployments). What I’ve done instead is to write a little PHP script that concatenates and minifies all CSS and JS files at deploy time, writes the result to a new file and names it like all.sha1_hash.js. The sha1_hash is the SHA-1 checksum of the new concatenated, minified source. Thus, the name of the file only changes when the contents change.
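The naming part of this scheme can be sketched in a few lines. The commenter's script is in PHP and is not shown; the version below is a Python illustration under my own assumptions (the bundle_name helper and the sample sources are invented, and the minification step is left out entirely).

```python
import hashlib


def bundle_name(sources, prefix="all", extension="js"):
    """Concatenate `sources` and derive a file name from their SHA-1 checksum."""
    bundle = "\n".join(sources).encode("utf-8")
    digest = hashlib.sha1(bundle).hexdigest()
    return "%s.%s.%s" % (prefix, digest, extension), bundle


# Hypothetical file contents standing in for the real JS sources.
old_name, _ = bundle_name(["var a = 1;", "var b = 2;"])
same_name, _ = bundle_name(["var a = 1;", "var b = 2;"])
new_name, _ = bundle_name(["var a = 1;", "var b = 3;"])
print(old_name)
print(new_name)
```

Deploying unchanged sources yields the same file name, so the browser keeps its cached copy; any change in the contents produces a new name and therefore a new URL.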
For images, that are not used for styling, I use a subdomain (static) which indeed uses mod_expires, and their expiry date is about 4 months, although I could add more. Actually, CSS and JS files are sent with Expires headers too.
Yep, using checksums is another good idea, but it might take a little too long if you have lots of large files that need hashing. Anyway, it’s more a matter of personal convenience; I think that all approaches are good.
“Just redirect all the URLs like /static/(number)/css/style.css to point to /static/css/style.css, by adding these 2 lines in the /static/.htaccess file:
RewriteEngine On
RewriteRule ^\/static\/(\d+)\/(.*)$ $2 [NC,L]
”
Actually, the rewrite as you wrote it will go from /static/(number)/css/style.css to /css/style.css, not to /static/css/style.css. Also, you don’t need to escape forward slashes; this perhaps comes from the fact that you are used to / as a regex delimiter, but in RewriteRule patterns that’s not the case. You should do:
RewriteRule ^/static/\d+/(.*)$ /static/$1 [NC,L]
or, if you use Apache 2.x, you can use the lookbehind
RewriteRule (?<=^/static/)\d+/(.*)$ $1 [NC,L]
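The mapping that the first corrected rule is meant to perform can be modelled with Python's re module. This only mimics the regular expression and its substitution; it does not reproduce how mod_rewrite handles per-directory context or flags, and the strip_release helper is a hypothetical name used for the illustration.

```python
import re

# Same pattern as in the rule above: /static/<number>/<rest>
RELEASE_SEGMENT = re.compile(r'^/static/\d+/(.*)$')


def strip_release(path):
    """Map /static/<number>/<rest> to /static/<rest>; leave other paths alone."""
    return RELEASE_SEGMENT.sub(r'/static/\1', path)


for path in ("/static/42/css/style.css", "/static/css/style.css", "/index.html"):
    print("%s -> %s" % (path, strip_release(path)))
```

Only paths with a numeric segment right after /static/ are rewritten, so every release number ends up pointing at the same physical file.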