Monitor free space on your Linux server

Posted on Tuesday, September 22nd, 2009

Warning: After consulting with some of my friends, I’ve concluded that Nagios is a much better solution, so this article is now obsolete. Read it only if you don’t want to install Nagios on your server.

Now that I have some servers to administer and look after, I’m starting to notice potential issues that until several months ago were just somebody else’s problems and I didn’t care much about them. One of these is monitoring the server’s free space, since it’s better to know when your server’s storage capacity is running low. Saves me some money on headache pills. Well, knowing a scripting language like Python comes in handy in times like these, because it helped me put together a simple script that warns me whenever one or more hard drives are at least 85% full.

Here it is:

#!/usr/bin/env python3
import subprocess

WARNING_PERCENTAGE = 85

# Prints "<device> <used-percent>" for every mounted filesystem except cdroms.
DF_COMMAND = """df -k | awk '{if ($6 !~ /cdrom/ && NR != 1) print $1 " " ($5 + 0) }'"""

def main():
    warnings = []
    # The commands module is gone in Python 3; subprocess replaces it.
    output = subprocess.check_output(DF_COMMAND, shell=True, text=True)
    for line in output.splitlines():
        hdd, used = line.split(' ')
        if int(used) >= WARNING_PERCENTAGE:
            warnings.append(line)
    if warnings:
        send_warning(warnings)

def send_warning(warnings):
    # here you should send e-mails, SMSes and so on
    print(warnings)

if __name__ == '__main__':
    main()

Since it uses df and awk, it’s obvious that it won’t run on Windows. But who needs Windows anyway…
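If portability ever matters, the same check can be sketched without df and awk. This is only an illustration, not the script above: it assumes Python 3.3 or newer (for shutil.disk_usage), and the function names and the list of paths are hypothetical placeholders. The percentage may also differ slightly from what df reports, because df leaves blocks reserved for root out of its calculation.

```python
import shutil

WARNING_PERCENTAGE = 85

def percent_used(path):
    """Return how full the filesystem holding `path` is, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return 100.0 * usage.used / usage.total

def find_full_filesystems(paths, threshold=WARNING_PERCENTAGE):
    """Return (path, percent) pairs for every path at or above the threshold."""
    warnings = []
    for path in paths:
        used = percent_used(path)
        if used >= threshold:
            warnings.append((path, round(used, 1)))
    return warnings

if __name__ == '__main__':
    # '.' is just an example; list your own mount points here.
    print(find_full_filesystems(['.']))
```

Unlike the df version, this one does not discover mount points by itself, so you have to tell it which paths to look at.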

PS: do I have to mention that this script should be run as a cron job?
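As for the send_warning placeholder in the script, one possible way to fill it in is plain e-mail. The sketch below assumes Python 3 and a mail server listening on localhost; the addresses and the build_warning_message helper are made up for the example, so adjust them to your setup.

```python
import smtplib
from email.message import EmailMessage

def build_warning_message(warnings, sender, recipient):
    """Build a plain-text e-mail listing the filesystems that are nearly full."""
    message = EmailMessage()
    message['Subject'] = 'Disk space warning: %d filesystem(s) nearly full' % len(warnings)
    message['From'] = sender
    message['To'] = recipient
    message.set_content('\n'.join(warnings))
    return message

def send_warning(warnings, sender='root@localhost', recipient='admin@example.com'):
    # Assumption: an SMTP server accepts mail on localhost, port 25.
    with smtplib.SMTP('localhost') as smtp:
        smtp.send_message(build_warning_message(warnings, sender, recipient))
```

Keeping the message building separate from the sending makes it easy to try the first part without a mail server at hand.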

Caching problems with mod_expires

Posted on Monday, September 21st, 2009

Sometimes, when you’re dealing with large images, Flash movies or large JavaScript files, it’s generally a good idea to force them into the client’s cache.

A very simple way to achieve this is by using Apache’s mod_expires. For instance, if you add the following example, taken from the manual, to your .htaccess file – assuming of course that mod_expires is properly installed and configured – it will tell the browser to cache all the files for a month.

ExpiresDefault "access plus 4 weeks"
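To make the directive a bit more concrete: mod_expires sets the Expires header and the max-age value of Cache-Control on each response, counting from the moment the file is accessed. The snippet below is only a rough Python model of that arithmetic, not Apache’s code, and the expiry_headers function is a name invented for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

FOUR_WEEKS = timedelta(weeks=4)

def expiry_headers(access_time, lifetime=FOUR_WEEKS):
    """Return the caching headers for a response served at `access_time` (UTC)."""
    return {
        # HTTP dates are always expressed in GMT.
        'Expires': format_datetime(access_time + lifetime, usegmt=True),
        # max-age is the same lifetime expressed in seconds.
        'Cache-Control': 'max-age=%d' % lifetime.total_seconds(),
    }

if __name__ == '__main__':
    print(expiry_headers(datetime.now(timezone.utc)))
```

Four weeks come out as 2419200 seconds, which is the max-age value to look for when checking the response headers in the browser.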

So, every time the client returns in the following month to the site, the browser won’t download all the static content again, but load it locally from the cache, thus minimising the loading time. Of course, there are some issues that usually appear after updates. Particularly after updates of the cached files :)

For instance, if you add some new JavaScript functionality to the site or make some changes in the CSS or SWF files, and the user doesn’t hit CTRL+F5 to fully refresh the page and clear its cache, they will see the old version of the site. Of course, one can take the short road to LamerVille and post a message on the site, asking the user to refresh the page. But that’s a little too lame to be taken into consideration, especially when dealing with respectable sites.

But there’s another way, much more elegant. First of all, place all the static, cacheable files in a separate folder. The browser caches files based on their URL, so if you want it to reload all the static data after a new release, you need to change the URLs with every new release.

Let’s say the site resides at www.example.com and all the static content is served from www.example.com/static/. Now, a good idea is to make the links look like this:

http://www.example.com/static/(release-number)/css/style.css
http://www.example.com/static/(release-number)/js/cool-ajax-app.js

Where release-number is a number that increments with every new release. This way, the URLs will be different after each release, thus forcing the browsers to fetch the new files. You don’t need to go through lots of files and increment the release number by hand; you can just use the following Python script:

#!/usr/bin/env python3
"""
Read more about what this script actually does on
http://blog.motane.lu/2009/09/21/caching-problems-with-mod_expires/

Usage:
	python3 increment_release.py start_directory static_prefix

Author:
	Tudor Barbu http://blog.motane.lu
"""
import os
import re
import sys

REGEX = None
CACHED_DIR = ''

def main():
    global REGEX, CACHED_DIR

    # sys.argv[0] is the script name, so two arguments mean three items.
    if len(sys.argv) < 3:
        print('Read the comments in the source code')
        sys.exit(1)
    start_folder = sys.argv[1]
    CACHED_DIR = sys.argv[2]

    # Matches ="<prefix>/[<number>/]<path>" in single or double quotes.
    REGEX = re.compile('=(\'|")' + re.escape(CACHED_DIR) + r'/((\d+)/|)([^\'|"]+)(\'|")')

    parse_files(start_folder)

def parse_files(start_folder):
    # os.walk visits the start folder and every subdirectory below it.
    for basedir, _subdirectories, files in os.walk(start_folder):
        for item in files:
            perform_replace(os.path.join(basedir, item))

def perform_replace(path):
    try:
        with open(path, 'r') as f:
            contents = f.read()
    except UnicodeDecodeError:
        # Not readable as text (image, swf, ...): nothing to rewrite.
        return
    if REGEX.search(contents):
        with open(path, 'w') as f:
            f.write(REGEX.sub(handle_match, contents))

def handle_match(matches):
    # Group 3 is the current release number, if the URL already has one.
    if matches.group(3) is not None:
        revision_number = int(matches.group(3)) + 1
    else:
        revision_number = 1
    return '=%s%s/%s/%s%s' % (matches.group(1), CACHED_DIR, revision_number, matches.group(4), matches.group(5))

if __name__ == '__main__':
    main()
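To see what the substitution does without touching any files, here is the same regular expression applied to a string. It’s just a demonstration under a couple of assumptions: the bump_release wrapper and the sample markup are made up for the example, and /static plays the role of the static_prefix argument.

```python
import re

def bump_release(contents, cached_dir):
    """Increment (or insert) the release number in every quoted static URL."""
    regex = re.compile('=(\'|")' + re.escape(cached_dir) + r'/((\d+)/|)([^\'|"]+)(\'|")')

    def handle_match(matches):
        # Group 3 holds the existing release number, or None if there is none.
        if matches.group(3) is not None:
            revision_number = int(matches.group(3)) + 1
        else:
            revision_number = 1
        return '=%s%s/%s/%s%s' % (matches.group(1), cached_dir, revision_number,
                                  matches.group(4), matches.group(5))

    return regex.sub(handle_match, contents)

html = '<link href="/static/css/style.css"><script src="/static/7/js/cool-ajax-app.js"></script>'
print(bump_release(html, '/static'))
# prints: <link href="/static/1/css/style.css"><script src="/static/8/js/cool-ajax-app.js"></script>
```

Note that a real sub-folder with a purely numeric name directly under the static folder would be mistaken for a release number, so avoid those.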

…and, of course, there’s no need to create lots of directories either. A simple .htaccess rewrite rule will do. Just rewrite all the URLs like /static/(number)/css/style.css to point to /static/css/style.css, by adding this rule to the /static/.htaccess file:

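RewriteEngine On
# In a .htaccess file the pattern is matched against the path relative to this directory, so it must not include the /static/ prefix
RewriteRule ^(\d+)/(.*)$ $2 [NC,L]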
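What the rule is meant to achieve can be mimicked in a few lines of Python. This is only a toy model of the mapping, working on the path relative to /static/; it is not how mod_rewrite is implemented, and strip_release is a hypothetical name.

```python
import re

# A leading run of digits followed by a slash is treated as the release number.
RELEASE_PREFIX = re.compile(r'^(\d+)/(.*)$')

def strip_release(relative_path):
    """Map '<number>/css/style.css' to 'css/style.css'; leave other paths alone."""
    match = RELEASE_PREFIX.match(relative_path)
    return match.group(2) if match else relative_path
```

So every release number ends up pointing at the same physical file, and only the URL seen by the browser changes.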

This should solve all your caching-related problems. If you want to look savvy, you can use the head revision number from Subversion, or whatever versioning system you might be using, instead of a simple incremental number.

Yes, I know that the script is a little bit buggy but it works for me. If you have an improved version, post a comment below. Credits will be given.

Apache catch-all subdomains configuration

Posted on Monday, September 14th, 2009

Since my job description is “Jack of all trades”, today I had to configure some virtual hosts on our Apache web server. One of them had to have a catch-all rule for subdomains.

I’m not a particularly good system administrator and I don’t want to become one. Programming is my thing. But, in order to put bread on the table, we sometimes have to do things we don’t like that much. For me, today was one of those days. And, after spending about two hours browsing Apache’s gargantuan documentation, I finally figured it out and decided to share it with the world, in the hope that it will save somebody else some time.

A disclaimer: this might not be the best solution out there, and I’m sure there are better, more elegant ways to achieve this. But it works!

First of all, the main vhost looks something like this:

<VirtualHost *>
    ServerName www.my-site.com
    ServerAdmin webmaster@my-site.com
 
    DocumentRoot /var/www/my-site
    <Directory /var/www/my-site>
         Options FollowSymLinks
         AllowOverride All
         Order allow,deny
         Allow from all
    </Directory>
   #bla bla bla
</VirtualHost>

Apache queues all the virtual hosts and, on each request, iterates through this queue searching for a suitable virtual host definition. The first one that matches the request’s parameters is used, so the above definition must come first in the /etc/apache2/sites-available/my-site.com file. This definition will serve the “www” sub-domain, and all other sub-domains will be redirected here.

After the “main” virtual host, just add a definition for another virtual host, like this:

<VirtualHost *>
    ServerName my-site.com
    ServerAlias *.my-site.com
    RedirectMatch permanent (.+)$ http://www.my-site.com$1
</VirtualHost>

This way, all URLs that look like

http://any-subdomain.my-site.com/whatever-file.php

or

http://my-site.com/whatever-file.php

will be redirected to

http://www.my-site.com/whatever-file.php
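The matching order is the whole trick here, so here is a toy model of it in Python. It’s a sketch, not how Apache is implemented: it assumes the two virtual hosts are checked in the order they are defined, and the redirect_target function and its constants are invented for the example.

```python
from fnmatch import fnmatch

MAIN_HOST = 'www.my-site.com'
# ServerName and ServerAlias of the catch-all vhost, checked after the main one.
CATCH_ALL = ['my-site.com', '*.my-site.com']

def redirect_target(host, path):
    """Return the URL a request gets redirected to, or None if it is served directly."""
    host = host.lower()
    if host == MAIN_HOST:
        # The main vhost comes first, so the wildcard never gets to see "www".
        return None
    if any(fnmatch(host, pattern) for pattern in CATCH_ALL):
        return 'http://%s%s' % (MAIN_HOST, path)
    # Any other host falls outside this sketch.
    return None
```

If the wildcard definition came first, www.my-site.com would match it as well and the request would be redirected to itself over and over.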

Aside from correcting some spelling mistakes, like ww.my-site.com, this will boost link popularity, because crawlers usually consider www.my-site.com and my-site.com to be two different sites (and, in fact, they are, but anyway), so incoming links to one will not be taken into consideration when computing the link popularity index of the other. But, with a 301 permanent redirect, the crawler will consider both URLs to be the same site.

Another good idea is to also buy the “non-dash” spelling of the domain (aka mysite.com) and redirect it to the original address. Like this:

<VirtualHost *>
    ServerName mysite.com
    ServerAlias *.mysite.com
    RedirectMatch permanent (.+)$ http://www.my-site.com$1
</VirtualHost>

Got the job done for me. And remember: Linux is like a wigwam: no windows, no doors, Apache inside :P