Zend Framework and web hosting services

Posted on Tuesday, November 24th, 2009

Let’s face it. We can’t all buy or rent our own servers. It’s not cost effective. So we turn to hosting companies – always the popular choice. But when you’re using a shared hosting service, you can’t make changes in the server’s configuration file. You usually get a writable directory in a location like

/home/{your-username}/public_html/

…and a cPanel account to manage your share of the server, emails, databases and so on. And…that’s about it. You can’t point your webroot to the public directory as taught in Zend Framework’s manual, so all your library files would be exposed to the public. What to do then? The answer is simple: .htaccess and mod_rewrite. Hosting companies usually install mod_rewrite and allow .htaccess overrides (AllowOverride All), so this solution will work in the vast majority of cases.

Upload your content via FTP or SSH into your public_html, wwwroot or www directory (the webroot directory, whatever its name may be). Then add four .htaccess files: one directly in the web root, one in the public directory and one in each of the other folders. It should look something like this (without the numbers, of course).

public_html/
     /public
       .htaccess (2)
     /application
        .htaccess (3)
     /library
        .htaccess (3)
    .htaccess (1)

Now, the first htaccess file – #1 – should contain the following:

RewriteEngine On
RewriteRule ^(.*)$ public/$1 [L]

…this will internally rewrite every request for this folder to the public/ sub-folder. The second .htaccess, the one in the public folder, should be the .htaccess file shipped with Zend Framework. It usually looks like this:

SetEnv APPLICATION_ENV production 
 
Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} -s [OR]
RewriteCond %{REQUEST_FILENAME} -l [OR]
RewriteCond %{REQUEST_FILENAME} -d
RewriteRule ^.*$ - [NC,L]
RewriteRule ^.*$ index.php [NC,L]

And the .htaccess files located in all other directories – #3 – should simply forbid access to their respective folders, like such:

deny from all
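
If you would rather not create the files by hand, here is a minimal Python sketch of the same layout. The function name write_htaccess_files and the default folder names are my own assumptions for illustration; it writes files #1 and #3 and leaves the public/.htaccess shipped with Zend Framework untouched.

```python
from pathlib import Path

# Contents of .htaccess #1 (web root) and #3 (private folders), as shown above.
ROOT_RULES = "RewriteEngine On\nRewriteRule ^(.*)$ public/$1 [L]\n"
DENY_RULES = "deny from all\n"


def write_htaccess_files(webroot, private_dirs=("application", "library")):
    """Write .htaccess #1 and #3 under webroot and return the paths written.

    The folder names in private_dirs are assumptions; adjust them to your layout.
    """
    webroot = Path(webroot)
    written = []

    # File #1: send everything that hits the web root to public/.
    root_file = webroot / ".htaccess"
    root_file.write_text(ROOT_RULES)
    written.append(root_file)

    # Files #3: forbid direct access to every private folder that exists.
    for name in private_dirs:
        directory = webroot / name
        if directory.is_dir():
            target = directory / ".htaccess"
            target.write_text(DENY_RULES)
            written.append(target)

    return written
```

Run it locally against a copy of the site, e.g. write_htaccess_files("public_html"), and then upload the result as usual.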

Worked for me :)

Web svn administration on Debian

Posted on Friday, October 30th, 2009

When I work alone on some projects, I usually use git for versioning. I have a GitHub account and I throw my code there for safekeeping, since GitHub’s datacenter is much more reliable than anything I could ever improvise at home. And it’s quite cheap, too.

But at work I’ve always used SVN and I also wanted to use it at my current workplace. So I had to install it. Since we’re using Debian and all the components are in the repositories, things went pretty smoothly:

sudo apt-get install subversion libapache2-svn
sudo a2enmod dav
sudo a2enmod dav_svn

And basically that’s it. It works. But since I’m lazy, I wanted a simple way to create repositories, manage the users’ rights and so on. That “svnadmin create” stuff just doesn’t cut it for me. So I searched the web for a suitable web interface for SVN, and I found one developed by some French guys, called User Friendly SVN. It’s built on Zend Framework and it’s very simple, just the essentials, but it gets the job done. It’s really easy to use and easy to install. If you want an SVN management tool, give it a try.

Now I’m looking for a way to tie the system user accounts to SVN’s user accounts, as simply as possible. One password to rule them all :P

Caching problems with mod_expires

Posted on Monday, September 21st, 2009

Sometimes, when you’re dealing with large images, Flash movies or large JavaScript files, it’s generally a good idea to force them into the client’s cache.

A very simple way to achieve this is by using Apache’s mod_expires. For instance, if you add the following example, taken from the manual, to your .htaccess file (assuming of course that mod_expires is installed and switched on with ExpiresActive On), it will tell the browser to cache all the files for four weeks.

ExpiresDefault "access plus 4 weeks"
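
To see what the browser actually receives, here is a rough Python sketch of the two headers mod_expires derives from that rule. The function name expiry_headers is made up for this example, and the code only mimics the arithmetic; it is not how Apache implements it.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def expiry_headers(access_time, lifetime=timedelta(weeks=4)):
    """Return the caching headers for a file requested at access_time.

    access_time must be a timezone-aware datetime in UTC.
    """
    expires = access_time + lifetime
    return {
        # Absolute expiry date, formatted the way HTTP expects it.
        "Expires": format_datetime(expires, usegmt=True),
        # The same lifetime expressed in seconds.
        "Cache-Control": "max-age=%d" % int(lifetime.total_seconds()),
    }


if __name__ == "__main__":
    print(expiry_headers(datetime.now(timezone.utc)))
```

Four weeks is 28 days, so the max-age value comes out as 2419200 seconds.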

So, every time the client returns in the following month to the site, the browser won’t download all the static content again, but load it locally from the cache, thus minimising the loading time. Of course, there are some issues that usually appear after updates. Particularly after updates of the cached files :)

For instance, if you add new JavaScript functionality to the site or make some changes in the CSS or SWF files, and the user doesn’t hit CTRL+F5 to fully refresh the page and bypass the cache, they will see the old version of the site. Of course, one can take the short road to LamerVille and post a message on the site, asking the user to refresh the page. But that’s a little too lame to be taken into consideration, especially when dealing with respectable sites.

But there’s another, much more elegant way. First of all, place all the static, cacheable files in a separate folder. The browser caches files based on their URL, so if you want it to reload all the static data after every new release, you need to change the URLs with every new release.

Let’s say, the site resides at www.example.com and that all the static information will be served from www.example.com/static/. Now, a good idea is to make the links look like this:

http://www.example.com/static/(release-number)/css/style.css
http://www.example.com/static/(release-number)/js/cool-ajax-app.js

Where release-number is a number that increments with every new release. This way, the URLs will be different after each release, forcing the browsers to fetch the new files. You don’t need to go through lots of files and increment the release number by hand; you can just use the following Python script:

#!/usr/bin/env python3
"""
Read more about what this script actually does on
http://blog.motane.lu/2009/09/21/caching-problems-with-mod_expires/

Usage:
	python3 increment_release.py start_directory static_prefix

Author:
	Tudor Barbu http://blog.motane.lu
"""
import os
import re
import sys


def build_regex(cached_dir):
    # Matches ="<cached_dir>/[<number>/]<path>" with single or double quotes.
    # Group 3 holds the current release number, if the URL already has one.
    return re.compile(
        r'=(\'|")' + re.escape(cached_dir) + r'/((\d+)/|)([^\'|"]+)(\'|")'
    )


def main():
    if len(sys.argv) < 3:
        print('Read the comments in the source code')
        sys.exit(1)
    start_folder = sys.argv[1]
    cached_dir = sys.argv[2]
    regex = build_regex(cached_dir)

    # Walk the whole tree and rewrite every file that references the prefix.
    for dirpath, _dirnames, filenames in os.walk(start_folder):
        for name in filenames:
            perform_replace(os.path.join(dirpath, name), regex, cached_dir)


def perform_replace(path, regex, cached_dir):
    # Files are assumed to be UTF-8 text; binary or unreadable files are skipped.
    try:
        with open(path, 'r', encoding='utf-8', newline='') as f:
            contents = f.read()
    except (UnicodeDecodeError, OSError):
        return
    if regex.search(contents):
        new_contents = regex.sub(lambda m: handle_match(m, cached_dir), contents)
        with open(path, 'w', encoding='utf-8', newline='') as f:
            f.write(new_contents)


def handle_match(match, cached_dir):
    # Bump the existing release number, or start at 1 if there is none yet.
    if match.group(3) is not None:
        revision_number = int(match.group(3)) + 1
    else:
        revision_number = 1
    return '=%s%s/%s/%s%s' % (
        match.group(1), cached_dir, revision_number,
        match.group(4), match.group(5),
    )


if __name__ == '__main__':
    main()

…and, of course, there’s no need to create lots of directories either. A simple .htaccess rewrite rule will do. Just redirect all the URLs like /static/(number)/css/style.css to point to /static/css/style.css, by adding these 2 lines in the /static/.htaccess file:

RewriteEngine On
RewriteRule ^(\d+)/(.*)$ $2 [NC,L]
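
The mapping this rule is meant to produce can be checked quickly in Python. This is only a sketch of the intended URL translation, working on full URL paths; the helper names versioned_url and strip_release_number are invented for the example and nothing here talks to Apache.

```python
import re

# A versioned static URL: /static/<release-number>/<rest of the path>
VERSIONED = re.compile(r"^/static/(\d+)/(.*)$")


def versioned_url(release_number, path):
    """Build the URL that goes into the HTML, e.g. /static/12/css/style.css."""
    return "/static/%d/%s" % (release_number, path.lstrip("/"))


def strip_release_number(url_path):
    """Return the real location of the file, without the release number."""
    match = VERSIONED.match(url_path)
    if match is None:
        return url_path  # not a versioned URL, leave it alone
    return "/static/" + match.group(2)


if __name__ == "__main__":
    url = versioned_url(12, "css/style.css")
    print(url, "->", strip_release_number(url))
```

Whatever the release number is, the file served is always the one physically stored under /static/.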

This should solve all your caching-related problems. If you want to look savvy, you can use the head revision number from Subversion, or whatever versioning system you might be using, instead of a simple incremental number.

Yes, I know that the script is a little bit buggy but it works for me. If you have an improved version, post a comment below. Credits will be given.

ALERT – configured POST variable limit exceeded

Posted on Tuesday, September 15th, 2009

ALERT – configured POST variable limit exceeded… this error kept popping up in my server’s logs all morning, and – what a strange coincidence – an application stopped working in that exact time frame :)

Well, upon investigation, this alert is logged by Suhosin when a client application sends too many variables to the server. In my case, via HTTP POST.

The solution is simple: just increase the maximum number of allowed variables in php.ini. If you don’t have a suhosin section, just create one. Like such:

[suhosin]
suhosin.request.max_vars = 1000
suhosin.post.max_vars = 1000
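
To get a feeling for how close a form is to the limit, you can count the variables in a captured request body. The sketch below is only an approximation under my own assumptions: it handles urlencoded bodies, the helper names are hypothetical, and Suhosin’s exact counting rules may differ.

```python
from urllib.parse import parse_qsl


def count_post_vars(body):
    """Count the name=value pairs in an application/x-www-form-urlencoded body."""
    return len(parse_qsl(body, keep_blank_values=True))


def exceeds_limit(body, max_vars=1000):
    """True if the body carries more variables than max_vars allows."""
    return count_post_vars(body) > max_vars


if __name__ == "__main__":
    sample = "&".join("field%d=x" % i for i in range(1200))
    print(count_post_vars(sample), exceeds_limit(sample))
```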

Apache catch-all subdomains configuration

Posted on Monday, September 14th, 2009

Since my job description is “Jack of all trades”, today I had to configure some virtual hosts on our Apache web server. One of them had to have a catch-all rule for subdomains.

I’m not a particularly good system administrator and I don’t want to become one. Programming is my thing. But, in order to put bread on the table, we sometimes have to do things we don’t like that much. For me, today was one of those days. And, after spending about two hours browsing Apache’s gargantuan documentation, I’ve finally figured it out and decided to share it with the world, in the hope that it will save somebody else’s time.

A disclaimer: this might not be the best solution out there, and I’m sure there are better, more elegant ways to achieve this. But it works!

First of all, the main vhost looks something like this:

<VirtualHost *>
    ServerName www.my-site.com
    ServerAdmin webmaster@my-site.com
 
    DocumentRoot /var/www/my-site
    <Directory /var/www/my-site>
         Options FollowSymLinks
         AllowOverride All
         Order allow,deny
         Allow from all
    </Directory>
   #bla bla bla
</VirtualHost>

Apache queues all the virtual hosts and on each request it iterates through this queue, searching for a suitable virtual host definition. The first one that matches the request’s parameters is used, so the above definition must come first in the /etc/apache2/sites-available/my-site.com file. This definition will serve the “www” sub-domain, and all other sub-domains will be redirected here.

After the “main” virtual host, just add a definition for another virtual host, like such:

<VirtualHost *>
    ServerName my-site.com
    ServerAlias *.my-site.com
    RedirectMatch permanent (.+)$ http://www.my-site.com$1
</VirtualHost>

This way, all URLs that look like

http://any-subdomain.my-site.com/whatever-file.php

or

http://my-site.com/whatever-file.php

will be redirected to

http://www.my-site.com/whatever-file.php
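
The decision these two virtual hosts make together can be sketched in a few lines of Python. This is just a model of the behaviour described above, not Apache’s real matching code; the function name redirect_target is invented, and unknown hosts simply return None here, whereas Apache would hand them to its default virtual host.

```python
from fnmatch import fnmatch

CANONICAL_HOST = "www.my-site.com"
# ServerName and ServerAlias of the catch-all virtual host.
REDIRECTED_HOSTS = ("my-site.com", "*.my-site.com")


def redirect_target(host, path):
    """Return the 301 target for a request, or None if no redirect happens."""
    host = host.lower()
    if host == CANONICAL_HOST:
        return None  # matched by the first ("main") virtual host
    if any(fnmatch(host, pattern) for pattern in REDIRECTED_HOSTS):
        return "http://%s%s" % (CANONICAL_HOST, path)
    return None  # not one of our hosts in this simplified model


if __name__ == "__main__":
    print(redirect_target("any-subdomain.my-site.com", "/whatever-file.php"))
```

The order of the checks mirrors the order of the definitions in the file: the “www” host is tested first, so it never falls into the wildcard.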

Aside from correcting some spelling mistakes, like ww.my-site.com, this will boost link popularity, because the crawlers usually consider www.my-site.com and my-site.com as two different sites (and, in fact, they are, but anyway), so incoming links to one will not be taken into consideration when computing the link popularity index of the other one. But, with a 301 permanent redirection, the crawler will consider both URLs to be the same site.

Another good idea is to also buy the “non-dash” spelling form of the domain – aka mysite.com – and redirect it to the original address. Like such:

<VirtualHost *>
    ServerName mysite.com
    ServerAlias *.mysite.com
    RedirectMatch permanent (.+)$ http://www.my-site.com$1
</VirtualHost>

Got the job done for me. And remember: Linux is like a wigwam: no windows, no doors, Apache inside :P