Iterate thru dates in Python

Posted on Friday, February 19th, 2010

A few days ago I was working on some python scripts that needed to iterate back and forth through calendar dates. Working with dates in python is pretty easy, thanks to its datetime module.

Basically, it goes like this:

#!/usr/bin/env python
 
import datetime
 
start_date = datetime.date( year = 2010, month = 2, day = 1 )
end_date = datetime.date( year = 2010, month = 1, day = 1 )
 
dates = []  # not named "list", which would shadow the built-in

if start_date <= end_date:
    for n in range( ( end_date - start_date ).days + 1 ):
        dates.append( start_date + datetime.timedelta( n ) )
else:
    for n in range( ( start_date - end_date ).days + 1 ):
        dates.append( start_date - datetime.timedelta( n ) )

for d in dates:
    print d

This works, but it is somewhat lame and not really reusable. A more “python-ish” way to do it is to write a generator function that yields each date in turn and can be used directly in a for loop. So I did:

#!/usr/bin/env python
import datetime
 
def daterange( start_date, end_date ):
    if start_date <= end_date:
        for n in range( ( end_date - start_date ).days + 1 ):
            yield start_date + datetime.timedelta( n )
    else:
        for n in range( ( start_date - end_date ).days + 1 ):
            yield start_date - datetime.timedelta( n )
 
start = datetime.date( year = 2010, month = 2, day = 1 )
end = datetime.date( year = 2010, month = 1, day = 1 )
 
for date in daterange( start, end ):
    print date

Much more elegant :)
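
For readers on Python 3, the same idea can be sketched as below. This is an illustration rather than the original script: the two branches are folded into a single loop by computing a step of +1 or -1, and print is called as a function.

```python
import datetime


def daterange(start_date, end_date):
    """Yield every date from start_date to end_date, inclusive.

    Counts forward when start_date <= end_date and backward otherwise.
    """
    days = (end_date - start_date).days
    step = 1 if days >= 0 else -1
    for n in range(0, days + step, step):
        yield start_date + datetime.timedelta(days=n)


start = datetime.date(2010, 2, 1)
end = datetime.date(2010, 1, 29)

for date in daterange(start, end):
    print(date)
```

Because it is a generator, the dates are produced one at a time; wrap the call in list() if you need them all at once.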

Django “anonymous_required” decorator

Posted on Wednesday, January 6th, 2010

I like Django’s login_required decorator. It’s a clean and simple way to keep users who aren’t logged in away from parts of the website. But I also felt the need for a decorator that restricts some views to users who are not logged in. For instance, if a user is logged in, they should be denied access to views like /accounts/register or /accounts/login and redirected to their profile.

I’ve looked for one on the web, but couldn’t find anything suitable for my needs, so I wrote my own:

from django.http import HttpResponseRedirect
 
def anonymous_required( view_function, redirect_to = None ):
    return AnonymousRequired( view_function, redirect_to )
 
class AnonymousRequired( object ):
    def __init__( self, view_function, redirect_to ):
        if redirect_to is None:
            from django.conf import settings
            redirect_to = settings.LOGIN_REDIRECT_URL
        self.view_function = view_function
        self.redirect_to = redirect_to
 
    def __call__( self, request, *args, **kwargs ):
        if request.user is not None and request.user.is_authenticated():
            return HttpResponseRedirect( self.redirect_to ) 
        return self.view_function( request, *args, **kwargs )

It’s also available on Django Snippets. Its usage is quite simple:

@anonymous_required
def my_view( request ):
    return render_to_response( 'my-view.html' )

That’s about it!
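
The same behaviour can also be written as a plain function decorator that accepts an optional argument. The sketch below is deliberately framework-free so that it runs on its own: FakeUser, FakeRequest and the ('redirect', url) return value are hypothetical stand-ins for Django's request object and HttpResponseRedirect, '/profile/' is an assumed default, and is_authenticated is modelled as a plain attribute rather than a method.

```python
import functools

# Assumed default; the class above reads settings.LOGIN_REDIRECT_URL instead.
DEFAULT_REDIRECT = '/profile/'


def anonymous_required(view_function=None, redirect_to=DEFAULT_REDIRECT):
    """Let only anonymous users reach the view; send everyone else away.

    Works both as @anonymous_required and as
    @anonymous_required(redirect_to='/somewhere/').
    """
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request, *args, **kwargs):
            user = getattr(request, 'user', None)
            if user is not None and user.is_authenticated:
                # Stand-in for: return HttpResponseRedirect(redirect_to)
                return ('redirect', redirect_to)
            return view(request, *args, **kwargs)
        return wrapper

    if view_function is None:
        # Called with arguments: hand back the real decorator.
        return decorator
    # Called without arguments: decorate immediately.
    return decorator(view_function)


class FakeUser(object):
    """Hypothetical stand-in for a Django user."""
    def __init__(self, is_authenticated):
        self.is_authenticated = is_authenticated


class FakeRequest(object):
    """Hypothetical stand-in for a Django request."""
    def __init__(self, user):
        self.user = user


@anonymous_required
def register(request):
    return 'registration form'


@anonymous_required(redirect_to='/home/')
def login(request):
    return 'login form'
```

functools.wraps keeps the wrapped view's name and docstring, which the class-based version does not do on its own.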

Django configuration file

Posted on Monday, December 28th, 2009

I’ve been using Django for quite some time now and I kind of like it. It provides a fast – really fast – and clean way of doing things. When I first started with Django, I used it mainly for simpler projects and kept the larger, more complicated ones on Zend Framework. That’s because I feel much more comfortable with PHP and Zend Framework than with python and Django.

But, lately, I’ve used django for more ambitious projects and some obvious flaws began to annoy me. The first thing – the one this entry’s about – is the configuration file. Django comes with a settings.py file in which all the settings are being held, without any regard to their purpose.

Environment settings like database connection strings, file system paths and debugging are mixed together with application settings like which middleware classes are used, context processors and so on. So I’ve decided to split the configuration file in two, like in the following example:

local.py

This file holds the host related configuration and each instance of the application should have its own local.py file. One for development, one for testing, one for staging, one for production and so on. This file should not be included in the source control repository, so add a svn/git/whatever ignore on it.

# debug settings
DEBUG = True
TEMPLATE_DEBUG = DEBUG
 
# database configuration
DATABASE_ENGINE = ''
DATABASE_NAME = ''
DATABASE_USER = ''
DATABASE_PASSWORD = ''
DATABASE_HOST = ''
DATABASE_PORT = ''
 
# media root ex: /var/www/my-project/media
MEDIA_ROOT = ''
 
# media url ex: media.my-project.tld
MEDIA_URL = ''
 
# admin media url ex: /media
ADMIN_MEDIA_PREFIX = '/media/'
 
# application's secret key
SECRET_KEY = 'mY-S3Cr3t-K3y-m|_|st-b3-un1QuE-4nD-h4rD-t0-GueSs'

settings.py

This file holds the application related configurations and should be included in the version control system.

import os
 
# root directory for the project
ROOT_DIR = os.path.realpath( os.path.dirname( __file__ ) )
 
ADMINS = (
	( 'root', 'root@project.tld' ),
)
 
# import all the local settings
from local import *
 
MANAGERS = ADMINS
TIME_ZONE = 'Europe/Bucharest'
 
LANGUAGE_CODE = 'en-us'
 
SITE_ID = 1
 
USE_I18N = True
 
TEMPLATE_LOADERS = (
    'django.template.loaders.filesystem.load_template_source',
    'django.template.loaders.app_directories.load_template_source',
)
 
MIDDLEWARE_CLASSES = (
    'django.middleware.cache.UpdateCacheMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.middleware.cache.FetchFromCacheMiddleware',
)
 
ROOT_URLCONF = 'my-project.urls'
 
TEMPLATE_CONTEXT_PROCESSORS = (
    'django.core.context_processors.auth',
    'django.core.context_processors.debug',
    'django.core.context_processors.i18n',
    'django.core.context_processors.media',
)
 
INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
)
 
# template directories
TEMPLATE_DIRS = (
    os.path.join( ROOT_DIR, 'templates' ),
    os.path.join( ROOT_DIR, 'other', 'template', 'path' ),
)
 
# django debug toolbar configuration
INTERNAL_IPS = ( '127.0.0.1', )  # trailing comma: a one-element tuple, not a string
if DEBUG:
    MIDDLEWARE_CLASSES += ( 'debug_toolbar.middleware.DebugToolbarMiddleware', )
    INSTALLED_APPS += ( 'debug_toolbar', )

The last bit is there because in development I usually use the incredibly useful Django debug_toolbar and, of course, I want it turned off in the live environment.
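
One python detail matters for tuple-valued settings such as INTERNAL_IPS, MIDDLEWARE_CLASSES and INSTALLED_APPS: parentheses alone do not make a tuple, the comma does. A small sketch (the variable names are only for illustration):

```python
single_string = ('127.0.0.1')    # parentheses alone: still a str
single_tuple = ('127.0.0.1',)    # trailing comma: a one-element tuple

apps = ('django.contrib.auth', 'django.contrib.sessions')
apps += ('debug_toolbar',)       # tuple + tuple concatenates

print(type(single_string).__name__)
print(type(single_tuple).__name__)
print(apps)
```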

Advertising blog entries on Pidgin’s status

Posted on Friday, August 21st, 2009

I’ve wanted to write this post ever since I read Radu’s “Fortune and Pidgin’s status on Ubuntu” post. Radu’s approach is quite lame and hard to use, because it relies on the user exporting a SQL dump file every time he posts something on the blog.

And really, do you need *all* the posts in the database? Do you really want to advertise blog entries from 2 years ago? Come on!

My idea is much better…obviously :P …and it goes like this: parse the blog’s RSS and take “inspiration” from there for the statuses. The weapon of choice for this task is python with its feedparser library.

First, some prerequisites:

sudo apt-get install python-feedparser

…and then, the python script:

#!/usr/bin/python
import feedparser
import random
import os
 
# feed url
FEED = 'http://feeds2.feedburner.com/motanelu'
 
feed = feedparser.parse( FEED )
index = random.randrange( len( feed['items'] ) )  # randrange already excludes the upper bound
status = 'purple-remote "setstatus?status=available&message=%s %s"' % ( feed['items'][index].title, feed['items'][index].link )
os.system( status )
 
# EOF

I consider this approach better than Radu’s, because it doesn’t require exporting the database or messing around with fortune. Read this post to see how to update Pidgin’s status using cron. And enjoy :P
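
One weak spot of the script is that the title goes straight into a shell command, so a double quote in a post title breaks it. A possible workaround, sketched below for Python 3, is to build an argument list and hand it to subprocess without a shell. build_status_command and the sample entries are hypothetical; the status string keeps the same format as the script above.

```python
import random


def build_status_command(entries, rng=random):
    """Return an argv list for purple-remote, picking one entry at random.

    `entries` is a list of (title, link) pairs; with feedparser these would
    be taken from the parsed feed's items.
    """
    title, link = rng.choice(entries)
    message = '%s %s' % (title, link)
    # One argv element per argument: no shell is involved, so quotes in a
    # title cannot break the command.
    return ['purple-remote', 'setstatus?status=available&message=%s' % message]


# Hypothetical sample data standing in for a parsed feed.
sample_entries = [
    ('Vim and "python"', 'http://example.com/vim-and-python'),
    ('Django configuration file', 'http://example.com/django-config'),
]

command = build_status_command(sample_entries)
print(command)
# To actually set the status: subprocess.call(command)
```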

Downloading a page’s content with python and WebKit

Posted on Tuesday, July 7th, 2009

I’ve been bragging about this post for quite some time now. Well, I won’t do that any more, because it seems that pywebkitgtk isn’t the best way to do things out there and that my first solution to the problem sucks :( Yes, the sad truth…

Yesterday, I tried to put the application on the server – a Debian Lenny machine without X. And this is where it all broke down. I don’t want to install Xorg on this machine just so that a small script will work, so I looked for alternative ways to run the script. One of the first alternatives I found was Xvfb, which, according to Wikipedia:

In the X Window System, Xvfb or X virtual framebuffer is an X11 server that performs all graphical operations in memory, not showing any screen output. From the point of view of the client, it acts exactly like any other server, serving requests and sending events and errors as appropriate. However, no output is shown. This virtual server does not require the computer it is running on to even have a screen or any input device. Only a network layer is necessary.

…should get the job done. But it didn’t. While running under Xvfb, GTK kept throwing segmentation faults and crashing the whole script.

I was faced with the following decision: spend hours or perhaps days trying to see why Xvfb and GTK make such uneasy bedfellows, or rewrite a 50-line crawler script. I knew from my previous research on the matter that python also had bindings for WebKit and Qt, so I gave it a try. And it proved to be a much better solution than GTK.

Qt to the rescue

Although I’m a Gnome/GTK fan, I must admit that Qt is a much better candidate for this job. First of all, it has extensive documentation, whereas pywebkitgtk’s is scarce. Second, it works in my particular case, which can prove to be a huge advantage ;)

Under Ubuntu and Debian, you can install the package by simply typing…

sudo apt-get install python-qt4 libqt4-webkit

…in the console. And you’re done. You can run applications with python and Qt. The rewritten crawler code is:

#!/usr/bin/env python
 
import sys
import signal
 
from optparse import OptionParser
from PyQt4.QtCore import *
from PyQt4.QtGui import *
from PyQt4.QtWebKit import QWebPage
 
 
class Crawler( QWebPage ):
	def __init__(self, url, file):
		QWebPage.__init__( self )
		self._url = url
		self._file = file
 
	def crawl( self ):
		signal.signal( signal.SIGINT, signal.SIG_DFL )
		self.connect( self, SIGNAL( 'loadFinished(bool)' ), self._finished_loading )
		self.mainFrame().load( QUrl( self._url ) )
 
	def _finished_loading( self, result ):
		file = open( self._file, 'w' )
		file.write( self.mainFrame().toHtml() )
		file.close()
		sys.exit( 0 )
 
def main():
	app = QApplication( sys.argv )
	options = get_cmd_options()
	crawler = Crawler( options.url, options.file )
	crawler.crawl()
	sys.exit( app.exec_() )
 
def get_cmd_options():
	"""
		gets and validates the input from the command line
	"""
	usage = "usage: %prog [options] args"
	parser = OptionParser(usage)
	parser.add_option('-u', '--url', dest = 'url', help = 'URL to fetch data from')
	parser.add_option('-f', '--file', dest = 'file', help = 'Local file path to save data to')
 
	(options,args) = parser.parse_args()
 
	if not options.url:
		print 'You must specify an URL.',sys.argv[0],'--help for more details' 
		exit(1)
	if not options.file:
		print 'You must specify a destination file.',sys.argv[0],'--help for more details'
		exit(1)
 
	return options
 
if __name__ == '__main__':
	main()

This time it really works. I feel warm and fuzzy on the inside ;)
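
As a side note, the manual checks in get_cmd_options() can be delegated to the parser. The sketch below uses argparse from the standard library (available from Python 2.7 / 3.2 on), which is a swap from the optparse module used in the script above; the option names are kept the same, and the argv parameter is added only so the function can be tried without a real command line.

```python
import argparse


def get_cmd_options(argv=None):
    """Parse the command line.

    argparse prints a usage message and exits if a required option
    is missing, so no manual validation is needed.
    """
    parser = argparse.ArgumentParser(
        description='Save the rendered HTML of a page to a local file')
    parser.add_argument('-u', '--url', required=True,
                        help='URL to fetch data from')
    parser.add_argument('-f', '--file', required=True,
                        help='Local file path to save data to')
    return parser.parse_args(argv)


options = get_cmd_options(['-u', 'http://www.google.com', '-f', 'page.html'])
print(options.url, options.file)
```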

Django…for the very first time

Posted on Thursday, July 2nd, 2009

This is a post I’ve wanted to write for quite some time now, but there is so much to say that I couldn’t find the time to write it all down. So I’ve decided to split the “first impressions of django” topic into smaller articles, this being the first post in a longer django series.

I did some projects with python lately, and I like the language a lot, but none of them involved building web pages. Some vim scripts, some automation scripts or web crawlers. But not a single web script. Usually a company doesn’t switch easily from PHP to python, or to any other programming language, because of the costs involved (training, etc.), so it’s pretty hard to start a project on another framework. But here, things are a lot simpler. My job description consists of only 4 words: get the sh*t done. Aside from this, I have complete ownership of the projects I’m working on and I’m free to choose whatever technology I please.

Prerequisites

Installing django is a piece of cake. Just follow the instructions on their site and it works like a charm. If you’re using python 2.6, you will receive some warnings concerning the MySQLdb package which uses a deprecated package. Just ignore them, at least until you decide to upgrade to python 3.0 ;)

Preparing the editor. I use vim for editing and I needed something to help me out with code completion, first because I’m lazy and don’t like to type long names, and second because I don’t know all of django’s component names by heart. I’ve tried pysmell, but it didn’t work well, and, since it’s marked as experimental, I don’t think it is supposed to. So I’ve tried good ol’ ctags. This is my recommendation. To create a ctags index, simply type:

ctags -R -f ~/.vim/tags/python.ctags /usr/local/lib/python2.6/dist-packages

and in Vim

set tags+=$HOME/.vim/tags/python.ctags

And voila. It works. If you’re using python 2.5 use the appropriate path.

What I like about django

The admin. It rules, it’s simple to customise, it saves a lot of time. I love it. Back in the days, when I was using CakePHP, I’ve often wished that its scaffolding system did what django’s admin does now. I also love the authentication system. It is another well built component that saves a lot of time of routine work. The models – not having to write SQL statements by hand all the time gives me a hard-on. They also provide a logical separation between a single model object (which encapsulates a row in the table) and the statically declared information retrieval methods.

What I don’t like about django

The implementation of the MVC pattern is a little bit strange. They call it MVT (Model View Template), where “the view” is basically a controller’s action in the “traditional” MVC approach, and “the template” is what you’d expect to be the view.

It offers a decent degree of logic separation, but I don’t like that the form’s action and method attributes are set in the view…template. No one can tell me that where a form sends its data is presentation logic that should reside in a template.

The code generation scripts don’t always work in the way that you’d expect, for instance, if you change something in a model and run the

python manage.py sql myApp
python manage.py syncdb

sequence and the table already exists, it won’t update it. And I hate the template system. I really do. I don’t get its purpose – I don’t get any templating system’s purpose – and some things in it are really weird. I mean ifequal…hello? Not exactly syntactic sugar :P

Conclusion

I like django, as it provides a quick and clean way of doing things. I will definitely use it in the future, possibly in my next project, so stay tuned for more django articles :P

Pywebkitgtk – execute Javascript from python

Posted on Thursday, June 18th, 2009

Last week I got a new assignment at my job: a crawler that was supposed to periodically visit some sites and download their content. Sounds simple, doesn’t it? Well, it’s not. Mainly because we also want to get all the Flash content, and some of it is inserted with Javascript, via various libraries like SWFObject or directly with document.write in some cases. I needed a snapshot of what the page actually looks like when the user is looking at it in a browser.

This meant that I had to get the content *after* all the Javascript in the page has finished executing. In developer language, this means after the window.onload event takes place. And, of course, I also needed a Javascript interpreter. So any attempt to use wget/cURL/file_get_contents was destined to fail from the start. I needed browser power :) So I’ve googled around for some.

The first thing I came across was using COM to connect to an Internet Explorer instance from python, use it to navigate back and forth and get the HTML content as it’s interpreted by IE’s engine. This had 3 major drawbacks:

  • it requires Internet Explorer
  • it requires Microsoft Windows
  • it requires an opened IE window

Since we want to migrate everything from our Windows servers to Linux, it would be pointless to go with this approach, since I’d have to rewrite it in a month or so. Leaving aside the “lameness” of the technologies involved :) And I’m looking for a solution that doesn’t require an open browser window, mainly because it should work on servers without X, since I don’t want to install it :P (GTK doesn’t work without X – credits go to Alex Novac – and yes, it was silly of me to think otherwise).

This solution wasn’t good enough, so I kept looking and came across the HtmlUnit Java library. This library is used to write tests in Java for web based applications. Pretty cool. And not so much. Although Java was once my one true love, after all these years spent with scripting languages, declaring variables, compiling the code, writing only OOP code and so on seemed a little…unfamiliar. But it takes more than anApiWithReallyLongCamelCasedClassNames to stop me, so I installed Eclipse and made some tests. Disappointing! The library isn’t very tolerant of messy HTML and Javascript, and, since nobody out there, in the real world, actually abides by W3C recommendations, this library is somewhat useless in my case.

The next thing I tried was a solution based on python that relied on integration with Gecko via hulahop. I must admit that I couldn’t get it to work under Ubuntu Jaunty Jackalope, due to incompatibilities in the system’s libraries. I’m sure that with enough time and patience, it can be persuaded to work. But, as I didn’t have either, I moved on and tried pywebkitgtk. This proved to be quite okay (I’m not a Safari fan) and it worked out of the box.

After spending several days searching the web, reading articles and trying out different pieces of software, I decided to share my findings with the world and write a tutorial on how to get the content of a page in python *after* its Javascript has finished executing. Here it goes:

First of all, install pywebkitgtk. Under Ubuntu, you can do it directly from the repository:

sudo apt-get install python-webkitgtk libwebkit-1.0-1 libwebkit-dev

…it will attempt to install a lot of other stuff, linked libraries and so on. Just say yes :P
After the installation is complete, it’s generally a good idea to test it! The following code should display a window with Google’s first page in it:

#!/usr/bin/env python
 
import gtk
import webkit
 
window = gtk.Window()
view = webkit.WebView()
view.open('http://www.google.com')
window.add(view)
window.show_all()
window.connect('delete-event', lambda window, event: gtk.main_quit())
 
gtk.main()

…if it doesn’t, maybe you did something wrong. Check that all the packages are in place. For the sake of conversation, let’s assume it worked and move on. As I said in the first paragraph, I want to load a web page, wait for it to execute all the JS in it and take the generated HTML source. A strange problem with pywebkitgtk is that neither the WebView object nor the encapsulated WebFrame object has a “get_html()” method or something similar. Really, there is no clean way to get the site’s content. But, fortunately, on pywebkitgtk’s wiki I’ve found this hack that does just that:

class WebView(webkit.WebView):
    def get_html(self):
        self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
        html = self.get_main_frame().get_title()
        self.execute_script('document.title=oldtitle;')
        return html

It executes a javascript that takes the content of the whole document and stores it in the title. And since there is a get_title() method that returns the title’s content, this workaround gets the job done. Kind of lame, but it suffices.

As previously stated, in my application I didn’t want to have a browser window open, and with GTK it is possible to run your app without calling window.show() or window.show_all(). Long story short, this is how I did it:

#!/usr/bin/env python
import sys, threading # kudos to Nicholas Herriot (see comments)
import gtk
import webkit
import warnings
from time import sleep
from optparse import OptionParser
 
warnings.filterwarnings('ignore')
 
class WebView(webkit.WebView):
	def get_html(self):
		self.execute_script('oldtitle=document.title;document.title=document.documentElement.innerHTML;')
		html = self.get_main_frame().get_title()
		self.execute_script('document.title=oldtitle;')
		return html
 
class Crawler(gtk.Window):
	def __init__(self, url, file):
		gtk.gdk.threads_init() # suggested by Nicholas Herriot for Ubuntu Koala
		gtk.Window.__init__(self)
		self._url = url
		self._file = file
 
	def crawl(self):
		view = WebView()
		view.open(self._url)
		view.connect('load-finished', self._finished_loading)
		self.add(view)
		gtk.main()
 
	def _finished_loading(self, view, frame):
		with open(self._file, 'w') as f:
			f.write(view.get_html())
		gtk.main_quit()
 
def main():
	options = get_cmd_options()
	crawler = Crawler(options.url, options.file)
	crawler.crawl()
 
def get_cmd_options():
	"""
		gets and validates the input from the command line
	"""
	usage = "usage: %prog [options] args"
	parser = OptionParser(usage)
	parser.add_option('-u', '--url', dest = 'url', help = 'URL to fetch data from')
	parser.add_option('-f', '--file', dest = 'file', help = 'Local file path to save data to')
 
	(options,args) = parser.parse_args()
 
	if not options.url:
		print 'You must specify an URL.',sys.argv[0],'--help for more details' 
		exit(1)
	if not options.file:
		print 'You must specify a destination file.',sys.argv[0],'--help for more details'
		exit(1)
 
	return options
 
if __name__ == '__main__':
	main()

Download it, try it out. It worked wonders for me and I hope it will prove useful to other people too…

Vim and python

Posted on Monday, May 25th, 2009

A few months ago, at a wurbe edition, I saw two great editors in action: vim and emacs. At first, I was impressed by Alex Nedelcu’s presentation of emacs and I gave it a try, but I switched to vim soon after, because emacs just…“didn’t feel right”. What I liked about emacs was that it could be easily extended and customised with lisp to fit the user’s needs. Alex showed us some scripts he had written to improve his productivity.

I also like to customise my tools, and that’s what drew me to emacs in the first place. When I switched over to vim, I tried to write some custom plugins, but with vim it’s not that simple. Vim uses a built-in scripting language, which is really weird and badly documented, and since it can’t be used anywhere outside of vim, mastering it would be a waste of time. So I postponed the customisation of the editor until I had enough time to look over the vim scripting language. Until now. I’ve recently read an article presenting vim scripting in python, and I’ve decided to give it a try. And it proved to be much simpler than I thought.

First of all, install vim’s python support. If you’re an Ubuntu / Debian user, simply pop this in the console:

tudor@thor:~$ sudo apt-get install vim-python

If not, compile vim from source / use your package manager to install vim with python support. After vim is up and running, create two files in the ~/.vim/plugin/ directory: my_plugin.vim and my_plugin.py.

The my_plugin.vim should look something like this:

if !has("python")
	call confirm("You must have vim compiled with python in order for this to work", 'OK')
	finish
endif
 
if filereadable($HOME."/.vim/plugin/my_plugin.py")
	pyfile $HOME/.vim/plugin/my_plugin.py
else
	call confirm("Error: my_plugin.py cannot be found! Please reinstall the plugin", 'OK')
	finish
endif
 
"commands for invoking the functions
command! -nargs=1 MyPluginDoSomething python do_something('<args>')
command! -nargs=1 MyPluginDoSomethingElse python do_something_else('<args>')

This is all the vim script you need to know :) And now, in my_plugin.py file, write your plugin’s logic in python, knowing that each vim command will call a function from this file:

def do_something( argument ):
    print 'wassup %s' % argument
 
def do_something_else( argument ):
    pass

If you print something in python, it will be shown in vim’s message area, at the bottom of the window. But the vim module, which is available in python, allows you to read and write vim’s buffers and therefore insert data into the open document. Type :help python in vim for more details.
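
As a sketch of what writing into a buffer could look like: inside vim, vim.current.buffer behaves much like a list of lines and has an append() method, so the function below takes the buffer as a parameter and can be tried outside vim with a plain list. insert_greeting is a hypothetical name, and the vim call shown in the trailing comment is an assumption about how you would wire it up.

```python
def insert_greeting(buffer, name):
    """Append a greeting line to a vim buffer (or any list of lines).

    Returns the number of lines in the buffer afterwards.
    """
    buffer.append('wassup %s' % name)
    return len(buffer)


# Outside vim, a plain list stands in for the buffer.
lines = ['first line']
insert_greeting(lines, 'tudor')
print(lines)

# Inside vim's python, the call would be roughly:
#   import vim
#   insert_greeting(vim.current.buffer, 'tudor')
```

Passing the buffer in, instead of importing vim at the top of the file, also makes the plugin logic easy to test without starting the editor.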

I’ve started working on a vim plugin that will aid me with my Zend Framework development. A Zend_Tool integrated with vim that actually does something useful :p

Python – first impression

Posted on Tuesday, May 12th, 2009

This is a post I’ve been trying to write for about 2 weeks now. As some of you might know, I’ve spent the previous weeks studying python and writing small scripts and I’ve decided to write a blog entry about it. As a matter of fact, I’ve also looked over the Pylons framework, but I’ll write about it in another post. So here it is, my opinion about python alone:

What I like about python

Well, I loooooooooove the indentation. I really do. Python makes it impossible for lamers to write ugly “one liners”. Everything must be indented and in its place or it won’t even compile (“compile” meaning “pass the syntax check”, as python is an interpreted language). After years of dealing with ugly sources with no braces, no indentation and so on, this feature is like a gift from heaven for me. I really hope it will catch on and be implemented in other languages.

I also like the for in iteration over…well…everything. This code:

for item in collection:
    do_stuff(item)

…works in most cases, even when collection is a file. In which case the loop iterates over the file’s lines. Tuples, dictionaries and lists are cool features.
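
A short illustration of that, written for Python 3 (io.StringIO is used here as a stand-in for a real file opened with open()):

```python
import io

# A list, a dictionary and a file-like object all work with "for ... in".
for item in [1, 2, 3]:
    print(item)

for key in {'language': 'python'}:
    print(key)  # iterating over a dictionary yields its keys

fake_file = io.StringIO(u'first line\nsecond line\n')  # stands in for open(...)
file_lines = [line.rstrip('\n') for line in fake_file]
print(file_lines)
```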

What I don’t like about python

Of course, there are some things I dislike about this programming language. The first is that it is sometimes too verbose. Python doesn’t have a post/pre increment operator. You can’t write i++ (a syntax error) or ++i. The latter is worse: it compiles, because it is parsed as two unary plus operators, and does nothing, taking the act of debugging to a whole new level of annoyance.

You always have to write i += 1. It also doesn’t have the C-style ternary operator. If you write a = (condition) ? b : c it will give you a syntax error; the closest equivalent, available since python 2.5, is the conditional expression a = b if condition else c.
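
A quick way to see both points in action (a sketch for Python 3; note that python does offer a conditional expression, just with a different spelling than the C-style one):

```python
i = 5
result = ++i          # parsed as +(+i): two unary pluses, no increment
print(result, i)

try:
    compile('i++', '<example>', 'exec')
except SyntaxError:
    print('i++ is a syntax error')

i += 1                # the actual increment
print(i)

condition = i > 5
a = 'big' if condition else 'small'   # conditional expression (python 2.5+)
print(a)
```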

Another weak point is its OOP capabilities. Object-oriented programming is very strangely implemented in python. A class example in python looks something like this:

class MyClass:
    def __init__(self):
        self.attribute = 'default value'
    def custom_method(self, attribute):
        self.attribute = attribute
    def print_data(self):
        print self.attribute
 
obj = MyClass()
obj.custom_method('wassabi')
obj.print_data()

As you can see, there are no access modifiers (private, protected, public), no instantiation operator (new), the this keyword is replaced by self, and you must write it every single time you define a new method in the class. And python also allows multiple inheritance, which does one thing: annoys people.
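
What python has in place of access modifiers is a naming convention plus name mangling. The sketch below (Account is a hypothetical class used only for illustration) shows both; neither gives real privacy:

```python
class Account(object):
    def __init__(self, owner, balance):
        self.owner = owner          # public by convention
        self._balance = balance     # single underscore: "internal, please don't touch"
        self.__pin = '1234'         # double underscore: name-mangled to _Account__pin

    def balance(self):
        return self._balance


account = Account('tudor', 100)
print(account.balance())
print(hasattr(account, '__pin'))            # False: the plain name is not there
print(hasattr(account, '_Account__pin'))    # True: mangled, not truly private
```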

Conclusion

Apart from some really annoying “features”, I’m starting to like python. It provides a quick and pretty clean way to get things done. And in the end, this is all that matters…Python is cool!