
WSGI Explorations in Python

By Mike Orr (Sluggo)


Introduction

WSGI has become a buzzword among Python developers, especially since the PyWebOff discussed in my last article. The burgeoning proliferation of web application frameworks -- once a testament to how easy it is to build one in Python -- is now seen by many as a liability. The fifty-odd frameworks are non-interoperable for the most part and leave the new user scratching his head wondering which one to use. WSGI attempts to address the interoperability problem by providing a common protocol between frameworks and servers. A lot of related work has started over the past year, but it's mainly been by individuals working alone quietly. There hasn't been a central place to get an overview on what's available, and the packages themselves often lack the documentation and demos necessary to survey them all in a reasonable amount of time. One bright spot is Paste (formerly WSGIKit), which is positioning itself as a meta-framework and has seen significant development over the past few weeks. This article attempts to bring all this together and provide an overview of WSGI activity circa May 2005.

The article changed direction at least three times while I was writing it because the WSGI landscape is changing so rapidly. My original modus operandi was to get the Quixote Altdemo running in the widest variety of WSGI environments. (The Altdemo is a one-file application demonstrating logging in, sessions, and displaying the request context. Quixote is our reference point because I like its philosophy and am currently using it.) I started exploring, putting my notes into the article as I went so they wouldn't get lost. The first wall I hit was the lack of a WSGI interface for Quixote. (I'll explain all these words later.) This led to the discovery of Titus Brown's QWIP (the missing interface), and my detour to update QWIP and embed it in a server module for Quixote (wsgi_server.py). Then I put aside the article for a few days, grew frustrated at how long the not-yet-written sections were threatening to be, and finally realized Paste is where I should concentrate my exploration, even though I'm not yet ready to commit to Paste for my own projects. So the article finally morphed from a hands-on HOWTO for comparing projects into a bird's-eye summary of what's available.

Frameworks

First let's take a very brief overview of the differences between the frameworks. Zope stands alone as the Emacs of web application environments. It's very comprehensive but you have to do things "the Zope way", which is very different from normal Python programming. Zope predated most of the other frameworks, and many of them are a reaction to it. Nevertheless, several advanced add-ons have been written for Zope including the Plone Content Management System.

Webware (or rather its WebKit component) is based on the Java servlet model. Python modules coexist with static files (images, multimedia) in the filesystem. The URL resolver finds foo.py if there's no closer match for "/foo". The module must contain a same-name subclass of Servlet. Various methods are called to produce the output; various other methods provide context information (GET/POST variables, session variables) and support services (redirects).

Quixote and CherryPy resolve URLs by searching object attributes rather than the filesystem. So you provide a root object, and "/foo/bar/baz" maps to ROOT.foo.bar.baz. Context information comes from global imports that "know about" the current request; this is not very OO but it keeps the application code remarkably clean. A URL directory maps to a class, and a leaf node to a method.
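The attribute-walking idea can be sketched in a few lines. This is not Quixote's or CherryPy's actual resolver: the resolve function and the Root, Foo and Bar classes are hypothetical names made up for illustration, and the code is written for current Python 3.

```python
class Bar:
    def baz(self):
        return "This is /foo/bar/baz"


class Foo:
    bar = Bar()


class Root:
    foo = Foo()


def resolve(root, path):
    """Walk 'path' one component at a time, treating each component
    as an attribute name on the previous object (hypothetical helper)."""
    obj = root
    for component in path.strip("/").split("/"):
        # Refuse empty components and never expose private attributes.
        if not component or component.startswith("_"):
            raise LookupError(path)
        obj = getattr(obj, component, None)
        if obj is None:
            raise LookupError(path)
    return obj


handler = resolve(Root(), "/foo/bar/baz")
print(handler())
```

A real framework would also decide which attributes may be published at all; here the only safeguard is the leading-underscore check.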

Quixote 2.0 goes a step further by giving you complete control over the URL resolver. All Quixote does is call ROOT._q_traverse(url_components), and the default implementation works as above, calling ._q_traverse on "subdirectories" as necessary. But you can override it to do anything you want, including handling all URLs another way, diverting to a login form, checking authorization, calling leaf nodes with additional arguments, wrapping a header/footer around the output, etc. Many of the "Why doesn't Quixote do it this way?" questions are answered with, "You can do it in ._q_traverse."
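Here is a rough sketch of what overriding the traversal hook buys you. Only the method name _q_traverse comes from Quixote; the Directory base class, the exports tuple, and the GuardedAdmin class below are simplified stand-ins written for this illustration, not Quixote's real code.

```python
class Directory:
    """Hypothetical stand-in for a traversable URL directory."""
    exports = ()                       # names that may be published

    def _q_traverse(self, components):
        name, rest = components[0], components[1:]
        if name not in self.exports:
            raise LookupError(name)
        obj = getattr(self, name)
        if rest:                       # a "subdirectory": keep descending
            return obj._q_traverse(rest)
        return obj()                   # a leaf: call it for the page body


class Admin(Directory):
    exports = ("status",)

    def status(self):
        return "all systems go"


class GuardedAdmin(Admin):
    """Overrides traversal to check authorization and wrap the output."""

    def __init__(self, user):
        self.user = user

    def _q_traverse(self, components):
        if self.user != "admin":
            return "please log in"     # divert instead of publishing
        body = super()._q_traverse(components)
        return "<header>" + body + "<footer>"


class Root(Directory):
    exports = ("admin",)

    def __init__(self, user):
        self.admin = GuardedAdmin(user)


print(Root("admin")._q_traverse(["admin", "status"]))
print(Root("guest")._q_traverse(["admin", "status"]))
```

Everything under /admin passes through GuardedAdmin._q_traverse, so the login check and the header/footer wrapper live in one place rather than in every leaf method.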

Twisted is an application-and-server framework rather than just a web application framework, but it does include the minimal twisted.web, upon which Nevow is based. Both use the Directory object model, with a .locateChild method handling all URLs under it.

Minimalist frameworks provide only the minimum required for web applications: a URL resolver, access to context variables, session management, handling HTTP headers and errors, and redirection. The URL resolver more or less determines the structure of your application. Richer frameworks provide a complete Model-View-Controller structure, a template system, a database access system, a Form object of some sort, HTML generation functions, a configuration file, and/or a Javascript library.

Applications need a template system to merge calculated values into predefined HTML. Most of these (Zope Page Templates, Nevow's) use XML-style tags for the variable placeholders, and are thus limited to producing HTML/XML output. Cheetah uses a $placeholder and #command syntax, making it suitable for other types of output as well. Quixote's PTL system turns templating on its head: the template is an ordinary Python function, except that standalone expressions are concatenated and used as the implicit return value. PTL also escapes "unsafe" values automatically -- those coming from a non-trusted source (arguments or function calls) that may contain unexpected HTML markup. There's an htmltext class to mark a string as "safe" to prevent further escaping. (There are examples of PTL and Nevow in my PyCon 2004 article.) Python 2.4 includes a built-in template system using $placeholder syntax, but it's so new and limited I haven't seen anyone using it. Most template systems are affiliated with a certain framework but can also be used standalone.

Another difference between the template systems is where the plug-in values come from. For Python's, you provide a dictionary. For Cheetah, you can provide a list of dictionaries/objects to search in order, or you can set template attributes directly. For Nevow you provide a function that sets the values in the template (really!), and additional functions for looping. PTL gets values from the function arguments or lexical scope. Fortunately, none of these systems require a bunch of .setValue(key, value) calls like certain Java/PHP/Perl templates!
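For the built-in system, the dictionary-in, string-out style looks like this. The example uses string.Template from the standard library (added in Python 2.4); the page text and the values are made up for illustration.

```python
from string import Template

page = Template("<html>Hello, $name! You have $count new messages.</html>")
values = {"name": "Sluggo", "count": 3}
print(page.substitute(values))

# safe_substitute leaves unknown placeholders alone instead of raising KeyError.
print(Template("$greeting, $name").safe_substitute({"name": "Sluggo"}))
```

Note that string.Template does no HTML escaping, unlike PTL, so untrusted values would have to be escaped before they go into the dictionary.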

Applications also need to access a database. Some use object databases like ZODB or Durus, others use SQL directly, others use an object-relational mapper to pretend they're not using a SQL database. Of course, the object-relational mappers are non-interoperable; perhaps that is the next frontier for standardization in Python. There are also various SQL-generation modules that provide a thin wrapper over the syntax. (I've written one of those, dbutil.py.)

Most frameworks come with a Python web server either recommended for production or used just for testing. All connect to Apache via CGI, and some via a generic Apache module (FastCGI, mod_python) or a custom Apache module (mod_scgi, mod_webkit). Some can run under Twisted. Because web requests can arrive simultaneously, servers have to deal with concurrency. There are four concurrency models:

Synchronous servers handle one request at a time. Any subsequent requests just have to wait. (Python's SimpleHTTPServer, Quixote's simple_server)

Multiprocess servers handle each request in a subprocess. (Apache CGI, mod_python, mod_scgi) Each subprocess handles one request or several synchronously, but they all run in parallel. A subprocess can handle the request itself or transmit it to a long-running server via a socket.

Multithreaded servers handle each request in a thread. (Webware, Zope, WSGI Utils' httpServer)

Asynchronous servers use select to multiplex I/O between several pending requests. (Twisted, Medusa)

Each model has its advantages and disadvantages. Synchronous servers are too slow for most production systems. Multithreaded servers require careful programming to avoid clobbering shared variables. Some frameworks are not thread safe anyway. Twisted uses a non-linear programming model to avoid threads; its Deferreds and callbacks are enough to make many programmers run screaming from the room. The multiprocess model avoids clobbering shared variables but can still clobber shared files. In many cases (especially with Apache) the webserver adapter does not calculate the response itself but instead forwards the request to a separate application server. The application server may have a different concurrency model than the web server.
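To make the "clobbering shared variables" hazard concrete, here is a minimal sketch of shared state updated from several threads, with a lock guarding the update. The HitCounter class and the handle_requests function are hypothetical; they stand in for whatever state a multithreaded server's request handlers might share.

```python
import threading


class HitCounter:
    """Shared state that several request threads update."""

    def __init__(self):
        self.hits = 0
        self._lock = threading.Lock()

    def record(self):
        # Without the lock, the read-modify-write below can interleave
        # between threads and lose updates.
        with self._lock:
            self.hits += 1


counter = HitCounter()


def handle_requests(n):
    """Pretend to serve n requests, each touching the shared counter."""
    for _ in range(n):
        counter.record()


threads = [threading.Thread(target=handle_requests, args=(1000,))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.hits)
```

The multiprocess model sidesteps this particular problem because each process has its own copy of the counter, which is also why it cannot keep such a counter in memory at all.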

All frameworks provide session management, each in its own way. A session is a dictionary or object that is shared between requests by the same user as long as the user doesn't quit his/her browser or remain idle too long. A session manager stores sessions and gives the correct session to each request. Sessions may be stored in a dictionary, files, or a database. A dictionary session manager requires a single long-running process (not multiprocesses), and the sessions vanish when the server quits or crashes. File-based sessions persist between server instances and can be shared by multiprocesses, but file locking is required to prevent simultaneous updates. Database sessions are similar but the database server often handles the multiplexing for you.
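A dictionary-based session manager can be sketched as follows. DictSessionManager is a hypothetical class, not taken from any of the frameworks discussed; idle timeouts and cookie handling are left out to keep it short.

```python
import uuid


class DictSessionManager:
    """Keeps sessions in an ordinary dictionary keyed by session ID."""

    def __init__(self):
        self._sessions = {}

    def get_session(self, session_id=None):
        """Return (session_id, session), creating a new session when the
        ID is missing or unknown (e.g. after a server restart)."""
        if session_id not in self._sessions:
            session_id = uuid.uuid4().hex
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]


manager = DictSessionManager()

sid, session = manager.get_session()                 # first request: no cookie yet
session["user"] = "sluggo"

same_sid, same_session = manager.get_session(sid)    # later request, same cookie
print(same_session["user"])
```

Because the sessions live in one process's memory, this only works with a single long-running server, and everything is lost when that server exits.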

Each framework also has a way of managing forms. Some have a form object and widget objects; others use an XML description. Quixote uses the former. The form object can render a form, validate it, and redisplay it with error messages. With some frameworks you get the form input and validate it against the widgets. In Quixote, you ask the widgets to find themselves in the GET/POST variables, validate themselves, and return the value or None.
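The "widgets find themselves" style might look roughly like this. IntWidget and its parse method are hypothetical and much simpler than Quixote's real widget classes; the sketch only shows the direction of the call, where the widget pulls its own value out of the submitted variables.

```python
class IntWidget:
    """Hypothetical widget that knows its own field name."""

    def __init__(self, name):
        self.name = name
        self.error = None

    def parse(self, form_vars):
        """Find this widget's value in the submitted variables, validate it,
        and return an int, or None if it is missing or invalid."""
        self.error = None
        raw = form_vars.get(self.name, "").strip()
        if not raw:
            self.error = "required"
            return None
        try:
            return int(raw)
        except ValueError:
            self.error = "must be a whole number"
            return None


age = IntWidget("age")
print(age.parse({"age": "42"}))
print(age.parse({"age": "forty-two"}), age.error)
```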

Pre-WSGI Standardization

A few attempts at standardization were made before WSGI appeared. For instance, frameworks come with their own web servers and adapters for Apache, and these are generally not interchangeable with other frameworks. Webware comes with mod_webkit, a module that quickly transmits the request to a Webware application server. Neil Schemenauer, a Quixote developer, aimed to do this in a generic way and wrote SCGI. It serializes the request environment variables and input, sends it through a socket, and expects a complete HTTP document (with headers) in reply. His mod_scgi does this in an Apache module, and his cgi2scgi.c does it in a CGI script. I wrote a Python equivalent, cgi2scgi.py, which is useful for testing. Titus Brown wrote SWAP, a SCGI-WSGI gateway. It's included in Paste, or you can get it from his QWIP and SWAP page.
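The serialization step is simple enough to sketch. The function below follows the framing described in the SCGI protocol document (a netstring of NUL-separated name/value pairs, with CONTENT_LENGTH first and an "SCGI" header set to "1", followed by the raw body). The name encode_scgi_request is my own, and this is an illustration rather than code from mod_scgi or cgi2scgi.

```python
def encode_scgi_request(environ, body=b""):
    """Serialize CGI-style variables and a request body the SCGI way:
    a netstring of NUL-separated name/value pairs, then the raw body."""
    # CONTENT_LENGTH must come first, and the "SCGI" header must be "1".
    headers = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    headers += [(k, v) for k, v in environ.items()
                if k not in ("CONTENT_LENGTH", "SCGI")]
    payload = b"".join(
        k.encode("latin-1") + b"\x00" + v.encode("latin-1") + b"\x00"
        for k, v in headers
    )
    # Netstring framing: "<length>:<payload>,"
    return str(len(payload)).encode("ascii") + b":" + payload + b"," + body


request = encode_scgi_request(
    {"REQUEST_METHOD": "POST", "REQUEST_URI": "/deepthought"},
    body=b"What is the answer to life?",
)
print(request)
```

The reply travels back over the same socket as an ordinary HTTP document with headers, so no matching decoder is needed on the client side.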

Subway aggregates CherryPy, Cheetah, SQLObject, and now Paste into an integrated development environment similar to Ruby on Rails. It uses CherryPy's WSGI interface to link into Paste.

Credit should also be given to the Zope team for unbundling their ZODB database and template systems so they can be used with other frameworks.

WSGI

WSGI is Python PEP 333, the Web Server Gateway Interface. It's a protocol for communicating with Python web applications. WSGI works by callbacks. The application provides a function which the server calls for each request:

application(environ, start_response)

environ is a Python dictionary containing the CGI-defined environment variables plus a few extras. One of the extras is "wsgi.input", the file object from which to read the POST variables. start_response is a callback by which the application returns the HTTP headers:

start_response(status, response_headers, exc_info=None)

status is an HTTP status string (e.g., "200 OK"). response_headers is a list of 2-tuples, the HTTP headers in key-value format. exc_info is used in exception handling; we won't cover it here.

The application function then returns an iterable of body chunks. In the simplest case this can be:

["<html>Hello, world!</html>"]

Getting slightly more elaborate, here's the second-smallest WSGI application in the world:

    def app2(environ, start_response):
        start_response("200 OK", [])
        s = "<html>You requested <strong>%s</strong></html>"
        s %= environ['PATH_INFO']
        return [s]
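To see the whole callback dance without starting a server, you can play the server's role yourself. The sketch below is a Python 3 flavour of app2 (under the later PEP 3333 the body chunks must be bytes, and a Content-Type header is added here). call_app is a hypothetical helper; setup_testing_defaults comes from the standard library's wsgiref.util.

```python
from wsgiref.util import setup_testing_defaults


def app2(environ, start_response):
    # Python 3 flavour of the article's example: body chunks are bytes.
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    s = "<html>You requested <strong>%s</strong></html>" % environ["PATH_INFO"]
    return [s.encode("utf-8")]


def call_app(app, path):
    """Play the server's role for one request and collect the response."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)      # fills in the other required keys
    captured = {}

    def start_response(status, response_headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = response_headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(app2, "/foo/bar")
print(status)
print(body.decode("utf-8"))
```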

The protocol may look strange, but it's designed to meet the needs of the widest possible variety of existing and potential frameworks and servers. And middleware. Middleware are reusable components providing generic services normally handled by frameworks; e.g., a Session object, a Request object, error handling. They're implemented as wrapper functions; i.e., decorators. Inbound they can add keys to the dictionary (e.g., quixote.request for a Quixote-style Request object). Outbound they can modify HTTP headers or translate the body into Latin or Marklar. Here's a small middleware:

    class LowercaseMiddleware:
        def __init__(self, application):
            self.application = application   # A WSGI application callable.

        def __call__(self, environ, start_response):
            pass  # We could set an item in 'environ' or a local variable.
            for chunk in self.application(environ, start_response):
                yield chunk.lower()

Assuming we had a server constructor Server , we could do:

    app = LowercaseMiddleware(app2)
    server = Server(app)
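For a concrete stand-in for Server, one option is the reference server in wsgiref, which is part of the standard library in current Python. The sketch below is written for Python 3, so the body chunks are bytes; hello_app and the serve function are hypothetical names, and serve is defined but not called because it blocks until interrupted.

```python
from wsgiref.simple_server import make_server


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, WORLD!"]


class LowercaseMiddleware:
    def __init__(self, application):
        self.application = application   # the wrapped WSGI callable

    def __call__(self, environ, start_response):
        for chunk in self.application(environ, start_response):
            yield chunk.lower()


app = LowercaseMiddleware(hello_app)


def serve(port=8080):
    """Blocks until interrupted; wsgiref's server stands in for 'Server'."""
    with make_server("localhost", port, app) as server:
        server.serve_forever()
```

Calling serve() and visiting http://localhost:8080/ should show the lowercased greeting. The server never knows the middleware is there; it just sees one more WSGI callable.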

Since it's so easy to write a WSGI application, you may wonder, "Who needs a framework?" That's a legitimate question, although the answer is, "It's tedious without one." Your application is responsible for every URL under it; e.g., if it's installed as http://localhost:8080/, it would have to do something intelligent with http://localhost:8080/foo/bar/baz. Code to parse the URL and switch to an appropriate function is... a framework! So you may as well use an existing framework and save yourself the tedium.
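Here is how small that "framework" can start out: a dictionary lookup on PATH_INFO. The ROUTES table and the home, about and dispatcher functions are hypothetical names for this sketch, written for Python 3.

```python
def home(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]


def about(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"about page"]


ROUTES = {"/": home, "/about": about}


def dispatcher(environ, start_response):
    """Pick a sub-application by PATH_INFO, or answer 404 ourselves."""
    handler = ROUTES.get(environ.get("PATH_INFO") or "/")
    if handler is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    return handler(environ, start_response)
```

It works, but the moment you want nested paths, trailing-slash redirects, or per-directory access checks, you are rewriting what the existing frameworks already provide.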

Writing a WSGI server interface is more complex. There's an example in PEP 333. I wrote an object-oriented one for Quixote (in wsgi_server.py). But the experience taught me it's more fun to write the application side.

WSGI opens the way for a lot of interesting possibilities. Simple frameworks can be turned completely into middleware. Some frameworks might be able to run on top of other frameworks or even be emulated by them. Ideally, existing applications would run unchanged or with minimal changes. But this is also a time for framework developers to rethink how they're doing things and perhaps switch to more middleware-friendly APIs.

Currently, CherryPy and Nevow have WSGI interfaces. Twisted's CVS has a twisted.web2.wsgi module. Quixote has QWIP.

Why WSGI Won't Replace SCGI

Many people think WSGI will replace mod_scgi and all the other webserver adapters. This is partly due to the confident language in the PEP about connecting to "web servers". But both sides of WSGI must be in the same process, for the simple reason that the spec requires an open file object in the dictionary, and you can't pickle a file object and transmit it to another process. So a WSGI call from an Apache module to an embedded Python application is possible, but a WSGI call from Apache to a long-running application server is not. Yet these long-running servers are necessary to maintain state and factor out the initialization overhead. SCGI can replace the framework-specific adapters since it is serializable and framework neutral (and programming-language neutral), but WSGI will have to operate on the "application side" of an SCGI interface.
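The pickling obstacle is easy to demonstrate. In the sketch below, an environment dictionary of plain strings serializes fine, but adding an open file as "wsgi.input" makes pickle refuse. The devnull file is only a convenient stand-in for a real request stream, and the code is written for Python 3.

```python
import os
import pickle

environ = {"REQUEST_METHOD": "POST", "PATH_INFO": "/foo"}
serialized = pickle.dumps(environ)        # plain strings serialize fine
print(len(serialized) > 0)

with open(os.devnull, "rb") as stream:
    environ["wsgi.input"] = stream        # an open file-like request stream
    try:
        pickle.dumps(environ)
        picklable = True
    except TypeError as exc:
        picklable = False
        print("cannot pickle:", exc)
```

This is why a cross-process protocol such as SCGI sends the variables and the request body as bytes over a socket instead.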

Conversely, people may wonder why they should use WSGI at all when SCGI is both a gateway and serializable. The answer is that with SCGI alone, inbound processors would have to parse and reformat the entire input stream instead of merely passing a dictionary object through, and outbound processors would likewise have to parse the headers.

Python web servers, however, can take the day off because they have to provide only one interface now, WSGI. Then any compliant application can be plugged into them regardless of framework. Since the web server is a long-running process, the application will initialize itself only once, and it can store any needed state in module globals (but watch out for threading issues). And I suppose a WSGI application could be a stub that calls a standalone application server via SCGI...

Quixote's Challenges to WSGI

Let's look at the specific challenges one framework has in adapting to WSGI. QWIP exists to connect the monolithic Quixote Publisher to WSGI, but what if we want to factor out parts of Quixote to middleware? Given that the URL resolving is already factored out to your Directory object, if you take out the session handling, error handling, and request handling, is there much of a Publisher left? Do you use generic middleware, which would require people to change their applications, or middleware that produces Quixote-style objects? Four considerations stand out:

Because Quixote makes it easy to subclass Publisher, Session and Request, many applications do. These applications are apparently incompatible with some middleware.

Applications access their context objects by calling global functions: e.g., quixote.get_session(). These functions are frequently used in lieu of passing arguments. Middleware puts context objects in the environment dictionary. A middleware'd Quixote may have to create a fake quixote module with functions that read the environment dictionary, or maybe the middleware would have to stuff the objects into a fake quixote module itself. There is precedent for this in Paste, whose webkit component provides a fake Webware environment for a servlet to run in.

Most frameworks use a dictionary for the session object, but Quixote uses an instance, and users often add attributes. One predefined attribute is .user, although Quixote does nothing with it except initialize it to None. But applications frequently set it to something and access it via quixote.get_user(). Is it better to change applications to use dictionary sessions, or provide a Quixote-style session middleware? Or can we use a generic session manager middleware with our own session objects?

Quixote is not thread safe, so my version of QWIP refuses to connect to a multithreaded server. There's a class and a wrapper in wsgi_server.py to make it thread safe, but they need further testing.

WSGI Utils

WSGI Utils provides:

wsgiServer, a multithreaded HTTP-to-WSGI server based on Python's SimpleHTTPServer. It takes a mapping of URL prefixes to applications, allowing you to serve multiple applications from one server, and can also overlay a static directory for any non-matching URLs.

wsgiAdapter, a simple application framework. It provides Basic Authentication and signed cookies.

some persistent session classes using DBM and an optional daemon.

Paste has an interface to wsgiServer, as does my wsgi_server.py. I had some trouble with the application mapping. It works fine with a single application but is not very robust for multiple applications. One wonders whether attaching multiple applications at this level is really that useful anyway, but time will tell. wsgiAdapter may be an interesting framework to build other frameworks on top of, although that may require more modification to existing frameworks than it's worth.

One caveat about the static overlay, which is inherited from Python's SimpleHTTPServer: it serves from the current directory, and there's no provision to specify another directory. So make sure to chdir to the static directory before launching wsgiServer.

Other packages

Flup is Allan Saddi's package of WSGI utilities. There are gateways to SCGI and FastCGI, both threaded and forking. There are middlewares for error handling, gzip compression, and sessions. The session object is a dictionary subclass, and the session managers are memory-, shelve-, and file-based. There's also a minimal application framework. (The more the merrier!)

PEAK is a library for enterprise applications. Among its diverse offerings is wsgiref, a reference library for WSGI including a simple HTTP server. wsgiref seems to have spun off into its own repository, located here.

Paste

Paste is what I call a meta-framework, a way to plug together frameworks and servers and middleware using a common configuration system. Here's how Paste's creator, Ian Bicking, describes it: http://pythonpaste.org/docs/what-is-paste.html

Paste has been getting an extraordinary level of development recently. It includes two frameworks: a WSGI-aware implementation of Webware, and a (backward-incompatible) modernization of that called Wareweb. Paste is becoming the Borg, assimilating some third-party code and linking to others. It has incorporated gateways from SWAP and Flup, and Quixote support is in progress. Subway has refactored itself to work with Paste. Paste should also work with any generic WSGI application. It has been creating and borrowing middleware left and right, for configuration, error handling, caching, testing, authentication, redirects, session management, HTML validation, and no doubt others.

Paste has a top-level executable 'paster'. If the first argument is "serve", it launches the server/application/middleware combination specified in a Python-syntax configuration file. Some of the configuration parameters are:

server: "wsgiutils", "scgi", "scgi_flup_fork", "scgi_flup_threaded", "console", etc. The console server is for debugging and dumps the response on the screen. Run 'paster serve --list-servers' to see all servers with descriptions.

publish_app: a generic WSGI application. This can be a function object, or a string naming the object to import. This overrides the "framework" and "publish_dir" parameters.

framework: "webkit", "wareweb", "subway", etc. The default uses webkit.

publish_dir: a directory containing a framework-specific application. (Only used by certain frameworks.)

The configuration is read into a dictionary and available to all parts of the runtime. This means frameworks can define additional parameters, and so can middleware and your application. This allows you to put all configuration information in one place with one format. For instance, a session middleware might need to know which directory to save to, and your application might need to know which database to connect to.
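Reading a Python-syntax file into a dictionary takes only a few lines. The sketch below shows the general technique, not Paste's actual loader: load_config is a hypothetical function, and session_dir is an invented key of the sort a session middleware might define.

```python
import types


def load_config(source, filename="server.conf"):
    """Execute Python-syntax configuration text and return its top-level
    names as a dictionary, skipping private names and imported modules."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return {
        key: value for key, value in namespace.items()
        if not key.startswith("_") and not isinstance(value, types.ModuleType)
    }


CONF_TEXT = """
import os
framework = 'webkit'
publish_dir = os.path.join('/tmp/paste_app', 'web')
server = 'wsgiutils'
session_dir = '/tmp/sessions'   # hypothetical extra key for a middleware
"""

conf = load_config(CONF_TEXT)
print(conf["server"], conf["publish_dir"])
```

Since the file is executed as code, this approach only suits configuration you trust as much as the application itself.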

If the first argument is "create", paster creates a stub directory for a new application. You don't have to do this -- you can build your own application directory from scratch -- but paster has skeletons for several common scenarios. For instance:

    $ paster create --template=webkit_zpt /tmp/paste_app.py
    $ ls /tmp/paste_app.py
    __init__.py   server.conf  sitepage.pyc  web/
    __init__.pyc  sitepage.py  templates/

This creates a Webware application using Zope Page Templates and SQLObject. Here's what it contains:

__init__.py: Makes this directory a Python package so you can import it.

server.conf: The configuration file.

sitepage.py: Class SitePage, the superclass of your servlets.

templates: Put your ZPT templates here. By default you get a generic home page, a standard header/footer, and a basic error page.

web: The published directory. Put your servlet and static files here. The default home page lists the WSGI dictionary for the request.

Here's the server.conf file created by the above command (comments deleted):

    import os
    app_template = 'webkit_zpt'
    app_name = 'paste_app'
    framework = 'webkit'
    publish_dir = os.path.join('/tmp/paste_app', 'web')
    sys_path = ['/tmp']
    verbose = False
    server = 'wsgiutils'
    reload = True
    debug = True

Obviously I didn't choose a good location since you don't want /tmp in your Python path, but you get the idea. (Note: don't call your application directory "paste" or Python's import mechanism will get confused.)

Paste had Twisted support but it was removed due to bugginess and obsolescence. New code for twisted.web2.wsgi (not released yet; in Twisted CVS only) has not been written.

QLime

QLime is a rich framework built on top of Quixote. I'm mentioning it because it has an intriguing configuration system, one that might be useful for Paste or another meta-framework. QLime uses a global "registry" reg, which is just a container for arbitrary attributes. reg.site is the published application; i.e., a Quixote Directory and its sub-Directories. reg.oc contains table classes for QLime's object-relational mapper, and reg.ds contains database connection objects. But you can set any attributes you want, such as reg.skin.red and reg.skin.blue, which might be instances describing user-selectable color themes.

The registry is built from a configuration file in Windows .ini format:

    [site qlime.registry.SiteObject]

    [site.colorhello colorhello.Hello]
    skin_name=skin.red

    [skin.red colorhello.Red]

    [skin.blue colorhello.Blue]

    [oc.note class:qlime.demo.noteapp.Note]
    ds=notedb
    oc=default

Each [] line is a section header. The first word in the header tells where in the registry to attach this object. The second word names the class to instantiate. Any key=value pairs are arguments to the constructor (strings only). If the second word begins with "class:", it attaches the class itself rather than an instance. In that case, the arguments are passed to a special class method, ._q_class_init. So the above configuration corresponds roughly to:

    from qlime.registry import SiteObject
    from colorhello import Hello, Red, Blue
    from qlime.demo.noteapp import Note

    reg.site = SiteObject()
    reg.site.colorhello = Hello(skin_name="skin.red")
    reg.skin.red = Red()
    reg.skin.blue = Blue()
    reg.oc.note = Note    # Class, not instance!
    Note._q_class_init(ds="notedb", oc="default")
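The header rules themselves can be sketched with the standard library's configparser, which accepts section names containing spaces. This is not QLime's parser: parse_registry is a hypothetical function that only decodes each section into a registry path, a dotted name, a class-versus-instance flag, and the keyword arguments. It stops short of importing anything, since the colorhello and qlime modules are not assumed to be installed.

```python
import configparser

INI_TEXT = """
[site qlime.registry.SiteObject]
[site.colorhello colorhello.Hello]
skin_name=skin.red
[skin.red colorhello.Red]
[skin.blue colorhello.Blue]
[oc.note class:qlime.demo.noteapp.Note]
ds=notedb
oc=default
"""


def parse_registry(text):
    """Return one (registry_path, dotted_name, attach_class, kwargs)
    tuple per section header, in file order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    entries = []
    for section in parser.sections():
        path, target = section.split(None, 1)
        attach_class = target.startswith("class:")
        if attach_class:
            target = target[len("class:"):]
        entries.append((path, target, attach_class,
                        dict(parser.items(section))))
    return entries


for entry in parse_registry(INI_TEXT):
    print(entry)
```

A full loader would then import each dotted name and either call it with the keyword arguments or, for "class:" entries, attach the class and pass the arguments to its class-level initializer.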

The web-sig

Python's web-sig is a mailing list for coordinating the various web-related projects. Discussion topics include how to improve the Python standard library for HTTP/HTML/framework support, and what third-party packages are needed and how they should be structured. Of course the cabal cannot tell a developer what to do, but those interested in standardization will look to the group's consensus, and the list is also a resource for questions like "Is anybody else working on X?" and "How do I do Y?" The list is pretty active, and most of the bigwigs are on it. So this is a good place to coordinate WSGI work.

Recent topics include making a Javascript library a la Nevow's, an AJAX framework, what Rails got that we don't got, porting frameworks to Paste, etc.

Conclusion

WSGI is only part of the solution. It helps interoperability but does nothing for the Python newbie who just wants to get a simple dynamic site up and running quickly and is overwhelmed by the choices. Documentation is what the newbie needs, and the Python community needs to come to an agreement regarding the top framework(s) to recommend, or perhaps the top framework for each purpose (simple site, large site, heavily-loaded site). Paste plays a paradoxical role. On the one hand it encourages people to write even more frameworks to experiment with the expanding possibilities. On the other hand, this new code will be designed with interoperability in mind, something we haven't had before. And perhaps instead of compromising on one framework and giving up our beloved esoteric features, we can compromise on this meta-framework and keep our features, and Johnny Newbie will be less confused.

Next month I'm planning an article on User-Centered Design and Usability Testing, and maybe the following month I'll have more to report on Paste.

Mike is a Contributing Editor at Linux Gazette. He has been a Linux enthusiast since 1991, a Debian user since 1995, and now Gentoo. His favorite tool for programming is Python. Non-computer interests include martial arts, wrestling, ska and oi! and ambient music, and the international language Esperanto. He's been known to listen to Dvorak, Schubert, Mendelssohn, and Khachaturian too.

