Pylons Execution Analysis

Author: Mike Orr
Date: 2006-12-07

Abstract

This article analyzes the code execution in the Pylons QuickWiki tutorial. Indirectly it shows how to build a Python package using egg features. The analysis is based on Pylons 0.9.3, Paste 1.0, setuptools 0.6c3, and other versions noted in section 2.

See QuickWiki in action

Install and run QuickWiki according to section 2 of the QuickWiki tutorial. For convenience, here's how I installed it on Linux with SQLite.

1. Create a "~/pylons" directory and cd into it. Create "bin" and "lib" subdirectories.

2. Create a file "setup.cfg" containing:

       [easy_install]
       install_dir = ~/pylons/lib
       script_dir = ~/pylons/bin
       zip_ok = 0

   ("zip_ok = 0" forces packages to be installed as directories rather than zip files, so we can inspect their source code easily.)

3. Create a file "environ.sh" containing:

       export PATH=~/pylons/bin:$PATH
       export LIB=~/pylons/lib
       export PYTHONPATH=$LIB

4. Run "source environ.sh". Later, when you want to run QuickWiki again, you'll have to "cd ~/pylons" and run "source environ.sh" to initialize the shell session.

5. If you haven't set up easy_install yet, download ez_setup.py and run it as root. See the Easy Install manual for details. (We won't use the "Custom Install Locations" described in the manual; our "setup.cfg" does the equivalent.)

6. Run "easy_install QuickWiki". This will install several Python packages according to the following dependency graph (the dependency relationships may not be 100% accurate):

       QuickWiki (0.1.2)
       Pylons (0.9.3)
       Beaker (0.6.1)
       FormEncode (0.6)
       Myghty (1.1)
       MyghtyUtils (0.5.2)
       Paste (1.0)
       PasteDeploy (1.0)
       Cheetah (1.0)
       PasteScript (1.0)
       Routes (1.5.2)
       simplejson (1.4)
       WebHelpers (0.2.2)
       SQLAlchemy (0.3.1)

   Important: A module pylons.middleware in the "Pylons" egg is located at "~/pylons/lib/Pylons-VERSION-pyVERSION.egg/pylons/middleware.py".

7. Install SQLite if you don't already have it.

8. Run "easy_install pysqlite" (2.3.2).

9. Run "paster make-config QuickWiki quick-wiki.ini". This creates "quick-wiki.ini" in the current directory.

10. Edit "quick-wiki.ini". Change the "sqlalchemy.dburi" line to:

        sqlalchemy.dburi = sqlite:////PATH/TO/CURRENT/DIRECTORY/quick_wiki.sqlite

    Don't change the "sqlobject.dburi" line by mistake; it won't be used. If desired, uncomment the "sqlalchemy.echo" line. Comment out the "set debug = false" line so we get error tracebacks in the browser. (This is not appropriate for a production system but we're just testing right now.)

11. Run "paster setup-app quick-wiki.ini". This sets up the database tables required by the application.

12. Run "paster serve quick-wiki.ini".

13. Point your web browser to "http://localhost:5000/" and add some wiki pages. Note the drag-n-drop AJAX feature for deleting pages. Press Ctrl-C on the console when you get bored.

Note: The tutorial recommends "paster serve --reload quick-wiki.ini". This automatically reloads modules when they are modified so you don't have to restart the server, which is useful in development. However, I found that it left threads running after I stopped the server, so I got an "address is already in use" error on port 5000 when I started it again. This may depend on the OS. I had to find these threads manually using "lsof -i @:5000" and kill the first one. On such systems it's easier to skip "--reload" and just restart the server manually after making changes.

Troubleshooting

DeprecationWarning

If you get a DeprecationWarning regarding $LIB/QuickWiki*.egg/quickwiki/config/middleware.py, modify it like this to reflect recent changes in Pylons:

    # @@MO: Deleted due to DeprecationWarning
    #app = pylons.wsgiapp.PylonsApp(config, helpers=quickwiki.lib.helpers)
    #g = app.globals
    # @@MO: Added per DeprecationWarning advice
    import quickwiki.lib.app_globals as app_globals
    import quickwiki.lib.helpers
    app = pylons.wsgiapp.PylonsApp(config, helpers=quickwiki.lib.helpers,
        g=app_globals.Globals)
    # @@MO: End of additions.

Cheetah ParseError

If you get a ParseError from Cheetah, run "easy_install cheetah==1.0". PasteScript's templates are not compatible with the more recent Cheetah 2.0rc7. This probably does not affect which version of Cheetah you can use in your application (@@MO: check this).

Alternate databases

Note: SQLAlchemy is not compatible with object databases such as Durus.
A companion article Using Pylons with Durus will explain how to use Pylons and Durus together.

Application startup (PasteScript)

Note: Substitute your actual Python library directory for $LIB, and your program directory for $BIN. If you followed the checklist above, $LIB is "~/pylons/lib" and $BIN is "~/pylons/bin".

When you run "paster serve --reload quick-wiki.ini", it runs the "paster" executable. This program consists of three short lines that will look unfamiliar to those who haven't started using Python eggs:

    __requires__ = 'PasteScript==1.0'
    import pkg_resources
    pkg_resources.run_script('PasteScript==1.0', 'paster')

This says to run a script "paster" located in version 1.0 of an egg "PasteScript". The actual script location is "$LIB/PasteScript-1.0-pyVERSION.egg/EGG-INFO/scripts/paster", but we'll skip the details of how pkg_resources finds it.

The "paster" script is short too. It inserts the paste package into sys.path and calls paste.script.command.run(). I added print statements to the paste.script.command module to decipher it. Here's a simplified description:

1. paste.script.command.run() parses the command-line options into a command name "serve" with options ["--reload", "quick-wiki.ini"].

2. It calls get_commands(), which calls paste.script.pluginlib.load_global_commands(). This uses a pkg_resources feature:

       pkg_resources.iter_entry_points("paste.global_paster_command")

   This reads a file "$LIB/PasteScript-1.0-pyVERSION.egg/EGG-INFO/entry_points.txt". The file is in .ini format and contains among other things:

       [paste.global_paster_command]
       serve=paste.script.serve:ServeCommand [Config]
       ... other commands like "make-config", "setup-app", etc ...

   This defines a paster command "serve" in class "ServeCommand" in module "paste.script.serve". [@@MO: What's the "Config" part for?] The iter function yields an EntryPoint object whose .load() method returns the ServeCommand class.

3. The command.run() function then calls invoke(), which effectively does:

       ServeCommand(["--reload", "quick-wiki.ini"]).run()

   This in turn calls ServeCommand.command(), which handles daemonizing and other top-level stuff. Since our command line is short there's no top-level stuff to do. It creates 'server' and 'app' objects based on the config file, and calls server(app).
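The entry point lookup in step 2 can be imitated with the standard library alone. What follows is a minimal sketch, not the pkg_resources implementation: parse_entry_points() and load_entry_point() are hypothetical helper names, the "demo.commands" group is invented, and a stdlib function stands in for ServeCommand so the example runs without PasteScript installed. The bracketed suffix is treated here as a list of setuptools "extras".

```python
import configparser
import importlib
import re

# Sample text laid out like an egg's EGG-INFO/entry_points.txt.
# The "serve" line is quoted from the article; the "demo.commands" group
# is hypothetical and points at the stdlib so it can really be loaded.
ENTRY_POINTS_TXT = """
[paste.global_paster_command]
serve = paste.script.serve:ServeCommand [Config]

[demo.commands]
dumps = json:dumps
"""

# Value syntax: module:attr [extra1, extra2]  (the bracketed part is optional)
_TARGET = re.compile(
    r"^(?P<module>[\w.]+):(?P<attr>[\w.]+)\s*(?:\[(?P<extras>[^\]]*)\])?$")


def parse_entry_points(text):
    """Return {group: {name: (module, attr, extras)}} (hypothetical helper)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    groups = {}
    for group in parser.sections():
        entries = {}
        for name, target in parser.items(group):
            match = _TARGET.match(target.strip())
            if match is None:
                raise ValueError("bad entry point: %r" % target)
            raw_extras = match.group("extras") or ""
            extras = [e.strip() for e in raw_extras.split(",") if e.strip()]
            entries[name] = (match.group("module"), match.group("attr"), extras)
        groups[group] = entries
    return groups


def load_entry_point(groups, group, name):
    """Import the module and return the named attribute."""
    module_name, attr, _extras = groups[group][name]
    obj = importlib.import_module(module_name)
    for part in attr.split("."):
        obj = getattr(obj, part)
    return obj


groups = parse_entry_points(ENTRY_POINTS_TXT)
print(groups["paste.global_paster_command"]["serve"])
dumps = load_entry_point(groups, "demo.commands", "dumps")
print(dumps({"command": "serve"}))
```

The point of the indirection is that "paster" never imports paste.script.serve by name; it only asks for whatever the "serve" entry in the group resolves to, so any installed egg can contribute commands.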

Loading the server and application (PasteDeploy)

This happens during step 3 of the application startup. We need to find and instantiate the WSGI application and server based on the config file. The application is our QuickWiki application. The server is Paste's built-in multithreaded HTTP server. A simplified version of the code is:

    # Inside paste.script.serve module, ServeCommand.command() method.
    from paste.deploy.loadwsgi import loadapp, loadserver
    server = loadserver(uri="config:quick-wiki.ini", name=None,
        relative_to="/DIR/CONTAINING/CONFIG/FILE")
    app = loadapp(uri="config:quick-wiki.ini", name=None,
        relative_to="/DIR/CONTAINING/CONFIG/FILE")

loadserver() and loadapp() are defined in module paste.deploy.loadwsgi. The code here is complex and delves into the details of Python eggs and entry points, so we'll just look at its general behavior. Both functions see the "config:" URI and read our config file. Since there is no server name or app name they both default to "main". Therefore loadserver() looks for a "[server:main]" section in the config file, and loadapp() looks for "[app:main]". Here's what they find in "quick-wiki.ini":

    [server:main]
    use = egg:Paste#http
    host = 0.0.0.0
    port = 5000

    [app:main]
    use = egg:QuickWiki
    sqlalchemy.dburi = sqlite:////..../quick_wiki.sqlite
    ...

The "use =" line in each section tells which object to load. The other lines are configuration parameters for that object, or for plugins that object is expected to load.

Server loading

loadserver()'s args are uri="config:quick-wiki.ini", name=None. A "config:" URI means to read a config file. A server name was not specified so it defaults to "main". So loadserver() looks for a section "[server:main]". (The "server" part comes from the loadwsgi._Server.config_prefixes class attribute in $LIB/PasteDeploy*.egg/paste/deploy/loadwsgi.py.) "use = egg:Paste#http" says to load an egg called "Paste".
loadwsgi._Server.egg_protocols lists two protocols it supports: "server_factory" and "server_runner". "paste.server_runner" is an entry point group in the "Paste" egg, and it has an entry named "http". The relevant lines in $LIB/Paste*.egg/EGG-INFO/entry_points.txt are:

    [paste.server_runner]
    http = paste.httpserver:server_runner

There's a server_runner() function in the paste.httpserver module ($LIB/Paste*.egg/paste/httpserver.py). We'll stop here for a moment and look at how the application is loaded.

Application loading

loadapp() looks for a section "[app:main]" in the config file. The "app" part comes from the loadwsgi._App.config_prefixes class attribute (in $LIB/PasteDeploy*.egg/paste/deploy/loadwsgi.py). "use = egg:QuickWiki" says to find an egg called "QuickWiki". loadwsgi._App.egg_protocols lists "paste.app_factory" as one of the protocols it supports. "paste.app_factory" is also an entry point group in the egg, as seen in QuickWiki*.egg/EGG-INFO/entry_points.txt:

    [paste.app_factory]
    main=quickwiki:make_app

    [paste.app_install]
    main=paste.script.appinstall:Installer

The line "main=quickwiki:make_app" appears to mean run a function make_app(). There is such a function at the top level of the quickwiki package, imported from quickwiki.config.middleware ($LIB/QuickWiki*.egg/quickwiki/config/middleware.py).
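To make the section lookup concrete, here is a rough sketch of how a "[server:main]" or "[app:main]" section and its "use =" line could be picked apart. It is an illustration under stated assumptions, not the paste.deploy.loadwsgi code: find_section() and parse_use() are hypothetical names, and the sample config keeps only the lines quoted above.

```python
import configparser

# A trimmed copy of the sections quoted from quick-wiki.ini.
CONFIG_TEXT = """
[server:main]
use = egg:Paste#http
host = 0.0.0.0
port = 5000

[app:main]
use = egg:QuickWiki
"""


def parse_use(spec):
    """Split 'egg:Name#entry' into (scheme, egg name, entry point name).

    When no '#entry' part is given the entry point name falls back to
    'main', which matches the "main=quickwiki:make_app" line in the article.
    """
    scheme, _, rest = spec.partition(":")
    egg, _, entry = rest.partition("#")
    return scheme, egg, entry or "main"


def find_section(text, prefix, name=None):
    """Return (parsed 'use' line, remaining options) for '[prefix:name]'."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    section = "%s:%s" % (prefix, name or "main")  # name=None means "main"
    options = dict(parser.items(section))
    use = parse_use(options.pop("use"))
    return use, options


server_use, server_options = find_section(CONFIG_TEXT, "server")
app_use, app_options = find_section(CONFIG_TEXT, "app")
print(server_use, server_options)
print(app_use, app_options)
```

The pair (egg name, entry point name) is then all that is needed to look up the factory in the egg's entry_points.txt, under whichever entry point group the loader supports for that kind of object.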

Instantiating the application (QuickWiki)

loadapp() calls quickwiki.make_app(), imported from $LIB/QuickWiki*.egg/quickwiki/config/middleware.py:

    def make_app(global_conf, **app_conf):
        config = load_environment(global_conf, app_conf)
        config.init_app(global_conf, app_conf, package="quickwiki")
        app = pylons.wsgiapp.PylonsApp(config, helpers=..., g=...)
        app = ConfigMiddleware(app, ...)
        return app

quickwiki.config.environment.load_environment() returns a pylons.config.Config object containing all our application's configuration information, including Pylons standard path locations. See the Config docstring in $LIB/Pylons*.egg/pylons/config.py. The attributes of interest here are:

    .global_conf
        Dict representing the "[DEFAULT]" section of the config file.
    .app_conf
        Dict representing the "[app:main]" section of the config file.

config.init_app() sets up the application's logging and error reporting, and sets some variables for Myghty. It sets config.app_conf["package"] to the 'package' argument, presumably so the application can remind itself what its own top-level package name is.

The first app assignment creates a Pylons WSGI application, whose .__call__() method will be called by the WSGI server for each request. This in turn calls .resolve() to determine the controller (presumably using Routes to find a class in QuickWiki), and .dispatch() to create the response. Dispatch calls the controller and returns the response. PylonsApp also sets up the global object 'g' used by the controller, and manages the session and cache.

The other app assignments wrap the application in successive layers of middleware. We'll skip looking at these until we analyze an actual request.
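The shape of such a factory can be shown without Pylons. The sketch below keeps only the make_app(global_conf, **app_conf) signature from the listing above; hello_app, DemoConfigMiddleware and the "demo.config" environ key are hypothetical stand-ins, and letting app_conf override global_conf when merging is an assumption of this sketch rather than documented behavior.

```python
def hello_app(environ, start_response):
    """Innermost WSGI application; stands in for PylonsApp."""
    conf = environ["demo.config"]
    text = "package=%s debug=%s" % (conf["package"], conf["debug"])
    body = text.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


class DemoConfigMiddleware(object):
    """Hypothetical middleware that exposes the merged config per request."""

    def __init__(self, app, config):
        self.app = app
        self.config = config

    def __call__(self, environ, start_response):
        environ["demo.config"] = self.config
        return self.app(environ, start_response)


def make_app(global_conf, **app_conf):
    """Factory with the same signature as a paste.app_factory entry point."""
    merged = dict(global_conf)
    merged.update(app_conf)          # assumption: app settings win
    merged["package"] = "quickwiki"  # mirrors init_app(..., package=...)
    app = hello_app
    app = DemoConfigMiddleware(app, merged)  # each assignment adds a layer
    return app


# Exercise the factory the way a WSGI server would.
app = make_app({"debug": "false"}, debug="true")
captured = {}


def start_response(status, headers, exc_info=None):
    captured["status"] = status


body = b"".join(app({}, start_response))
print(captured["status"], body)
```

Because every layer has the same call signature as the application itself, the server neither knows nor cares how many layers the factory stacked up.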

Anatomy of a request

Let's say you run the demo and choose the "Title List" link. It shows an index of all pages, with a JavaScript feature that allows you to drag undesired links to a "delete box". We'll look at how this page was created in Pylons.

server(app) is running, called in the ServeCommand.command() method in module paste.script.serve. server is actually paste.httpserver.server_runner(), which trivially calls serve() in the same module. The use_threadpool arg defaults to true so the actual server is a WSGIThreadPoolServer, which has the following inheritance:

    SocketServer.BaseServer                # In SocketServer.py in Python stdlib.
    SocketServer.TCPServer
    BaseHTTPServer.HTTPServer              # In BaseHTTPServer.py in Python stdlib.
    paste.httpserver.SecureHTTPServer      # Adds SSL (HTTPS).
    paste.httpserver.WSGIServerBase        # Adds WSGI.
    paste.httpserver.WSGIServer            # Adds multithreading.
    paste.httpserver.WSGIThreadPoolServer  # Adds thread pool.

(SSL is not enabled in our example.)

Right now the server is waiting for a request, following this call stack:

    # In paste.httpserver.serve(), calling 'server.serve_forever()'
    ThreadPoolMixIn.serve_forever()    # Defined in paste.httpserver.
    -> TCPServer.handle_request()      # Called for every request.
    -> WSGIServerBase.get_request()
    -> SecureHTTPServer.get_request()
    -> self.socket.accept()            # Defined in stdlib socket module.

The request comes in and self.socket.accept() returns a new socket for the connection. TCPServer.handle_request() continues. It calls ThreadPoolMixIn.process_request() which puts the request in a thread queue:

    self.thread_pool.put(
        lambda: self.process_request_in_thread(request, client_address))
    # 'request' is the connection socket.

The thread pool is defined in the ThreadPool class. It spawns a number of threads which each wait on the queue for a callable to run. In this case the callable will be a complete Web transaction including sending the HTML page to the client.
Each thread will repeatedly process transactions from the queue until it receives a special value ordering it to exit.

The main thread goes back to listening for other requests, so we're no longer interested in it.

Thread #2 pulls the lambda out of the queue and calls it:

    lambda
    -> ThreadPoolMixIn.process_request_in_thread()
    -> BaseServer.finish_request()
    -> self.RequestHandlerClass(request, client_address, self)
         # Instantiates. The actual class is paste.httpserver.WSGIHandler;
         # i.e., the 'handler' variable in serve().

The request handler takes over:

    SocketServer.BaseRequestHandler.__init__(request, client_address, server)
    -> WSGIHandler.handle()
    -> BaseHTTPRequestHandler.handle()    # In stdlib BaseHTTPServer.py
    -> BaseHTTPRequestHandler.handle_one_request()
         Reads the command from the socket. The command is
         "GET /page/list HTTP/1.1" plus several HTTP headers.
         BaseHTTPRequestHandler.parse_request() parses this into
         attributes .command, .path, .request_version, and .headers.
    -> BaseHTTPRequestHandler.do_GET()
         This method is overridden in WSGIHandler, and is actually
         WSGIHandlerMixin.wsgi_execute(). POST would be the same.
    -> WSGIHandlerMixin.wsgi_setup()
         Creates the .wsgi_environ dict.

The WSGI environment dict is described in PEP 333, the WSGI specification. It contains various keys specifying the URL to fetch, query parameters, server info, etc. All keys required by the CGI specification are present, as are other keys specific to WSGI or to particular middleware. The application will calculate a response based on the dict. The application is wrapped in layers of middleware -- nested function calls -- which modify the dict on the way in and modify the response on the way out.

The request handler calls the application thus:

    # In WSGIHandlerMixin.wsgi_execute(), simplified.
    result = app(wsgi_environ_dict, wsgi_start_response)

wsgi_start_response is a callable mandated by the WSGI spec. The application will call it to specify the HTTP headers.
The return value is an iterable of strings, which when concatenated form the HTML document to send to the browser. Other MIME types are handled analogously.

The application, as we remember, was returned by quickwiki.config.middleware.make_app(). It's wrapped in several layers of middleware, so calling it will execute the middleware in reverse order of how they're listed in $LIB/QuickWiki*.egg/quickwiki/config/middleware.py:

The RegistryManager middleware makes certain module globals both thread-local and middleware-local. Pylons and Myghty depend on certain module globals containing the context of the current request. (Defined in the paste.registry module.)

ErrorDocuments intercepts any HTTP error status returned by the application (e.g., "Page Not Found", "Internal Server Error") and sends another request to the application to get the appropriate error page to display instead. (Defined in pylons.middleware.)

Cascade lists a series of applications which will be tried in order. (Defined in paste.cascade.) The first is StaticURLParser (defined in paste.urlparser). It tries to return a file under our static directory ($LIB/QuickWiki*.egg/quickwiki/public/). For QuickWiki this is used only for the stylesheet. If the first application returns "Not Found", the cascader tries the second application, StaticJavascripts (defined in pylons.middleware). This tries to return a JavaScript file from the WebHelpers package (defined in webhelpers). These include a variety of JavaScript tools including a full port of the Ruby on Rails utilities. See the WebHelpers site for details. If that returns "Not Found" too, the cascader falls back to the third application, your QuickWiki app. But there's still some other middleware wrapped around the app....

ErrorHandler sends a nice helpful traceback to the browser if the app raises an exception. It's active only if "debug" is true in the config file. (Defined in pylons.middleware.)

httpexceptions (defined in paste.httpexceptions) converts any HTTP exceptions raised by the application into proper WSGI responses.

User-defined middleware. If the user added any middleware in make_app() it would be executed here.

ConfigMiddleware (defined in paste.deploy.config) makes a paste.config key in the WSGI environ dict that contains the effective values in the config file. This is a dict containing the merger of the "[app:main]" and "[DEFAULT]" sections of the config file.

The innermost middleware calls the PylonsApp instance it was initialized with.
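The nesting order and the cascade fallback described in this list can be tried out with plain WSGI callables. This is a simplified sketch, not the Paste or Pylons classes: Layer, cascade(), static_app and wiki_app are hypothetical stand-ins, only a few of the layers are modelled, and the layer names are borrowed from the list purely as labels.

```python
class Layer(object):
    """Stand-in middleware that only records when it is entered and left."""

    def __init__(self, app, name, log):
        self.app, self.name, self.log = app, name, log

    def __call__(self, environ, start_response):
        self.log.append("in:" + self.name)
        try:
            return self.app(environ, start_response)
        finally:
            self.log.append("out:" + self.name)


def static_app(environ, start_response):
    """Serves one known file and answers 404 for everything else."""
    if environ["PATH_INFO"] == "/quick.css":
        start_response("200 OK", [("Content-Type", "text/css")])
        return [b"body {}"]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


def wiki_app(environ, start_response):
    """Stands in for the real application; it answers every URL."""
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<h1>wiki</h1>"]


def cascade(apps):
    """Try each app in order and keep the first answer that is not a 404."""
    def app(environ, start_response):
        for candidate in apps:
            seen = {}

            def capture(status, headers, exc_info=None):
                seen["status"], seen["headers"] = status, headers

            body = list(candidate(environ, capture))
            if not seen["status"].startswith("404"):
                break
        start_response(seen["status"], seen["headers"])
        return body
    return app


def build(log):
    app = Layer(wiki_app, "ConfigMiddleware", log)  # wrapped first: innermost
    app = Layer(app, "ErrorHandler", log)
    app = cascade([static_app, app])
    app = Layer(app, "RegistryManager", log)        # wrapped last: outermost
    return app


def request(path):
    log, statuses = [], []
    app = build(log)
    chunks = app({"PATH_INFO": path}, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks), log


print(request("/page/list"))
print(request("/quick.css"))
```

For "/page/list" the log shows the outermost layer entered first and left last, while for "/quick.css" the inner layers never run at all because the static application answers before the cascade reaches them.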

Surprise! PylonsApp is itself middleware. It calls the pylons.wsgiapp.PylonsBaseWSGIApp instance in its self.app attribute.

PylonsBaseWSGIApp is middleware too. Its .__call__() method does:

    self.setup_app_env(environ, start_response)
    controller = self.resolve(environ, start_response)
    response = self.dispatch(controller, environ, start_response)
    return response

.setup_app_env() modifies the environ dict.

.resolve() calculates the controller class using Routes, our package name ("quickwiki"), and a routing map defined in quickwiki.config.routing. Here's QuickWiki's routing map (simplified):

    map.connect('error/:action/:id', controller='error')
    map.connect(':controller/:action/:title', controller='page', action='index', title='FrontPage')
    map.connect(':title', controller='page', action='index', title='FrontPage')
    map.connect('*url', controller='template', action='view')

The first arg is a URL pattern to match. "Controller" is a class in a same-name module under quickwiki.controllers. (Pylons converts the controller name to TitleCase and "/" to ".".) The action is a method in that class. The keyword arguments are variables which will be available to the controller. ":var" means take the variable's value from the URL. If the URL is too short to contain that part, the keyword arguments provide defaults. Routing maps can contain other features not covered here.

.resolve() invokes Routes and gets back a match dict for the requested URL. The match dict for our URL "/page/list" is:

    {'action': 'list', 'controller': 'page', 'title': 'FrontPage'}

Note how it matched the second routing rule, and the title defaulted to the keyword arg. .resolve() puts the match dict under WSGI environ key "pylons.routes_dict". Then it looks for a module quickwiki.controllers.page, converting any "/" in the 'controller' key to ".".
It imports this module and looks for a class PageController inside it, using pylons.util.class_name_from_module_name() to TitleCase the class name and replace any "-" with "_".

Hint: Put "print map" in pylons.wsgiapp line 176 (in PylonsBaseWSGIApp.resolve() after "match = config.mapper_dict") to see the routing variables for every request.

.dispatch() instantiates the controller class and calls it in the WSGI manner. If the controller does not exist (.resolve() returned None), it raises HTTPNotFound.

quickwiki.controllers.page.PageController does not have a .__call__() method so control falls to its grandparent, pylons.controllers.WSGIController. It looks up the action method .list() defined in PageController and calls it. The action method may have any number of positional arguments as long as they correspond to variables in the routing rule. In addition, the c global will contain all variables as attributes. If an action method name starts with "_", it's private and HTTPNotFound is raised.

If the controller has .__before__() and/or .__after__() methods, they are called before and after the action, respectively. These can perform authorization, lock OS resources, etc.

The action method returns a pylons.Response object (imported from paste.wsgiwrappers.WSGIResponse). We'll look at action methods more closely in Pylons actions tips.

WSGIController.__call__() continues, converting the Response object to an appropriate WSGI return value. (First it calls the start_response callback to specify the HTTP headers, then it returns an iterable of strings. The Response object converts unicode to utf-8 encoded strings, or whatever encoding you've specified in the config file.)

The stack of middleware calls unwinds, each modifying the return value and headers if desired.

The server receives the final return value. (We're way back in paste.httpserver.WSGIHandlerMixin.wsgi_execute() now.)
The outermost middleware has called back to server.start_response(), which has saved the status and HTTP headers in .wsgi_curr_headers. .wsgi_execute() then iterates the application's return value, calling .wsgi_write_chunk(chunk) for each encoded string yielded. .wsgi_write_chunk() formats the status and HTTP headers and sends them on the socket if they haven't been sent yet, then sends the chunk. The convoluted header behavior here is mandated by the WSGI spec.

Control returns to BaseHTTPRequestHandler.handle(). .close_connection is true so this method returns. The call stack continues unwinding all the way to paste.httpserver.ThreadPoolMixIn.process_request_in_thread(). This calls SocketServer.close_request(), which does nothing.

The request lambda finishes and control returns to ThreadPool.worker_thread_callback(). It waits for another request in the thread queue. If it receives a special shutdown value, thread #2 dies.
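Looking back at the .resolve() and .dispatch() steps of this request, the name conversion and the action lookup can be sketched in a few lines. This is a toy model under stated assumptions, not the Pylons source: the CONTROLLERS dict replaces the real module import, NotFound stands in for HTTPNotFound, the helper only mirrors the behavior described for pylons.util.class_name_from_module_name(), and this PageController is a hypothetical miniature.

```python
def class_name_from_module_name(module_name):
    """TitleCase a module name: 'page' -> 'Page', 'my-blog_post' -> 'MyBlogPost'."""
    words = module_name.replace("-", "_").split("_")
    return "".join(word.title() for word in words)


class NotFound(Exception):
    """Stands in for HTTPNotFound."""


class PageController(object):
    """Hypothetical miniature controller with one public action."""

    def __init__(self):
        self.calls = []

    def __before__(self):
        self.calls.append("before")

    def __after__(self):
        self.calls.append("after")

    def list(self):
        self.calls.append("list")
        return "<ul>...</ul>"

    def _secret(self):
        return "not reachable from a URL"


# Stands in for importing quickwiki.controllers.<name> and reading the class.
CONTROLLERS = {"PageController": PageController}


def resolve(match):
    """Map the 'controller' routing variable to a class, or None."""
    class_name = class_name_from_module_name(match["controller"]) + "Controller"
    return CONTROLLERS.get(class_name)


def dispatch(controller_class, match):
    """Instantiate the controller and run the requested action."""
    if controller_class is None:
        raise NotFound(match["controller"])
    action = match["action"]
    controller = controller_class()
    method = getattr(controller, action, None)
    if action.startswith("_") or method is None:
        raise NotFound(action)  # private or missing actions are hidden
    if hasattr(controller, "__before__"):
        controller.__before__()
    body = method()
    if hasattr(controller, "__after__"):
        controller.__after__()
    return controller, body


match = {"action": "list", "controller": "page", "title": "FrontPage"}
controller, body = dispatch(resolve(match), match)
print(controller.calls, body)
```

Keeping the URL-to-class mapping purely name based is what lets a new controller module become reachable without touching any registry: dropping the module in place is enough for the lookup to find it.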

What's in the QuickWiki egg?

The QuickWiki-0.1.2-py2.4.egg contains two subdirectories: quickwiki and EGG-INFO.

quickwiki
    This is a Python package, so every directory contains an __init__.py file. The egg is on sys.path so "import quickwiki" will access the package. A Pylons application normally contains one package named after the application, but there's no reason it can't contain additional packages or top-level modules. The "setuptools" egg for instance contains a package setuptools and a module pkg_resources.

    QuickWiki contains the standard subpackages of a Pylons application:

    __init__.py (module)
        Just an import to provide quickwiki.make_app() at the top level.
    config
        The middleware and routing modules we've seen above. environment sets the paths and does Myghty configuration.
    controllers
        Contains controller modules for the application. These are connected to URLs via the routing map.
    docs
        A place for documentation files. As "index.txt" explains, write your docs in reStructuredText format and use Pudge to make nice HTML documentation from both the ReST files and your Python docstrings.
    i18n
        gettext translation modules for multilingual applications.
    lib
        Miscellaneous modules. app_globals initializes the g global -- empty by default. base contains the BaseController class. helpers initializes the h global, which by default contains all the features of the webhelpers package plus some multilingual and logging goodies from pylons.util.
    models
        Your data model. Typically these are SQLAlchemy or SQLObject classes modelling database tables.
    public
        Static files that should be served as-is to the client. The included "quick.css" is accessed via URL "http://localhost:5000/quick.css". Static files override controller actions with the same name.
    templates
        Template modules used by the controllers. Pylons uses the Myghty template system by default, but it can also be configured for Cheetah or another template language.
    tests
        A place to put your unit tests.
    websetup (module)
        Invoked by "paster setup-app". Typically creates database tables and does any other initialization necessary before running the app for the first time.

EGG-INFO
    Metainformation about the egg, and supplemental files used by the application.

    dependency_links.txt
        @@MO ???
    entry_points.txt
        Services this egg provides that are of use to other eggs. QuickWiki's contains:

            [paste.app_factory]
            main=quickwiki:make_app

            [paste.app_install]
            main=paste.script.appinstall:Installer

        This tells how to instantiate a QuickWiki application and how to install it. As we saw above, paste.deploy wants to instantiate an app "main" contained in QuickWiki (related to the "[app:main]" section in our config file "quick-wiki.ini"). It uses pkg_resources to look for an entry point "main" in group "paste.app_factory" in QuickWiki. This returns quickwiki.make_app, or at least conceptually it does. See paste.deploy.loadwsgi for the actual calling code.
    not-zip-safe
        An empty file that tells "easy_install" never to install this egg as a zip file. The opposite is a file "zip-safe".
    paste_deploy_config.ini_tmpl
        An example of a data file. This one contains the template for new app config files. It's written in .ini format with %-operator substitutions.
    paster_plugins.txt
        The names of other eggs that contain commands for the "paster" program.
    PKG-INFO
        Metainformation about the package in "key: value" format: package name, version, summary text, home page, author, author's email, license, etc. This information is used when registering the package in the Python Cheeseshop. (Here's the Cheeseshop's QuickWiki entry for example.)
    requires.txt
        List of packages and versions QuickWiki depends on.
    SOURCES.txt
        List of files used in making this egg.
    top_level.txt
        Name of QuickWiki's top-level package ("quickwiki").
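Because PKG-INFO uses "key: value" header lines, the standard library's email parser can read it. The snippet below is a small sketch with a made-up sample: only the Name and Version values come from this article, and the Summary and License values are placeholders rather than QuickWiki's real metadata.

```python
from email.parser import Parser

# Hypothetical PKG-INFO content; Name and Version are taken from the article,
# the remaining values are placeholders for illustration only.
PKG_INFO = """\
Metadata-Version: 1.0
Name: QuickWiki
Version: 0.1.2
Summary: placeholder summary text
License: placeholder
"""

metadata = Parser().parsestr(PKG_INFO)

# Header lookup is case-insensitive, as with any email-style message.
print(metadata["Name"], metadata["version"])
print(sorted(metadata.keys()))
```

On a real installation the same call could be pointed at the text of "$LIB/QuickWiki*.egg/EGG-INFO/PKG-INFO" instead of the sample string.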