Ashd — A Sane HTTP Daemon

Ashd is a modular HTTP server based on a multi-program architecture. Whereas most other HTTP servers are monolithic programs with, perhaps, loadable modules, Ashd is composed of several different programs, each of which handles requests in different ways, passing requests to each other over a simple protocol (not unlike Unix pipelines). The design of Ashd brings it a number of nice properties, the following being the most noteworthy ones.

Sanity of design
The separation of concerns between different, independent programs is an example of standard Unix philosophy – each program does one thing only, but does it well (I hope). The clean delineation of functions allows each program to be very small and simple. Currently, each of the programs in the collection (including even the core HTTP parser program, htparser, as long as one does not count its, quite optional, SSL implementation) is implemented in less than 1,000 lines of C code (and most are considerably smaller than that), allowing them to be easily studied and understood.

Security
Since each program runs in a process of its own, it can be assigned proper permissions. Most noteworthy of all, the userplex program ensures that user home directories (/~user/ URLs, if you will) are served only by code that is actually logged in as the user in question. Likewise, the htparser program, being the only program which speaks directly with the clients, can run perfectly well as an unprivileged user (like nobody) and be chroot'ed into an empty directory.

Configuration sanity
Again, since each program only handles a simple task, its configuration can be made quite simple. There is no need for the dirplex program, which only handles service from physical directories, to care about virtual directories, virtual hosts, HTTP protocol parameters or authentication; just as there is no need for the patplex pattern matcher to know about file types or directory hierarchies. Each program's configuration file format can be kept as simple as possible, and indeed most programs lack configuration files entirely and are configured simply with command-line options.

Persistence
Though Ashd is a multi-process program, it is not multi-process in the same sense as e.g. Apache. Each request handler continues to run indefinitely and does not spawn multiple copies of itself, meaning that all process state persists between requests – session data can be kept in memory, connections to back-end services can be kept open, and so on.
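The Security point above relies on ordinary Unix privilege separation. The following is a generic sketch of how a front-end process might confine itself to an empty directory and then drop root privileges. It assumes the process starts as root. It is not taken from htparser's source, and the function name confine_and_drop is hypothetical.

```c
#define _DEFAULT_SOURCE   /* for chroot() and setgroups() on glibc */
#include <grp.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: chroot into `root`, then switch to the given
 * unprivileged uid/gid (for example those of "nobody").
 * Must be called as root. Returns 0 on success, -1 on any failure.
 */
int confine_and_drop(const char *root, uid_t uid, gid_t gid)
{
    /* Confine the filesystem view first; chdir("/") ensures the working
     * directory is not left pointing outside the new root. */
    if (chroot(root) != 0 || chdir("/") != 0)
        return -1;
    /* Drop supplementary groups and the primary gid while still root;
     * after setuid() we would no longer be allowed to change them. */
    if (setgroups(0, NULL) != 0 || setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;
    /* Sanity check: with a non-root uid, regaining root must now fail. */
    if (uid != 0 && setuid(0) == 0)
        return -1;
    return 0;
}
```

The order of the calls is the main design point. chroot(), setgroups() and setgid() all require root, so they have to happen before the final setuid().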

Current Status

Ashd can be said to be rather mature by now. I have run it on moderately busy sites (see the Performance section below for an example). Over months of continuous operation, I have observed no crashes or other signs of instability, nor any problems with particular user-agents. It does lack a few features present in other HTTP servers, but nothing that I, for one, have experienced as a problem; and it also supports a few features not always present in other servers (such as chunked request bodies).

Design Overview

Though the server as a whole is called "Ashd", there is no actual program by that name. The htparser program of Ashd implements a minimal HTTP server. It speaks HTTP (1.0 and 1.1) with clients, but it does not know the first thing about actually handling the requests it receives. Rather, when started, it launches a handler program specified on its command line. It then packages up each request it receives and passes it to that handler program (using Unix socket file-descriptor passing). That handler program may choose to look at only part of the URL and pass the request on to other handler programs based on what it sees. In that way, the handler programs form a tree-like structure, corresponding roughly to the URL space of the server. To make this possible, the packaged request that is passed between the handler programs contains the part of the URL which remains to be parsed, referred to as the "rest string" or the "point" (in deference to Emacs parlance).
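File-descriptor passing over a Unix-domain socket is done with sendmsg() and recvmsg() carrying an SCM_RIGHTS control message. The following is a minimal, generic sketch of that mechanism, sending a single descriptor together with one dummy data byte. The helper names send_fd and recv_fd are hypothetical. The real Ashd protocol, described in ashd(7), also carries the packaged request itself, which this sketch does not attempt to reproduce.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control-message buffer, aligned suitably for struct cmsghdr. */
union fdbuf {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send `fd` over the Unix-domain socket `sock`. Returns 0 or -1. */
int send_fd(int sock, int fd)
{
    /* On stream sockets, ancillary data must accompany at least one
     * byte of ordinary data, so send a single dummy byte. */
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fdbuf ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* "this message carries fds" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(sock, &msg, 0) == 1) ? 0 : -1;
}

/* Receive a descriptor sent with send_fd(). Returns it, or -1. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fdbuf ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            return fd;   /* a new descriptor for the same open file */
        }
    }
    return -1;
}
```

The receiver gets its own descriptor referring to the same open socket or file. This is what allows a handler deep in the tree to write its response over the passed socket directly, without relaying it through the intermediate handlers.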

For an actual, technical description of the architecture and protocols, see the ashd(7) manpage.

Example

As a concrete example, here is how the request to /~fredrik/ashd/index is handled by this particular server.

The request is received over HTTP by htparser. It sets the rest string to ~fredrik/ashd/index and passes the request to the patplex process that it was instructed to start by way of a command-line argument. The patplex program, instructed by its configuration file, recognizes the initial tilde of the rest string, strips it off, and passes the request to the userplex program. If userplex is not already running, patplex starts it, passing it the control socket (over which requests are passed) on its standard input, with command-line arguments as specified in the patplex configuration file.

The rest string at this point being fredrik/ashd/index, the userplex program strips off the part of the rest string up to the first slash, treating the stripped-off part as a username, fredrik. Having done some tests (configurable with command-line options) to determine that the username is valid, it checks whether it already has a request handler running for that user. If not, it forks off, logs in as the user in question and starts a request handler. The user can explicitly provide the request handler by creating an executable file named ~/.ashd/handler; otherwise, the handler is started as specified on userplex's command line. Normally, and in this case, that is an instance of the dirplex program.

The dirplex program receives the request with the rest string set to ashd/index. Having been instructed (by way of command-line arguments) to handle the physical directory ~/htpub, it starts chipping off slash-separated elements of the rest string. Starting with the ashd element, it finds a directory under ~/htpub with that name, and interprets the next element, index, relative to it. Finding no entry by that exact name, it looks more thoroughly and finds index.html instead. Having found the physical file ~/htpub/ashd/index.html, it does pattern matching on that physical filename according to its configuration, finding that it should fork out the sendfile program to handle the request.

The sendfile program handles the request by sending the file contents exactly as they are back to htparser over the socket passed between the various handler programs. Then, not being a persistent program, it exits. The only thing sendfile does with the rest string is to check that it is now empty. htparser itself takes care of any chunking or other transfer encoding that might be necessary for HTTP keep-alive.
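The element-by-element consumption of the rest string can be sketched as a small C helper. userplex does this to extract the username, and dirplex does it to walk directory names. This is an illustration, not code from Ashd: the name split_element and its return convention are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split the first slash-separated element off the
 * rest string `rest`. The element is copied into `elem` (NUL-terminated;
 * `elemsize` includes room for the terminator). Returns a pointer to the
 * remaining rest string after the slash, or to the terminating NUL if
 * there was no slash. Returns NULL if the element does not fit in `elem`.
 */
const char *split_element(const char *rest, char *elem, size_t elemsize)
{
    const char *slash = strchr(rest, '/');
    size_t len = (slash != NULL) ? (size_t)(slash - rest) : strlen(rest);

    if (len >= elemsize)
        return NULL;
    memcpy(elem, rest, len);
    elem[len] = '\0';
    return (slash != NULL) ? slash + 1 : rest + len;
}
```

Applied repeatedly to fredrik/ashd/index, the helper yields fredrik, ashd and index in turn. It then leaves an empty rest string, which corresponds to the condition sendfile checks for at the end of the chain.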

"Screenshot"

$ ps -AH lS
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN STAT TTY  TIME COMMAND
1 65534  2216     1  20   0  24628   908 ?     Ss   ?    1:54 /usr/local/bin/htparser -Sf -p /var/run/ashd.pid -u nobody -r /var/tmp plain -- errlogger -n ashd patplex /usr/local/etc/ashd/rootpat
0     0  2215     1  20   0   3904   512 ?     Ss   ?    0:00 errlogger -n ashd patplex /usr/local/etc/ashd/rootpat
0     0  2225  2215  20   0   4012   552 ?     S    ?    0:03   patplex /usr/local/etc/ashd/rootpat
4     0  2495  2225  20   0 129380   680 ?     S    ?    0:00     sudo -u www-data accesslog /var/log/http/access.log dirplex /srv/www/htdocs
4    33  2496  2495  20   0   3928   412 ?     S    ?    0:03       accesslog /var/log/http/access.log dirplex /srv/www/htdocs
0    33  2497  2496  20   0   3944   644 ?     S    ?   57:35         dirplex /srv/www/htdocs
0    33  2518  2497  20   0 266024 17404 ?     S    ?    2:10           /usr/bin/python /usr/local/bin/ashd-wsgi ashd.wsgidir
0    33  4032  2497  20   0   4140   620 ?     S    ?    0:00           callfcgi multifscgi 5 php-cgi
0    33  4033  4032  20   0   3900   364 ?     S    ?    0:00             multifscgi 5 php-cgi
0    33  4034  4033  20   0 247204  2332 ?     S    ?    0:01               php-cgi
0    33  4035  4033  20   0 247204  2400 ?     S    ?    0:01               php-cgi
0    33  4036  4033  20   0 248508   568 ?     S    ?    0:01               php-cgi
0    33  4037  4033  20   0 247204  2340 ?     S    ?    0:01               php-cgi
0    33  4038  4033  20   0 248240  3084 ?     S    ?    0:01               php-cgi
0    33  1080  2497  20   0   3932   488 ?     S    ?    0:00           callcgi GET /gitweb/?p=ashd.git;a=blame;f=src/htparser.c;hb=HEAD
0    33  1081  1080  20   0 143944 11136 ?     S    ?    0:00             /usr/bin/perl gitweb/index.cgi gitweb/index.cgi
0    33  1088  1081  20   0   9780  1344 ?     D    ?    0:00               /usr/bin/git --git-dir=/srv/git/r/ashd.git blame -p HEAD -- src/htparser.c
0     0  3297  2225  20   0  12344   584 ?     S    ?    0:00     userplex -g users -d public_html dirplex -c apache-compat public_html
4   504  3298  3297  20   0   3944   636 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4   500  3344  3297  20   0   3928   552 -     Ss   ?    0:01       accesslog -a /home/fredrik/.ashd/log/access dirplex htpub
0   500  3419  3344  20   0   3944   664 -     S    ?    0:01         dirplex htpub
0   500  3420  3419  20   0 238960  5252 -     Sl   ?    2:08           /usr/bin/python3 /usr/local/bin/ashd-wsgi3 -m /home/fredrik/.ashd/sockets/pdm3 ashd.wsgidir
0   500  4044  3419  20   0 119412  1672 -     S    ?    0:14           psendfile
0   500  2159  3419  20   0   3932   464 -     S    ?    0:00           htextauth -s ./auth -- dirplex -c ./sub.cf /home/pub
0   500  2160  2159  20   0   3944   524 -     S    ?    0:03             dirplex -c ./sub.cf /home/pub
0   500 31056  3419  20   0   4140   496 -     S    ?    0:00           callfcgi php-cgi
0   500 31057 31056  20   0 247456   732 -     S    ?    0:00             php-cgi
4   506  3586  3297  20   0   3944   664 ?     Ss   ?    0:03       dirplex -c apache-compat public_html
0   506  3830  3586  20   0   7728  1732 ?     S    ?    0:00         callfcgi php-cgi
0   506 15184  3830  20   0 247464  5772 ?     S    ?    0:00           php-cgi
4   505  4045  3297  20   0   3944   600 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4   507  6376  3297  20   0   3944   496 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4   510  9476  3297  20   0   3944   632 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4  1000 12610  3297  20   0   3944   480 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4   513 24954  3297  20   0   3944   524 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
0   513 24955 24954  20   0   4140   520 ?     S    ?    0:00         callfcgi php-cgi
0   513 24956 24955  20   0 249788   800 ?     S    ?    0:00           php-cgi
4   515 27761  3297  20   0   3944   472 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
4   502 18758  3297  20   0   3944   524 ?     Ss   ?    0:00       dirplex -c apache-compat public_html
$

The Cast

The Ashd programs of primary interest are the following:

htparser
The "actual" HTTP server. htparser is the program that listens for TCP connections and speaks HTTP with the clients.

dirplex
dirplex is the program used for serving files from actual directories, in a manner akin to how most other HTTP servers work. To do that, dirplex maps URLs to existing physical files. It then performs various kinds of pattern matching against the names of those physical files to determine which program to call to actually serve them.

patplex
Performs pattern matching against logical request parameters such as the rest string, URL or various headers to determine a program to pass the request to. As such, patplex can be used to implement such things as virtual directories or virtual hosts.

sendfile
A simple handler program for sending literal file contents, normally called by dirplex for serving ordinary files. It handles caching using the Last-Modified and related headers. It also handles MIME-type detection if a specific MIME type was not specified.

callcgi
Translates an Ashd request into a CGI environment, and runs either the requested file directly as a CGI script, or an external CGI handler. Thus, it can be used to serve, for example, PHP pages.

userplex
Handles "user directories", to use Apache parlance; you may know them otherwise as /~user/ URLs. When a request is made for the directory of a specific user, it makes sure that the request handler runs as the user in question. This functionality was actually what prompted me to begin writing Ashd as a whole, since I was severely annoyed by the fact that Apache serves user directories as the www-data (or similar) user. Serving a user directory properly as its owner has several benefits:
- all dynamic content can access all the relevant files it may need;
- any files it creates or modifies are properly owned by the right user;
- no other users need access to one's home directory;
- one user cannot violate the "web space" of other users just by running PHP scripts to do that.
It also relieves the web server of various weird security considerations which come from trusting users with running code as another user.
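patplex's dispatch on the rest string can be illustrated with POSIX regular expressions and a first-match-wins rule table. The struct rule table, the function pick_handler and the example patterns are all hypothetical. They say nothing about patplex's actual configuration syntax or matching semantics. The sketch also omits stripping the matched prefix, such as the tilde in the earlier example.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical rule: a pattern and the handler to use when it matches. */
struct rule {
    const char *pattern;  /* POSIX extended regex, matched against the rest string */
    const char *handler;  /* name of the handler program to pass the request to */
};

/*
 * Return the handler of the first rule whose pattern matches `rest`, or
 * `fallback` if no rule matches. Rules whose patterns fail to compile
 * are skipped.
 */
const char *pick_handler(const struct rule *rules, size_t n,
                         const char *rest, const char *fallback)
{
    for (size_t i = 0; i < n; i++) {
        regex_t re;
        int matched;

        /* Compiled on every call for brevity; a long-running dispatcher
         * would compile its patterns once at startup. */
        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        matched = (regexec(&re, rest, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return rules[i].handler;
    }
    return fallback;
}
```

Consider the rules ^~ → userplex and ^gitweb/ → callcgi, with a fallback of dirplex. With these, a rest string of ~fredrik/ashd/index selects userplex, while images/logo.png falls through to dirplex.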
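The Last-Modified caching that sendfile performs corresponds to the standard HTTP conditional GET. If the client's If-Modified-Since date is not older than the file's modification time, the server can send a 304 Not Modified reply instead of the file. The following is a hedged sketch of that comparison, not sendfile's code. The function not_modified is hypothetical, and it accepts only the RFC 1123 date format (HTTP also permits two obsolete date formats).

```c
#define _GNU_SOURCE   /* for strptime() and timegm() on glibc */
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: return 1 if a file with modification time `mtime`
 * has not changed since the date in the If-Modified-Since header value,
 * so that a 304 reply is appropriate. Return 0 otherwise, including when
 * the header is absent or cannot be parsed.
 */
int not_modified(time_t mtime, const char *if_modified_since)
{
    struct tm tm;
    const char *end;
    time_t since;

    if (if_modified_since == NULL)
        return 0;
    memset(&tm, 0, sizeof(tm));
    /* RFC 1123 format, e.g. "Sun, 06 Nov 1994 08:49:37 GMT". */
    end = strptime(if_modified_since, "%a, %d %b %Y %H:%M:%S GMT", &tm);
    if (end == NULL || *end != '\0')
        return 0;
    since = timegm(&tm);   /* interpret the broken-down time as UTC */
    if (since == (time_t)-1)
        return 0;
    return mtime <= since;
}
```

For example, a file last modified at 784111777 (Sun, 06 Nov 1994 08:49:37 GMT) counts as not modified for exactly that If-Modified-Since value. If its mtime moves one second later, it no longer does.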

Outside the main cast, there are also the htls, accesslog, htextauth, callscgi, callfcgi, httimed, httrcall, errlogger, psendfile and multifscgi programs.

There is also a Python module, which comes with the ashd-wsgi and scgi-wsgi programs for serving WSGI scripts and an undocumented program for serving files with server-side includes. It also contains rather general (documented) modules for writing custom Ashd handlers very conveniently. There are versions of the Python module and programs for both Python 2 and Python 3. The Python 2 module has been verified to work with Jython.

Documentation

Ashd is primarily documented in the same manual pages that this page links to. For a practical introduction, read the accompanying INSTALL file and/or see the simple configuration examples that are included in the examples directory of the source tree.

Download

The latest release of Ashd is 0.12. Download it here.

The latest release of the Python module is 0.5. Download the Python 2 version here, or the Python 3 version here.

The latest source code is available through Git at <git://git.dolda2000.com/ashd>, also viewable through Gitweb.

Performance

Ashd has, at least to my knowledge, not been extensively benchmarked, so its performance characteristics are not well known. It should also be noted that optimization has not been a priority when writing it, with precedence given to brevity and clarity. (Which, on the other hand, means that if optimization should at some point be necessary, there should be much low-hanging fruit to pick.)

The closest thing I have done to benchmarking Ashd is running it to serve the moderately busy site havenandhearth.com, where most of the traffic consists of static files. There is dynamic content as well, but it receives far less traffic. On this site, Ashd serves on average about 1.5 million requests per day on about 100 simultaneous HTTP connections, with temporary peaks of slightly above 100 requests per second on 1000-1500 simultaneous connections. A good portion of the traffic (I would estimate about 20%) is served over HTTPS. Under these circumstances, the programs involved in the most common requests consume CPU time as follows.

Program     Average CPU usage
htparser    0.53%
patplex     0.041%
dirplex     0.12%
sendfile    1.3%
accesslog   0.036%
Total       2.0%

The above measurements are calculated from the cumulative CPU time used by the respective programs after having run for several weeks. By comparison, the PHP engine running the site's discussion forum, which receives about 100,000 requests per day, uses 5.4% CPU. The CPU is an Intel Core i7-920.
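For scale, the daily request counts above can be converted into average rates by simple division; no new measurements are involved. The symbols r_Ashd and r_forum are introduced here only for illustration. The last line checks that the per-program figures in the table add up to the stated total.

```latex
\begin{aligned}
r_{\text{Ashd}}  &= \frac{1.5 \times 10^{6}\ \text{requests}}{86\,400\ \text{s}} \approx 17\ \text{requests/s} \\
r_{\text{forum}} &= \frac{1.0 \times 10^{5}\ \text{requests}}{86\,400\ \text{s}} \approx 1.2\ \text{requests/s} \\
0.53 + 0.041 + 0.12 + 1.3 + 0.036 &= 2.027 \approx 2.0\ \%
\end{aligned}
```

The average rate is thus well below the peaks of slightly above 100 requests per second mentioned above, so the CPU figures reflect typical rather than peak load.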

In this context, it should be noted that the multi-process architecture of Ashd makes it inherently parallel to some degree, even though the individual programs are single-threaded. I would expect htparser to become the first bottleneck, since every request passes through it and it is single-threaded.


Author: Fredrik Tolf <fredrik@dolda2000.com>

Last changed: Thu Feb 13 03:39:28 2014