Poking Around Joe Armstrong’s Simple Web Server
December 29, 2008

On December 19 I suggested that poking around source code can be a valuable learning experience, even if you know little about the language. And, as I’ve admitted, I’m still learning this fine language, so I welcome any and all to fill in the gaps or correct the gaffes in the following exploration. Just drop us a comment.

Here we’ll poke around the source of Joe Armstrong’s Simple Webserver:

http://www.sics.se/~joe/tutorials/web_server/web_server.html

Fortunately, in this case, we have Joe to provide guidance along the way.

If you’re well up on Erlang, you may find the remainder of this post rather ho hum. Stick around anyway to keep us honest. If you’re just tuning into Erlang, on the other hand, by all means call up Joe’s source on your browser and poke along with me.

Joe factors the server-side process of serving up content and services into three layers:

Web Server <–> HTTP Driver <–> TCP Driver

We’ll look at the Web Server layer in this post; explication of the other two layers to follow.

The Web Server component of Joe’s Simple Web Server is implemented in:

http://www.sics.se/~joe/tutorials/web_server/web_server.erl

First, we note the -export directive. This tells us that the module web_server.erl makes four functions available to other modules: cover_start/0, cover_stop/0, start/1, stop/1.

-export([cover_start/0, cover_stop/0, start/1, stop/1]).

Why two versions of start and stop?

Insight

If we scan down to the function cover_start(), we note that the first thing it calls is cover:start/0, i.e. the start/0 function in the Erlang module cover.

Googling the search term “Erlang + cover” turns up the man page for cover with the summary “A Coverage Analysis Tool for Erlang.”

See it here: http://erlang.org/doc/man/cover.html

Evidently the functions cover_start/0 and cover_stop/0 are included in web_server.erl as testing or diagnostic functions. In fact, Joe says just this in Section 3 at the bottom of:

http://www.sics.se/~joe/tutorials/web_server/web_server.html

So, right off we’ve learned a valuable technique for developing more reliable Erlang code. Let’s tuck cover away for later experimentation and integration into our Erlang bag of tricks.

Imports

Note further that web_server.erl imports three functions from two modules:

-import(http_driver, [classify/1, header/1]).

-import(lists, [map/2]).

http_driver.erl is the second layer of Joe Armstrong’s Simple Web Server, so we’ll explore it later.

Here’s where to find the man page for the lists module:

http://www.erlang.org/doc/apps/stdlib/ref_man_frame.html

Scan down the man page for documentation of map/2. Looks pretty useful. Let’s tuck this too into our Erlang bag of tricks.
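To get a feel for map/2 before we meet it in Joe’s code, here is a minimal sketch. The module name map_demo and both function names are made up for illustration; only lists:map/2 itself comes from the standard library.

```erlang
-module(map_demo).
-export([double_all/1, lengths/1]).

%% Apply a fun to every element and collect the results in order.
double_all(Numbers) ->
    lists:map(fun(N) -> N * 2 end, Numbers).

%% map/2 also accepts a reference to an existing function.
%% Here each inner list (for example a string) is replaced by its length.
lengths(Lists) ->
    lists:map(fun erlang:length/1, Lists).
```

Calling map_demo:double_all([1,2,3]) should hand back a new list with each number doubled; the input list is left untouched.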

start/1

Here’s the code for start/1:

start([A]) ->
    start_on_port(list_to_integer(atom_to_list(A))).

We see that start/1 calls the internal function start_on_port/1. The value of A is evidently a port number represented as an atom, since it is first transformed to a string (a list of characters) by the built-in function atom_to_list/1. That string, in turn, is transformed to an integer by the built-in function list_to_integer/1. The resulting integer is then passed to start_on_port/1.

Newbie question:

Why must we pass an atom (wrapped in a list), rather than an integer, as the parameter to start/1 when we can pass an integer directly into start_on_port/1?

If anyone knows the answer, please post.
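One possible answer, offered as an educated guess rather than as Joe’s stated reason: the erl command-line flag -s is documented to call the named function with a single argument, a list in which every command-line argument arrives as an atom. If the server is launched with something like "erl -s web_server start 4501" (that exact command line is my assumption), then start/1 would receive ['4501'] and has to convert it. The sketch below mirrors the conversion; port_arg_demo and port_from_args/1 are hypothetical names.

```erlang
-module(port_arg_demo).
-export([port_from_args/1]).

%% Hypothetical helper mirroring the conversion inside start/1.
%% With "erl -s Mod Func 4501" the function receives ['4501'].
port_from_args([A]) when is_atom(A) ->
    Digits = atom_to_list(A),   % '4501' -> "4501", a list of characters
    list_to_integer(Digits).    % "4501" -> 4501
```

From the Erlang shell, where we can type an integer directly, calling start_on_port/1 would skip the conversion altogether, which may be why both entry points exist.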

start_on_port/1

Scanning down the web_server.erl source file, we see that start_on_port/1 spawns server/1 as a new process.

start_on_port(Port) -> spawn_link(fun() -> server(Port) end).

server/1

server(Port) ->
    S = self(),
    process_flag(trap_exit, true),
    http_driver:start(Port, fun(Client) -> server(Client, S) end, 15),
    loop().

server/1 binds its own PID to the variable S, sets a process flag, calls start/3 in the module http_driver and then calls loop/0.

You see the clause “process_flag(trap_exit, true)” quite often in Erlang code.

Thanks to Matt McDonnell (http://www.matt-mcdonnell.com/code/code_erl/erl_course/erl_course.html), we learn that the process_flag clause “sets the current process to convert exit signals to exit messages that can be received as normal messages.”

You can read more about the significance of this with respect to reliability in Matt’s nifty Erlang course. Check it out.
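Here is a small experiment to see trap_exit at work. It is only a sketch; the module name trap_exit_demo and the exit reason boom are invented for the example. The parent links to a child that dies immediately, and because the parent traps exits, the death shows up in its mailbox as an ordinary {'EXIT', Pid, Reason} message instead of killing the parent too.

```erlang
-module(trap_exit_demo).
-export([run/0]).

%% Spawn a linked child that exits abnormally and report what the parent sees.
run() ->
    Old = process_flag(trap_exit, true),         % remember the previous setting
    Child = spawn_link(fun() -> exit(boom) end),
    Result =
        receive
            {'EXIT', Child, Reason} ->
                {child_exited, Reason}
        after 1000 ->
            timeout                              % nothing arrived within a second
        end,
    process_flag(trap_exit, Old),                % restore the caller's setting
    Result.
```

Without the process_flag call, the abnormal exit of the linked child would take the parent down with it.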

For more on error handling in general, see Chapter 9 in Joe Armstrong’s book, Programming Erlang.

Moving on to the clause “http_driver:start(Port, fun(Client) -> server(Client, S) end, 15).”

As you’ll recall, the module http_driver.erl is the second component in Joe Armstrong’s Simple Web Server. We’ll defer poking into it until we have a better grasp on web_server.erl. As we pointed out in an earlier post, http_driver.erl translates bits coming through the network socket into Erlang messages. What’s more, it translates Erlang messages coming from web_server.erl into a bit stream suitable for transmission through the network socket.

It’s essential to note that the call to http_driver:start/3 in server/1 takes, as its second argument, a fun that calls web_server:server/2. As we’ll note in a moment, server/2 is the functional heart of web_server.erl.

There are subtleties here. As we’ll see when we look at http_driver.erl and tcp_driver, web_server:server/2 gets passed down and started up somewhere, I think, in tcp_driver.erl. I’m guessing that a PID gets passed back as message to web_server:loop/0.

(Someone, please, post clarification to get me out of deep water here.)

Scanning down to loop/0 we see an endless loop (note the tail recursion just before the end clause) that prints out whatever message it receives, bound to the variable Any.

loop() ->
    receive
        Any ->
            io:format("server:~p~n",[Any]),
            loop()
    end.

Poking into web_server:server/2, we see the classic web server loop described by Joe in http://www.sics.se/~joe/tutorials/web_server/web_server.html. If the client sends the tuple {Client, closed} as a message, or 5000 milliseconds pass without a matching message arriving, the loop returns true; my guess is that this closes down the process.

server(Client, Master) ->
    receive
        {Client, closed} ->
            true;
        {Client, Request} ->
            Response = generate_response(Request),
            Client ! {self(), Response},
            server(Client, Master)
    after 5000 ->
        true
    end.

If the client sends a request, matched by the pattern {Client, Request}, server/2 calls generate_response/1 to prepare a response and sends it back to the client. Again, note the tail-recursive call just above the line “after 5000 ->.”
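To try the same receive-reply-recurse pattern without any sockets, here is a toy echo process. It is a sketch of the general shape only, not Joe’s code: the module echo_demo, the functions start/1 and ask/2, and the {echo, Msg} reply format are all invented for illustration.

```erlang
-module(echo_demo).
-export([start/1, ask/2]).

%% Start an echo process that gives up after IdleMs milliseconds of silence.
start(IdleMs) ->
    spawn(fun() -> loop(IdleMs) end).

loop(IdleMs) ->
    receive
        {_From, closed} ->
            true;                            % caller said goodbye: stop looping
        {From, Msg} ->
            From ! {self(), {echo, Msg}},    % reply, tagged with our own PID
            loop(IdleMs)                     % tail call: wait for the next message
    after IdleMs ->
        true                                 % idle too long: fall out of the loop
    end.

%% Send Msg to the echo process and wait up to one second for its reply.
ask(Pid, Msg) ->
    Pid ! {self(), Msg},
    receive
        {Pid, Reply} -> Reply
    after 1000 ->
        no_reply
    end.
```

When loop/1 returns true, the fun passed to spawn/1 has nothing left to do, so the process ends. That is consistent with the guess that returning true from server/2 shuts the process down.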

Here’s generate_response/1:

generate_response({_, Vsn, F, Args, Env}) ->
    F1 = "." ++ F,
    case file:read_file(F1) of
        {ok, Bin} ->
            case classify(F) of
                html ->
                    {header(html),[Bin]};
                jpg ->
                    {header(jpg),[Bin]};
                gif ->
                    {header(jpg),[Bin]};
                _ ->
                    {header(text),[body("white"),"<pre>",Bin,"</pre>"]}
            end;
        _ ->
            show({no_such_file,F,args,Args,cwd,file:get_cwd()})
    end.

The request is in the form of a tuple with five elements. (With all due respect, Joe, speaking from the newbie gallery, we could really use some comments here to understand what these elements are.) Best I can tell F is a filename.

The outer case expression in generate_response/1 tries to read the file F1. If the read succeeds, a second case expression uses classify/1 to determine whether we’re looking at an html, jpg, or gif file. Depending upon the result, it returns a tuple holding the result of calling http_driver:header/1 and the contents of the file (Bin). web_server:server/2 then passes this tuple to the client as a message.

If http_driver:classify/1 tells us that our content file is anything other than html, jpg, or gif, generate_response/1 asks http_driver:header/1 for a text header and wraps the file contents in <pre> tags.

So, bottom line, if Request sends a file name, web_server.erl looks up the file and returns type and content as a Response to http_driver.
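We haven’t looked inside http_driver:classify/1 yet, so the following is only a guess at the kind of thing it might do: map a file name to an atom based on its extension. The module classify_demo is hypothetical, and the real function may well work differently; filename:extension/1 is a standard library call.

```erlang
-module(classify_demo).
-export([classify/1]).

%% Hypothetical stand-in for http_driver:classify/1, NOT Joe's implementation.
%% filename:extension/1 returns the extension including the leading dot.
classify(FileName) ->
    case filename:extension(FileName) of
        ".html" -> html;
        ".jpg"  -> jpg;
        ".gif"  -> gif;
        _       -> text    % anything else is treated as plain text
    end.
```

The atoms returned here are the ones the inner case expression in generate_response/1 matches on.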

If the file doesn’t exist, web_server:generate_response/1 calls show/1, which sends off an error message. Note the file function file:get_cwd().
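The read-or-report pattern is easy to try in isolation. In this sketch the module read_demo, the function read_or_report/1 and the shape of the error tuple are my own inventions; file:read_file/1 and file:get_cwd/0 are the same standard library calls Joe uses.

```erlang
-module(read_demo).
-export([read_or_report/1]).

%% Try to read Path; on failure, report the reason and the current directory.
read_or_report(Path) ->
    case file:read_file(Path) of
        {ok, Bin} ->
            {ok, Bin};                      % file contents as a binary
        {error, Reason} ->
            %% file:get_cwd/0 returns {ok, Dir} on success.
            {ok, Cwd} = file:get_cwd(),
            {no_such_file, Path, Reason, Cwd}
    end.
```

Including the current working directory in the report is handy because F1 is a relative path ("." ++ F), so whether the file is found depends on where the Erlang node was started.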

I’ll leave it to you to parse out show/1.