Motivation

This post describes how to program scalable web applications with Erlang and Python using computational parallelism. Caching and load balancing are well documented elsewhere and beyond the scope of this post.

Web applications, by nature, span two drastically different programming domains: the high-level domain of web design and development, and the low-level domain of high-performance, distributed computing. The internet bridges these two domains, so it’s now realistic to put a high-level user interface in front of a high-performance back-end application.

I prefer to do as much programming in Python and the Django web framework as possible because it’s just so easy. ErlyWeb is an Erlang web framework which should provide similar features to Django, though I’ve never used it. In a web application setting, it’s easy to end up with too many Apache/Python processes, which max out the server’s memory or CPU. For the heavy computational tasks that cause this, I like Erlang. Erlang has parallelism and distribution primitives, which are arguably more elegant than Python’s concurrency primitives, and it has SMP support.

Erlang + Python Communication

It seems like a no-brainer to use domain-specific languages to ease development in each domain, but the tricky part is making the languages communicate when an application spans several domains. MochiMedia is a small company that builds many of their products around Erlang + Python and, after shopping around, I chose to use their method of interfacing between the two languages: HTTP + JSON. The data being sent between the two languages is serialized, sometimes with JSON, and then passed along via HTTP. This has a few benefits. First, since HTTP is being used, my Erlang cluster can be across the internet from my Python front-end server. Second, since I’m using an independent intermediate representation to serialize the data (JSON), any component of the application stack may be swapped out for something completely new.

Here’s how it’s done:

1. Use MochiWeb to enable the Erlang nodes to communicate over HTTP. The latest version of Erlang and the OpenSSL headers are needed to compile MochiWeb.
2. Create a MochiWeb project skeleton. Since MochiWeb is a framework, a script is provided to help create a web server which uses MochiWeb.
3. Modify the request handler to understand JSON. The only Erlang that really needs to be modified is in [project name]/src/[project name]_web.erl. This is where the processing code goes (ex: map/reduce).
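To make the handler's job concrete, here is a rough Python stand-in for it. This is not MochiWeb code and not how the Erlang module is written; it only sketches the contract the modified handler has to honor: read the POST body, decode the JSON, run the processing code, and reply with JSON. The names process, JsonHandler, and make_server, and the {"numbers": [...]} message shape, are all made up for the example. A stub like this can also stand in for the Erlang node while the Python side is being developed.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process(data):
    """Hypothetical stand-in for the real work (ex: a map/reduce job)."""
    return {"sum": sum(data["numbers"])}


class JsonHandler(BaseHTTPRequestHandler):
    """Python illustration of what a JSON-aware request handler does."""

    def do_POST(self):
        # 1. Read exactly Content-Length bytes of request body.
        length = int(self.headers.get("Content-Length", 0))
        raw = self.rfile.read(length)
        try:
            # 2. Decode the JSON body and run the processing code.
            reply = process(json.loads(raw.decode("utf-8")))
            status = 200
        except (ValueError, KeyError, TypeError) as exc:
            # Malformed JSON or an unexpected message shape.
            reply = {"error": str(exc)}
            status = 400
        # 3. Encode the result as JSON and send it back.
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), JsonHandler)
```

Calling make_server(8000).serve_forever() would run the stub on port 8000, the same port the Python example in this post points at.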

On the Python side, urllib2 is all that’s needed: build a Request object and pass it to urllib2.urlopen to send it to Erlang. Django comes pre-packaged with SimpleJSON, which can serialize the body of the HTTP request:

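import json  # or Django's bundled copy: from django.utils import simplejson as json
from urllib2 import Request, urlopen  # Python 2 standard library

def send_to_erlang(data):
    url = "http://erlang.nodes.tld:8000/"
    # Serialize the payload to JSON for the request body.
    body = json.dumps(data)
    headers = {'Content-Type': 'application/jsonrequest',
               'User-Agent': 'Python/Project/0.1'}
    # Supplying a body makes urllib2 issue a POST.
    urlopen(Request(url, body, headers))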
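The function above sends the data and throws away whatever comes back. On Python 3, where urllib2 has become urllib.request, a version that also reads the reply might look roughly like the sketch below. The name call_erlang, the url and timeout parameters, and the idea that the Erlang handler answers with a JSON body are assumptions of this sketch rather than part of the original setup.

```python
import json
from urllib.request import Request, urlopen


def call_erlang(data, url="http://erlang.nodes.tld:8000/", timeout=10):
    """POST `data` as JSON and return the decoded JSON reply.

    Assumes the Erlang handler replies with a JSON body (an assumption
    of this sketch).
    """
    # urllib.request wants the body as bytes, not str.
    body = json.dumps(data).encode("utf-8")
    headers = {
        "Content-Type": "application/jsonrequest",
        "User-Agent": "Python/Project/0.1",
    }
    # Passing data= makes this a POST request.
    request = Request(url, data=body, headers=headers)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A Django view could then call something like call_erlang({"numbers": [1, 2, 3]}) and hand the decoded result straight to a template. The timeout matters here: without one, a stalled Erlang node would tie up the Apache/Python process that is waiting on it.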

Parallelism

Kevin Smith, of Hypothetical Labs, did a great interview with Bob Ippolito, CTO at MochiMedia, which doubles as a case study for Erlang + Python. Bob talks in depth about the engineering problems this model helps overcome.

Computational tasks which can be executed in parallel are the key to utilizing Erlang’s distributed parallelism. The desired abstraction is a relatively small message that triggers a big computation. For example, the Django web interface could wrap an Erlang distributed map/reduce implementation. The Erlang book enumerates many different paradigms for Erlang distributed parallelism, and for programmers who already have an approach in mind, the plists library takes care of all the distribution automatically. A programmer with at least a little experience in both Erlang and Python should be able to hack their way through to a fully functional and scalable web application from here.
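As a rough illustration of the "small message, big computation" idea from the Python side, the sketch below splits a numeric range into a few sub-ranges, sends one tiny message per sub-range, and folds the partial results together. Everything in it is hypothetical: the split_range and remote_map_reduce helpers, the {"op", "start", "stop"} message shape, and the send callable, which stands for whatever function posts one message to an Erlang node and returns its decoded reply.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_range(start, stop, parts):
    """Split [start, stop) into up to `parts` contiguous sub-ranges."""
    step = max(1, -(-(stop - start) // parts))  # ceiling division
    return [(lo, min(lo + step, stop)) for lo in range(start, stop, step)]


def remote_map_reduce(op, start, stop, send, combine, initial, parts=4):
    """Fan small job messages out to the back end and fold the replies.

    `send` takes one message dict and returns that job's result; the
    message shape is a made-up protocol for this sketch.
    """
    messages = [{"op": op, "start": lo, "stop": hi}
                for lo, hi in split_range(start, stop, parts)]
    # Threads are enough here: each one only waits on the network while
    # the heavy computation runs on the Erlang side.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(send, messages))
    # The "reduce" step: combine the partial results in order.
    return reduce(combine, partials, initial)
```

Each message is only a few dozen bytes no matter how large the range is, so nearly all of the cost sits in the computation on the far side. For local testing, send can be an ordinary Python function that does the work in-process.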
