Blërg

There's no stable release yet, but you can get everything currently running on blerg.dominionofawesome.com by cloning the git repository at http://git.bytex64.net/blerg.git.

Blërg has varying requirements depending on how you want to run it — as a standalone HTTP server, or as a CGI. You will need:

yajl >= 1.0.0 and < 2 (yajl is a JSON parser/generator written in C which, by some twisted sense of humor, requires ruby to compile)

As a standalone HTTP server, you will also need:

Or, as a CGI, you will need:

Edit libs.mk and put in the paths where you can find headers and libraries for the above requirements.

Also, further apologies to BSD folks — I've probably committed several unconscious Linux-isms. It would not surprise me if the makefile refuses to work with BSD make, or if it fails to compile even with gmake. If you have patches or suggestions on how to make Blërg more portable, I'd be happy to hear them.

At this point, it should be gravy. Type 'make' and in a few seconds, you should have blerg.cgi, blergtool, and blerglatest.

NOTE: blerg.httpd is deprecated and will not be updated with new features.

While it's not strictly required, Blërg will be easier to set up if you configure it to work from the root of your website. For this reason, it's better to use a subdomain (e.g., blerg.yoursite.com is easier than yoursite.com/blerg/). If you do want to put it in a subdirectory, you will have to modify www/js/blerg.js and change baseURL at the top, as well as a number of other self-references in that file and in www/index.html.

You cannot serve the database and client from different domains (e.g., yoursite.com vs. othersite.net, or even foo.yoursite.com and bar.yoursite.com). This is a requirement of the web browser: the same-origin policy will not allow an AJAX request to travel across domains (though you can probably get around it these days with cross-origin resource sharing).

For straight CGI with Apache

Copy the files in www/ to the root of your web server. Copy blerg.cgi to your web server. Included in www-configs/ is a .htaccess file for Apache that will rewrite the URLs. If you need to call the CGI something other than blerg.cgi, the .htaccess file will need to be modified.

For nginx

Nginx can't run CGI directly, and there's currently no FastCGI version of Blërg, so you will have to run it under some kind of CGI to FastCGI gateway, like the one described here on the nginx wiki. This pretty much destroys the performance of Blërg, but it's all we've got right now.

The extra RSS CGI

There is an optional RSS CGI (aux/cgi/rss.cgi) that will serve RSS feeds for users. Install this like blerg.cgi above. As of 1.9.0, this is a Perl FastCGI script, so you will have to make sure the Perl libraries are available to it. A good way of doing that is to install to an environment directory, as described below.

Installing to an environment directory

The Makefile has support for installing Blërg into a directory that includes tools, libraries, and configuration snippets for shell and web servers. Use it as make install-environment ENV_DIR=<directory>. Under <directory>/etc will be a shell script that sets environment variables, and configuration snippets for nginx and Apache to do the same. This should make it somewhat easier to use Blërg in a self-contained way.

For example, this will install Blërg to an environment directory inside your home directory:

user@devhost:~/blerg$ make install-environment ENV_DIR=$HOME/blerg-env
...
user@devhost:~/blerg$ . ~/blerg-env/etc/env.sh

Then, you will be able to run tools like blergtool, and they will operate on data inside ~/blerg-env/data. Likewise, you can include /home/user/blerg-env/etc/nginx-fastcgi-vars.conf or /home/user/blerg-env/etc/apache-setenv.conf in your webserver configuration to make the CGI/FastCGI scripts do the same thing.

Blërg's API was designed to be as simple as possible. Data sent from the client is POSTed with the application/x-www-form-urlencoded encoding, and a successful response is always JSON. The API endpoints will be described as though the server were serving requests from the root of the website.

On failure, all API calls return either a standard HTTP error response, like 404 Not Found if a record or user doesn't exist, or a 200 response with a 'JSON failure', which will look like this:

{"status": "failure"}

Blërg doesn't currently explain why there is a failure, and I'm not sure it ever will.

On success, you'll either get some JSON relating to your request (for /get, /tag, or /info), or a 'JSON success' response (for /create, /put, /login, or /logout), which looks like this:

{"status": "success"}

For the CGI backend, you may get a 500 error if something goes wrong. For the HTTP backend, you'll get nothing (since it will have crashed), or maybe a 502 Bad Gateway if you have it behind another web server.
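The success and failure rules above can be folded into one small client-side check. This is only a sketch: classify_response is a hypothetical helper name, not something Blërg ships, and it assumes you already have the HTTP status code and response body in hand.

```python
import json


def classify_response(status_code, body):
    """Sort a Blërg API response into 'success', 'failure', or 'data'.

    Hypothetical helper for illustration; not part of Blërg.
    """
    if status_code != 200:
        # Standard HTTP errors (404, 403, 500, 502, ...) all count as failure.
        return "failure"
    parsed = json.loads(body)
    if isinstance(parsed, dict) and parsed.get("status") == "failure":
        return "failure"  # the 200-with-'JSON failure' case
    if isinstance(parsed, dict) and parsed.get("status") == "success":
        return "success"  # /create, /put, /login, /logout
    return "data"  # /get, /tag, /info return request-specific JSON
```

Checking the status code first matters because error responses are not guaranteed to carry a JSON body.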

All usernames must be 32 characters or less. Usernames must contain only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen (-). Passwords can be at most 64 bytes, and have no limits on characters (but beware: if you have a null in the middle, it will stop checking there because I use strncmp(3) to compare).

Tags must be 64 characters or less, and can contain only the ASCII characters 0-9, A-Z, a-z, underscore (_), and hyphen (-).
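A client can check these limits before sending anything. The sketch below is an assumption-laden illustration, not Blërg's own validation code: the function names are made up, passwords are assumed to be UTF-8 encoded, empty names are rejected, and NUL bytes in passwords are refused outright (stricter than the server, to sidestep the strncmp(3) caveat).

```python
import re

# Allowed characters for usernames and tags, per the rules above.
NAME_CHARS = re.compile(r"[0-9A-Za-z_-]+")


def valid_username(name):
    """32 characters or fewer, ASCII letters, digits, '_' and '-' only."""
    return len(name) <= 32 and NAME_CHARS.fullmatch(name) is not None


def valid_tag(tag):
    """Same character set as usernames, but up to 64 characters."""
    return len(tag) <= 64 and NAME_CHARS.fullmatch(tag) is not None


def valid_password(password):
    """At most 64 bytes; rejecting NUL is this sketch's choice, not the server's."""
    raw = password.encode("utf-8")  # assumption: passwords are sent as UTF-8
    return len(raw) <= 64 and b"\x00" not in raw
```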

As the result of a successful login, the server will send back a cookie named auth. This cookie authorizes restricted requests, and must be sent for any API endpoint marked authorization, or else you will get a 403 Forbidden response. The cookie format looks like:

auth=username/abcdef0123456789abcdef0123456789

That is a username, a forward slash, and 32 hexadecimal digits which denote the "token" identifying the session. On logout, the server will invalidate the token and expire the cookie.
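If a client wants to pull the username and token back out of the cookie value, a sketch like the following would do. parse_auth_cookie is a hypothetical name, and accepting upper-case hex digits is an assumption; the example above only shows lower case.

```python
import re

# username (same character rules as above) / 32 hex digits
AUTH_VALUE = re.compile(r"([0-9A-Za-z_-]{1,32})/([0-9A-Fa-f]{32})")


def parse_auth_cookie(value):
    """Split an auth cookie value into (username, token), or return None.

    Hypothetical helper; 'value' is the part after 'auth='.
    """
    match = AUTH_VALUE.fullmatch(value)
    if match is None:
        return None
    return match.group(1), match.group(2)
```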

/create - create a new user

To create a user, POST to /create with username and password parameters for the new user. The server will respond with JSON failure if the user exists, or if the user can't be created for some other reason. The server will respond with JSON success if the user is created.

/login - log in

POST to /login with the username and password parameters for an existing user. The server will respond with JSON failure if the user does not exist or if the password is incorrect. On success, the server will respond with JSON success, and will set a cookie named 'auth' that must be sent by the client when accessing restricted API functions (See Authorization above).
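As a rough illustration, a /login request could be assembled with Python's standard library like this. BASE_URL and build_post are assumptions for the sketch, and the request is only built, not sent; /create takes the same two parameters, so the same helper would cover it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://blerg.example.com"  # hypothetical host, served from the root


def build_post(path, params):
    """Build a form-encoded POST request for a Blërg endpoint (not sent here)."""
    body = urlencode(params).encode("ascii")
    return Request(
        BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


req = build_post("/login", {"username": "Jon", "password": "s3cret pass"})
```

Passing req to urllib.request.urlopen would send it; the auth cookie should then arrive in the response's Set-Cookie header.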

/logout - log out

authorization

POST to /logout. The server will respond with JSON failure if the user does not exist or if the request is unauthorized. The server will respond with JSON success after the user is successfully logged out.

/put - add a new record

authorization

POST to /put with a data parameter. The server will respond with JSON failure if the request is unauthorized, if the user doesn't exist, or if data contains more than 65535 bytes after URL decoding. The server will respond with JSON success after the record is successfully added.
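Since the 65535-byte limit applies after URL decoding, a client can measure the raw bytes before encoding them. This sketch assumes records are sent as UTF-8; build_put_body is a hypothetical helper name.

```python
from urllib.parse import urlencode

MAX_RECORD_BYTES = 65535  # limit stated above, measured after URL decoding


def build_put_body(text):
    """Return the form-encoded body for /put, or raise if the record is too big."""
    raw = text.encode("utf-8")  # assumption: record text is sent as UTF-8
    if len(raw) > MAX_RECORD_BYTES:
        raise ValueError("record is %d bytes; the limit is %d" % (len(raw), MAX_RECORD_BYTES))
    # urlencode percent-encodes str values as UTF-8 by default.
    return urlencode({"data": text}).encode("ascii")
```

The encoded body can be longer than 65535 bytes, since percent-encoding inflates it; only the decoded size counts.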

/get/(user), /get/(user)/(start record)-(end record) - get records for a user

A GET request to /get/(user), where (user) is the user desired, will return the last 50 records for that user in a list of objects. The record objects look like this:

{ "record":"0", "timestamp":1294309438, "data":"eatin a taco on fifth street" }

record is the record number, timestamp is the UNIX epoch timestamp (i.e., the number of seconds since Jan 1 1970 00:00:00 GMT), and data is the content of the record. The record number is sent as a string because while Blërg supports record numbers up to 2^64 - 1, Javascript uses floating point for all its numbers, and can only support integers without truncation up to 2^53. This difference is largely academic, but I didn't want this problem to sneak up on anyone who is more insane than I am. :]
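To make the string-versus-number point concrete, here is a small sketch of decoding a /get response in a language with big integers. The response body is made up for illustration (it uses the largest record number Blërg allows), and the last two lines show the floating-point problem the string encoding avoids.

```python
import json

# Made-up /get response body using the largest possible record number.
body = '[{"record":"18446744073709551615","timestamp":1294309438,"data":"eatin a taco on fifth street"}]'

records = json.loads(body)

# Python ints are arbitrary precision, so converting the string is lossless.
record_number = int(records[0]["record"])

# A 64-bit float cannot tell 2**53 and 2**53 + 1 apart, which is why
# a numeric JSON field would be risky for Javascript clients.
collides = float(2**53) == float(2**53 + 1)
```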

The second form, /get/(user)/(start record)-(end record), retrieves a specific range of records, from (start record) to (end record) inclusive. You can retrieve at most 100 records this way. If (end record) - (start record) specifies more than 100 records, or if the range specifies invalid records, or if the end record is before the start record, the server will respond with JSON failure.
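A client can reject bad ranges before the server does. The sketch below assumes the 100-record limit counts both endpoints (end - start + 1); if the server counts differently, this check is merely a little stricter than necessary. get_range_path is a hypothetical helper.

```python
MAX_RANGE = 100  # most records one ranged /get may return, per the text above


def get_range_path(user, start, end):
    """Build the /get path for records start..end inclusive, or raise ValueError."""
    if start < 0 or end < start:
        raise ValueError("end record must not come before start record")
    if end - start + 1 > MAX_RANGE:  # assumption: the limit is inclusive of both ends
        raise ValueError("range covers more than %d records" % MAX_RANGE)
    return "/get/%s/%d-%d" % (user, start, end)
```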

/info/(user) - Get information about a user

A GET request to /info/(user) will return a JSON object with information about the user (currently only the number of records). The info object looks like this:

{ "record_count": "544" }

Again, the record count is sent as a string for 64-bit safety.

/tag/(#|H|@)(tagname) - Retrieve records containing tags

A GET request to this endpoint will return the last 50 records associated with the given tag. The first character is either # or H for hashtags, or @ for mentions (I call them ref tags). You should URL encode the # or @, lest some servers complain at you. The H alias for # was created because Apache helpfully strips the fragment of a URL (everything from the # to the end) before handing it off to the CGI, even if the hash is URL encoded. The record objects also contain an extra author field, like so:

{ "author":"Jon", "record":"57", "timestamp":1294555793, "data":"I'm taking #garfield to the vet." }

There is currently no support for getting more than 50 records for a tag, but /tag will probably mutate to work like /get.
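Building the /tag path by hand is easy to get wrong because of the encoding rules, so here is a sketch. tag_path and its use_h_alias flag are hypothetical names; the H alias is the Apache workaround described above.

```python
from urllib.parse import quote


def tag_path(kind, name, use_h_alias=False):
    """Build a /tag path; kind is '#' for hashtags or '@' for ref tags."""
    if kind not in ("#", "@"):
        raise ValueError("kind must be '#' or '@'")
    if kind == "#" and use_h_alias:
        prefix = "H"  # alias for '#', useful behind Apache
    else:
        prefix = quote(kind, safe="")  # '#' becomes %23, '@' becomes %40
    return "/tag/" + prefix + name
```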

/subscribe/(user) - Subscribe to a user's updates

authorization

POST to /subscribe/(user) with a subscribed parameter that is either "true" or "false", indicating whether (user) should be subscribed to or not. The server will respond with JSON failure if the request is unauthorized or if the user doesn't exist. The server will respond with JSON success after the subscription request is successfully registered.
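One thing to watch when building this request from a typed language is that the parameter is the literal string "true" or "false", not whatever your language prints for a boolean (Python would give "True"). A sketch, with subscribe_request as a hypothetical helper:

```python
from urllib.parse import urlencode


def subscribe_request(user, subscribed):
    """Return (path, body) for a /subscribe POST; the auth cookie is sent separately."""
    path = "/subscribe/" + user
    # The API expects lower-case "true"/"false", so don't rely on str(bool).
    flag = "true" if subscribed else "false"
    body = urlencode({"subscribed": flag}).encode("ascii")
    return path, body
```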

/feed - Get updates for subscribed users

authorization

POST to /feed, with a username parameter and an auth cookie. The server will respond with a JSON list of the last 50 updates from all subscribed users, in reverse chronological order. Fetching /feed does not reset the new message count returned from /status. To do that, look at POST /status.

NOTE: subscription notifications are only stored while subscriptions are active. Any records inserted before or after a subscription is active will not show up in /feed.

/status, /status/(user) - Get or clear general and user-specific status

authorization

GET to /status to get information about your account. It tells you the number of new subscription records since the last time the subscription counter was reset, and a flag for whether the account was mentioned since the last time the mention flag was cleared. The server will respond with a JSON object:

{ "feed_new": 3, "mentioned": false }

POST to /status with a clear parameter that is either "feed" or "mentioned" to reset either the subscription counter or the mention flag, respectively. There is not currently a way to clear both with a single request. The server will respond with JSON success.
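Because the counter and the flag have to be cleared separately, a client that wants to reset everything needs up to two POSTs. This sketch reads a GET /status body and works out which clear requests are worth sending; clear_bodies_for is a hypothetical name, and nothing is actually sent.

```python
import json
from urllib.parse import urlencode


def clear_bodies_for(status_body):
    """Return the POST bodies needed to reset whatever /status reports as set."""
    status = json.loads(status_body)
    bodies = []
    if status.get("feed_new", 0) > 0:
        bodies.append(urlencode({"clear": "feed"}).encode("ascii"))
    if status.get("mentioned", False):
        bodies.append(urlencode({"clear": "mentioned"}).encode("ascii"))
    return bodies  # one POST to /status per entry
```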

GET to /status/(user) to get subscription information for a particular user. The server will respond with a simple JSON object:

{"subscribed":true}

The value of "subscribed" will be either true or false depending on the subscription status.

/passwd - Change a user's password

authorization

POST to /passwd with password and new_password parameters to change the user's password. For extra protection, changing a password requires sending the user's current password in the password parameter. If authentication is successful and the password matches, the user's password is set to new_password and the server responds with JSON success. If the password doesn't match, or if either password or new_password is missing, the server returns JSON failure.

Most of Blërg's core functionality is packaged in a static library called blerg.a. It's not designed to be public or installed with `make install-environment`, but it should be relatively straightforward to use it in C programs. Look at the headers under the database directory.

A secondary library called blerg_auth.a handles the authentication layer of Blërg. To use it, look at common/auth.h.

As of 1.9.0, Blërg includes a Perl library called Blerg::Database. It wraps the core and authentication functionality in a perlish interface. The module has its own POD documentation, which you can read with your favorite POD reader, from the manual installed in an environment directory, or in HTML here.

Blërg was created as the result of a thought experiment: "What if Twitter didn't need thousands of servers? What if its millions of users could be handled by a single highly efficient server?" This is probably an unreachable goal due to the sheer amount of I/O, but we can certainly try to do better. Blërg was thus designed as a system with very simple requirements:

Store and fetch small chunks of text efficiently
Create fast indexes for hash tags and @ mentions
Provide an HTTP interface web apps can use

And to further simplify, I didn't bother handling deletes, full text search, or more complicated tag searches. Blërg only does the basics.

Classical model

Client App: HTML/Javascript
Webserver: Apache, lighttpd, nginx, etc.
Server App: Python, Perl, Ruby, etc.
Database: MySQL, PostgreSQL, MongoDB, CouchDB, etc.

Modern web applications have at least a four-layer approach. You have the client-side browser app, the web server, the server-side application, and the database. Your data goes through a lot of layers before it actually resides on disk somewhere (or, as they're calling it these days, "The Cloud" *waves hands*). Each of those layers requires some amount of computing resources, so to increase throughput, we must make the layers more efficient, or reduce the number of layers.

Blërg model

Blërg Client App: HTML/Javascript
Blërg Database: Fuckin' hardcore C and shit

Blërg does both by smashing the last two or three layers into one application. Blërg can be run as either a standalone web server (currently deprecated because maintaining two versions is hard), or as a CGI (FastCGI support is planned, but I just don't care right now). Less waste, more throughput. As a consequence of this, the entirety of the application logic that the user sees is implemented in the client app in Javascript. That's why all the URLs have #'s: the page is loaded once and switched on the fly to show different views, further reducing load on the server. Even parsing hash tags and URLs is done in client JS.

The API is simple and pragmatic. It's not entirely RESTful, but is rather designed to work well with web-based front-ends. Client data is always POSTed with the usual application/x-www-form-urlencoded encoding, and server data is always returned in JSON format.

The HTTP interface to the database idea has already been done by CouchDB, though I didn't know that until after I wrote Blërg. :)

I was impressed by varnish's design, so I decided early in the design process that I'd try out mmapped I/O. Each user in Blërg has their own database, which consists of a metadata file, and one or more data and index files. The data and index files are memory mapped, which hopefully makes things more efficient by letting the OS handle when to read from disk (or maybe not; I haven't benchmarked it). The index files are preallocated because I believe it's more efficient than writing to them 40 bytes at a time as records are added. The database's limits are reasonable: