Unhosted web applications: a new approach to freeing SaaS

Free software advocates have been pushing hard against the growing trend of commercial Software-as-a-Service (SaaS), and the resulting loss of autonomy and software freedom, for several years now. A new project named Unhosted takes a different approach to the issue than that used by better-known examples like Diaspora and StatusNet. Unhosted is building a framework in which all of a web application's code is run on the client side, and users have the freedom to choose any remote data storage location they like. The storage nodes use strong encryption, and because they are decoupled from the application provider, users always have the freedom to switch between them or to shut off their accounts entirely.

The Unhosted approach

An outline of the service model envisioned by Unhosted can be found on the project's Manifesto page, written by founder Michiel de Jong. "A hosted website provides two things: processing and storage. An unhosted website only hosts its source code (or even just a bootloader for it). Processing is done in the browser, with ajax against encrypted cloud storage."

In other words, the manifesto continues, despite the availability of the Affero GPL (AGPL), which requires making source code available to network end-users, licensing alone is not enough to preserve user freedom because proprietary SaaS sites require users to upload their data to "walled silos" run by the service provider. An Unhosted application is a JavaScript program that runs in the browser, but accesses online storage on a compliant storage node. It does not matter to the application whether the storage node is run by the application provider, the user, or a third party.

Storage nodes are essentially commodity infrastructure, but in order to preserve user freedom, Unhosted requires that applications encrypt and sign the data they store. The project defines an application-layer protocol called Unhosted JSON Juggling Protocol (UJJP, sometimes referred to as UJ) for applications to communicate with storage nodes, for requesting and exchanging objects in JavaScript Object Notation (JSON) format.
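
To make the "sign what you store" requirement concrete, here is a minimal Python sketch of an application signing a JSON command before sending it to a storage node. It is not the UJJP wire format: the field names (`action`, `key`, `value`, `body`, `sig`) and the helper names are invented for illustration, the value is assumed to be ciphertext already, and the RSA key uses tiny primes with no padding, so it is a toy and nothing more.

```python
import hashlib
import json

# Toy textbook-RSA key for illustration only: tiny primes, no padding.
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(payload: str) -> int:
    """Hash the payload and reduce it into the range of the toy modulus."""
    raw = hashlib.sha256(payload.encode("utf-8")).digest()
    return int.from_bytes(raw, "big") % N


def sign(payload: str) -> int:
    """Sign with the private exponent."""
    return pow(digest(payload), D, N)


def verify(payload: str, signature: int) -> bool:
    """Check a signature with the public exponent."""
    return pow(signature, E, N) == digest(payload)


def make_command(action: str, key: str, value: str) -> dict:
    """Build a hypothetical signed command; these field names are assumptions."""
    body = json.dumps({"action": action, "key": key, "value": value},
                      sort_keys=True)
    return {"body": body, "sig": sign(body)}
```

A receiver that holds only the public values `N` and `E` can call `verify(cmd["body"], cmd["sig"])` to confirm the command came from the key holder; a real deployment would use a vetted cryptography library with full-size keys and proper padding.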

As the FAQ explains, this is a distinctly different model from that of most other free software SaaS projects. Most (like StatusNet and Diaspora) focus on federation, which allows each user to run his or her own node, and requires no centralized authority linking all of the user accounts. The downside of federated systems is that they may still require users to entrust their data to a remote server.

Eben Moglen's FreedomBox, on the other hand, focuses on putting the storage under the direct control of the user (specifically, stored at home on a self-managed box). This is a greater degree of freedom, but home-hosting is less accessible from the Internet at large than most web services because it often depends on Dynamic DNS. Home-hosting is also vulnerable to limited upstream bandwidth and common ISP restrictions on running servers.

Unhosted, therefore, attempts to preserve the "accessible anywhere" nicety of popular Web 2.0 services, but de-link the application from the siloed data.

Connecting applications to storage

Obviously, writing front-end applications entirely in HTML5 and JavaScript is not a new idea. The secret sauce of Unhosted is the connection method that links the application to the remote storage node or, more precisely, to any user-defined storage node. The system relies on Cross-Origin Resource Sharing (CORS), a W3C Working Draft mechanism by which a server can opt in to making its resources available to requests from pages served by other origins.

In the canonical "web mail" example, the Unhosted storage node sees a cross-origin request from the webmail application, checks the source, user credentials, and request type against its access control list, and returns the requested data only if the request is deemed valid. UJJP defines the operations an application can perform on the storage node, including creating a new data store, setting and retrieving key-value pairs, importing and exporting data sets, and completely deleting a data store.
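
The check described above might look roughly like the following Python sketch. It is an assumption-laden model rather than the project's storage node: the `StorageNode` class, the token-based credential, the status codes chosen, and the action names `set`, `get`, and `delete` are all hypothetical. The only real-world name is the `Access-Control-Allow-Origin` response header, which is how a CORS server tells the browser that the requesting origin is permitted.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    acl: dict = field(default_factory=dict)     # (origin, user) -> secret token
    stores: dict = field(default_factory=dict)  # (origin, user) -> key-value data

    def grant(self, origin: str, user: str, token: str) -> None:
        """Create a data store that one application may use for one user."""
        self.acl[(origin, user)] = token
        self.stores.setdefault((origin, user), {})

    def handle(self, origin, user, token, action, key=None, value=None):
        """Return (status, headers, body) for one cross-origin request."""
        slot = (origin, user)
        # Reject unknown origins or wrong credentials before touching data.
        if slot not in self.acl or self.acl[slot] != token:
            return 403, {}, None
        # Tell the browser that this origin may read the response.
        headers = {"Access-Control-Allow-Origin": origin}
        store = self.stores[slot]
        if action == "set":
            store[key] = value
            return 200, headers, None
        if action == "get":
            if key not in store:
                return 404, headers, None
            return 200, headers, store[key]
        if action == "delete":
            store.clear()   # wipe this application's data store
            return 200, headers, None
        return 400, headers, None
```

Keying both the access list and the data on the (origin, user) pair is one simple way to give each application its own data store rather than the user's entire storage space.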

Security-wise, each application only has access to its own data store, not the user's entire storage space, and CORS does allow each storage node to determine a policy about which origins it will respond to. But beyond that, the system also relies on the fact that the user has access to all of the application source code, because it runs in the browser. Thus it is up to the user to notice if the application does something sinister like relay user credentials to an untrusted third party. Dealing with potentially obfuscated JavaScript may be problematic for users, but it is still an improvement over server-side processing, which happens entirely out of sight.

Finally, each application needs a way to discover which storage node a user account is associated with, preferably without prompting the user for the information every time. The current Unhosted project demo code relies on Webfinger-based service discovery, which uniquely associates a user account with an email address. The user would log in to the application with an email address, the application would query the address's Webfinger identity to retrieve a JSON-formatted array of Unhosted resource identifiers, and connect to the appropriate one to find the account's data store.
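
The discovery step can be sketched in Python as two small helpers, leaving out the network request itself. The exact shape of the Webfinger response is not described here, so the sketch assumes the simplest reading of the text: a JSON array of storage node URLs. The function names and the preference for `https` entries are likewise assumptions made for illustration.

```python
import json
from urllib.parse import urlsplit


def split_address(address: str) -> tuple:
    """Split 'user@host' into (user, host), the host being where to ask."""
    user, sep, host = address.rpartition("@")
    if not sep or not user or not host:
        raise ValueError("not an email-style address: %r" % address)
    return user, host


def pick_storage_node(identifiers_json: str) -> str:
    """Return the first usable entry from a JSON array of node URLs.

    The array format and the https-only rule are assumptions.
    """
    for url in json.loads(identifiers_json):
        parts = urlsplit(url)
        if parts.scheme == "https" and parts.netloc:
            return url
    raise LookupError("no usable storage node advertised")
```

An application would call `split_address()` on the login address to learn which host to query, fetch that host's Webfinger record, and pass the returned array to `pick_storage_node()`.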

This is not a perfect solution, however, because it depends on the email service provider supporting Webfinger. Other proposed mechanisms exist, including using Jabber IDs and Freedentity.

The tricky bits

Currently, one of the biggest sticking points in the system is protecting the stored data without making the system arduous for end users. The present model relies on RSA encryption and signing for all data stores. Although the project claims this is virtually transparent for users, it gets more difficult when one Unhosted application user wishes to send a message to another user. Because the other user is on a different storage node, that user's public key needs to be retrieved in order to encrypt the message. But the system cannot blindly trust any remote storage node to authoritatively verify the other user's identity — that would be trivial to hijack. In response, the Unhosted developers are working on a "fabric-based public key infrastructure" that enables users to deterministically traverse through a web-of-trust from one user ID to another. Details on that part of the system are still forthcoming.

It is also an open question what sort of storage engine makes a suitable base for an Unhosted storage node. The demo code includes servers written in PHP, Perl, and Python that all run on top of standard HTTP web servers. On the mailing list, others have discussed a simple way to implement Unhosted storage on top of WebDAV, but there is no reason that a storage node could not be implemented on top of a distributed filesystem like Tahoe, or a decentralized network like BitTorrent.

Perhaps the most fundamental obstacle facing Unhosted is that it eschews server-side processing altogether. Consequently, no processing can take place while the user is logged out of the application. Logged out could simply mean that the page or tab is closed, or an application could provide a logout mechanism that disconnects from the storage node, but continues to perform other functions. This is fine for interactive or message-based applications like instant messaging, but it limits the type of application that can be fit into the Unhosted mold. Judging by the mailing list, the project members have been exploring queuing up operations on the storage node side, which could enable more asynchronous functionality, but Unhosted is still not a replacement for every type of SaaS.
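
The queuing idea can be illustrated with a small Python sketch in which the storage node does no processing of its own and merely holds incoming items until the recipient's application next connects. The `QueueingNode` class and its method names are hypothetical, and the messages are assumed to be opaque ciphertext that only the recipient's browser can decrypt.

```python
from collections import defaultdict, deque


class QueueingNode:
    """Holds pending items per recipient; all processing stays client-side."""

    def __init__(self):
        self._pending = defaultdict(deque)   # recipient -> queued messages

    def enqueue(self, recipient: str, message: str) -> None:
        """Called when another user's application delivers a message."""
        self._pending[recipient].append(message)

    def drain(self, recipient: str) -> list:
        """Called at login: hand over everything queued, oldest first."""
        queue = self._pending.pop(recipient, deque())
        return list(queue)
```

The node never inspects what it stores, so anything that needs computation (decrypting, filtering, replying) still waits until the user has the application open.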

Actual code and holiday bake-offs

The project has a GitHub repository, which is home to some demonstration code showing off both parts of the Unhosted platform, although it loudly warns users that it is not meant for production use. The "cloudside" directory includes an example Unhosted storage node implementation, while the "wappside" directory includes three example applications designed to communicate with the storage node.

The storage node module speaks CORS and is written in PHP with a MySQL back-end. It does not contain any server-side user authentication, so it should not be deployed outside the local area network, but it works as a sample back-end for the example applications.

The example application set includes a JavaScript library named unhosted.js that incorporates RSA data signing and signature verification, encryption and decryption, and AJAX communication with the CORS storage node. There is a separate RSA key generation Web utility provided as a convenience, but it is not integrated into the example applications.

The example named "wappblog" is a simple blog-updating application. It creates a client-side post editor that updates the contents of an HTML file on a storage node, which is then retrieved for reading by a separate page. The "wappmail" application is a simple web mail application, which requires you to set up multiple user accounts, but shows off the ability to queue operations — incoming messages are stored and processed when each user logs in.

The third example is an address book, which demonstrates the fabric-based PKI system (although the documentation warns "it's so new that even I don't really understand how it works, and it's mainly there for people who are interested in the geeky details").

A more practical set of example applications comes from the third-party projects written for Unhosted's "Hacky Holidays" competition in December. The winning entry was Nathan Rugg's Scrapbook, which allows users to manipulate text and images on an HTML canvas, and shows how an Unhosted storage node can be used to store more than just plain text. Second place was shared between the instant messenger NezAIM and the note-taking application Notes.

The fourth entry, vCards, was deemed an honorable mention, although it used some client-side security techniques that would not work in a distributed environment in the real world (such as creating access control lists on the client side). The author of vCards was commended by the team for pushing the envelope of the protocol, though — he was one of the first to experiment with queuing operations so that one Unhosted application could pass messages to another.

Hackers wanted

At this stage, Unhosted is still primarily a proof-of-concept. The storage node code is very young, and has not been subjected to much real-world stress testing or security review. The developers are seeking input for the next (0.3) revision of UJJP, in which they hope to define better access control mechanisms for storage nodes (in part to enable inter-application communication) as well as a REST API.

On a bad day, I see "unresponsive script" warnings in Firefox and think rich client-side JavaScript applications sound like a terrible idea, but perhaps that is missing the bigger picture. StatusNet, Diaspora, and the other federated web services all do a good job of freeing users from reliance on one proprietary application vendor, but none of them is designed to make the storage underneath a flexible, replaceable commodity. One of the Unhosted project's liveliest metaphors for its storage de-coupling design is that it provides "a grease layer" between the hosted software and the servers that host it. That is an original idea, whether or not the top layer is written in JavaScript.