When Canonical released Ubuntu 9.10 in October, the Linux distributor also officially launched Ubuntu One, a cloud storage solution that is designed to synchronize files and application data between multiple computers over the Internet. The service has considerable potential, but only a handful of applications—including Evolution and Tomboy—take advantage of its capabilities.

Fortunately, the underlying components that Canonical has adopted for Ubuntu One make it surprisingly easy for third-party software developers to integrate support for cloud synchronization in their own applications. In this article, we will show you how to do it and give you some sample code so that you can get started right away.

Ubuntu One architecture

There are a few aspects of Ubuntu One's architecture that you should understand before we begin. The service's file and application synchronization features are largely separate and operate on different principles. In this article, we will be looking solely at the framework for synchronizing application data. This facet of Ubuntu One is powered by CouchDB, an open source database system.

One of the most noteworthy advantages of CouchDB is that it is highly conducive to replication. It has built-in support for propagating data between CouchDB instances that are running on different servers. Ubuntu One is engineered to take advantage of this characteristic of CouchDB. When a user runs the Ubuntu One client application that is shipped with Ubuntu 9.10, it will attempt to establish a pairing with Canonical's servers in the cloud.

This pairing enables replication between the local instance of CouchDB that is running on the user's own computer and a remote instance of CouchDB that is hosted on Canonical's infrastructure. When the user has the Ubuntu One client enabled, data that applications put in CouchDB will automatically be propagated to and from other computers that the user has authorized to access their Ubuntu One account.
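To make the replication mechanism concrete: CouchDB itself exposes replication over HTTP, where a client POSTs a JSON body naming a source and a target to the `_replicate` endpoint. The sketch below only builds such a request and does not send it. The port, the remote URL, and the helper name `build_replication_request` are hypothetical, and the Ubuntu One client normally sets up its pairing for you, so treat this as an illustration of the underlying CouchDB feature rather than something you need to write yourself.

```python
import json
from urllib.request import Request


def build_replication_request(local_port, database, remote_url, continuous=False):
    """Build (but do not send) a CouchDB _replicate request.

    Hypothetical helper: the port and remote URL are placeholders.
    """
    body = {
        "source": database,    # local database name
        "target": remote_url,  # full URL of the remote database
    }
    if continuous:
        # Ask CouchDB to keep replicating as new changes arrive
        body["continuous"] = True
    return Request(
        url=f"http://localhost:{local_port}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_replication_request(
    5984, "people", "http://couch.example.com:5984/people"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a Desktop CouchDB instance, an unsigned request like this would presumably be rejected, because the service expects OAuth credentials.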

As we explained at length in our review of Ubuntu 9.10, one of the challenges posed by adopting CouchDB on the desktop is that it isn't really intended to be used with a variable number of instances in multiuser environments. To work around this limitation, Canonical has created a simple framework of scripts called Desktop CouchDB that makes it possible to dynamically spawn per-session instances of CouchDB on randomly selected ports.
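One common way to obtain an unused port for a per-session server is to bind a socket to port 0 and let the operating system choose. The sketch below shows that general technique. It is not taken from the Desktop CouchDB scripts, and the function name `pick_free_port` is our own.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the operating system for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to assign any free port
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = pick_free_port()
print("A per-session server could listen on port", port)
```

Note that the port is released when the socket closes, so another process could claim it before the server starts. A more robust design has the server itself bind to port 0 and then report the port it was given.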

Desktop CouchDB uses D-Bus activation to automatically launch the database server when it is needed. It will also use D-Bus to expose the port number so that applications can connect without having to know the port ahead of time. In order to improve security and prevent other users from accessing the data, Desktop CouchDB requires that applications supply OAuth credentials before accessing the database.

It's worth noting that users who don't want to rely on Ubuntu One can still use Desktop CouchDB and take advantage of CouchDB's native replication capabilities to achieve seamless cloud synchronization with their own self-hosted infrastructure. Ubuntu One provides free hosted storage and largely automates the configuration, but it's not entirely necessary. The code examples in this article are intended to work with Desktop CouchDB regardless of whether you have an Ubuntu One account.

Accessing Desktop CouchDB with Python

The Ubuntu One developers have created a simple Python library that wraps the Desktop CouchDB service. It transparently handles authentication and hides the other idiosyncrasies of Desktop CouchDB. A GObject-based library is also available for C programmers who want to use the service. The examples in this tutorial will primarily focus on the Python library.

CouchDB is very different from conventional relational databases. It is designed to store its content as JSON documents with nested key/value pairs. To retrieve data from CouchDB, you create special view documents with JavaScript functions that operate on the JSON content. Although this query model seems very alien to developers who are accustomed to working with SQL, it has its own unique beauty that becomes evident over time. It's very flexible because you aren't constrained by a schema and can structure the individual items any way that you want.
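To illustrate the query model, here is a rough sketch of what a CouchDB view looks like when expressed as a Python dictionary. A view lives in a design document whose `views` entry holds a JavaScript `map` function stored as a string. The design document name, the view name `by_nickname`, and the helper `view_path` are hypothetical names chosen for this example.

```python
import json
from urllib.parse import urlencode

# Hypothetical design document: indexes person documents by nickname
design_doc = {
    "_id": "_design/people",
    "language": "javascript",
    "views": {
        "by_nickname": {
            # The map function runs once per document; emit() adds an index row
            "map": "function(doc) { if (doc.nickname) { emit(doc.nickname, doc.name); } }"
        }
    },
}


def view_path(database, design, view, key=None):
    """Build the URL path used to query a view, optionally for a single key."""
    path = f"/{database}/_design/{design}/_view/{view}"
    if key is not None:
        # CouchDB expects the key parameter to be JSON-encoded
        path += "?" + urlencode({"key": json.dumps(key)})
    return path


print(json.dumps(design_doc, indent=2))
print(view_path("people", "people", "by_nickname", key="segphault"))
```

Because the map function decides for itself which documents to index, documents without a nickname field are simply skipped rather than causing an error.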

The Python library allows you to use dictionary objects to describe your database items. To create a new CouchDB document, you instantiate a new Record and provide it with the data. You should also specify the record type by providing a URL that points to human-readable documentation of the record's structure.

    from desktopcouch.records.server import CouchDatabase
    from desktopcouch.records.record import Record as CouchRecord

    # Connect to CouchDB and create the database
    database = CouchDatabase("people", create=True)

    # Create a new record with some data
    record = CouchRecord({
        "email": "segphault@arstechnica.com",
        "nickname": "segphault",
        "name": "Ryan Paul"
    }, "http://somewhere/couchdb/person")

    # Put the record into the database
    database.put_record(record)

In the example above, we accessed a database called "people" by instantiating the CouchDatabase class. The create parameter tells CouchDB to automatically create a new database with that name if one doesn't already exist. We instantiated Record (imported here under the alias CouchRecord) with two arguments. The first one is a dictionary with the data that we want to store in the record. The second one is the record type URL. Finally, we pushed the record into the database by calling the put_record method.

After you run this code, you can see the newly added data in CouchDB by using Futon, CouchDB's built-in Web administration interface, which lets you inspect and manage databases. You can access Futon by opening ~/.local/share/desktop-couch/couchdb.html in your Web browser. If you have Ubuntu One enabled, the data will automatically appear on other connected computers during the next replication cycle (Ubuntu One data replication occurs every ten minutes).

The concept of record type URLs might seem a bit confusing and warrants further clarification. One of the goals of the Desktop CouchDB project is to encourage interoperability between applications. The record types, which are not a standard part of CouchDB, are a convention that was introduced by Canonical's developers to make it easier for multiple applications to share the same data with each other in CouchDB.

The URL is supposed to point to a wiki page that describes the fields that are used with the associated record type. Ideally, these fields should not be implementation-specific. Information that is intended to be used only by a single application should be stored in a subdocument for application annotations.
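As a rough illustration of this separation, the sketch below models a person document as a plain dictionary, with shared fields at the top level and application-specific data tucked into an annotations subdocument. The key names `record_type` and `application_annotations`, the application name `MyContactsApp`, and the helper `shared_fields` are assumptions made for this example, so check the Desktop CouchDB documentation for the exact layout.

```python
# Assumed layout: shared fields at the top level, private data in a subdocument
person = {
    "record_type": "http://somewhere/couchdb/person",
    "email": "segphault@arstechnica.com",
    "nickname": "segphault",
    "name": "Ryan Paul",
    "application_annotations": {
        # Hypothetical application storing data only it understands
        "MyContactsApp": {"pinned": True, "last_viewed": "2009-11-01"},
    },
}


def shared_fields(document):
    """Return only the fields that other applications are expected to read."""
    return {
        key: value
        for key, value in document.items()
        if key != "application_annotations" and not key.startswith("_")
    }


print(sorted(shared_fields(person)))
```

Keeping private data under a per-application key means two programs can annotate the same record without overwriting each other's fields.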

It's important to understand that record type documentation is not the same thing as a schema. The goal is to provide guidance that will help other developers make their software work with the data. Conformance with the documented structure is not enforced in any way, and you are not obligated to use a valid URL. Additionally, keep in mind that CouchDB data is supposed to be amorphous, so you don't necessarily need to have the same fields in every record within a database.
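Since nothing enforces the documented structure, code that reads records should tolerate missing fields. The following minimal sketch works on plain dictionaries rather than on the library's Record objects, and the function name `describe_person` is hypothetical.

```python
def describe_person(document):
    """Summarize a person document without assuming any field is present."""
    # Fall back from name to nickname to a placeholder
    name = document.get("name") or document.get("nickname") or "(unknown)"
    email = document.get("email")
    return f"{name} <{email}>" if email else name


records = [
    {"name": "Ryan Paul", "nickname": "segphault", "email": "segphault@arstechnica.com"},
    {"nickname": "anon"},  # no name or email
    {},                    # an empty document is still valid
]

summaries = [describe_person(record) for record in records]
for summary in summaries:
    print(summary)
```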