Java development 2.0

REST up with CouchDB and Groovy's RESTClient

RESTful concepts and a document-oriented database in action


So far, this column series has explored cloud computing with both Google's and Amazon's platforms. Though they differ in implementation and structure, both platforms enable rapid and scalable deployment. They make it possible, as never before, to assemble, test, run, and maintain Java applications quickly and inexpensively. But the cloud isn't the only factor affecting the speed of Java development today. Open source solutions also enable you to assemble software applications quickly, because you don't need to write as much code anymore. Gone are the days of writing your own object-relational mapping (ORM), logging, or testing framework. Those problems have been solved time and again in the open source world and — face it — those solutions are almost always better than yours.

About this series: The Java development landscape has changed radically since Java technology first emerged. Thanks to mature open source frameworks and reliable for-rent deployment infrastructures, it's now possible to assemble, test, run, and maintain Java applications quickly and inexpensively. In this series, Andrew Glover explores the spectrum of technologies and tools that make this new Java development paradigm possible.

Across the entire spectrum of Java development, open source innovation is making applications easier to assemble. Apache CouchDB, a new (at release 0.10.0 as of this writing) open source database, is no exception. It's easy to get going with CouchDB once you have it up and running. And all you need to work with it is an HTTP connection; no JDBC driver is required, nor do you need a third-party administrative-management platform. In this article, I'll introduce you to CouchDB and show you how you can rapidly get up to speed with using it. For ease of installation, you'll take advantage of Amazon's EC2 platform. And you'll communicate with CouchDB via a handy Groovy module.

A document-oriented database

Relational databases essentially rule the database market. But other types of databases (including object-oriented and document-oriented databases, which both differ vastly from the relational world) make a lot of sense from time to time. CouchDB is a document-oriented database. It's schema-less and allows you to store documents in the form of a JavaScript Object Notation (JSON) string.

JSON: JSON is a lightweight data-interchange format and an alternate format for Web applications. It is similar to XML but much less verbose. It is becoming the lingua franca of the Web, thanks to its lightweight nature. See Related topics to learn more about JSON.

Think of a parking ticket. This piece of paper contains a number of items including:

The date of the infraction

The time

The location

Your vehicle's description

Your license-plate information

The offense

The format and the data gathered on a ticket can vary from jurisdiction to jurisdiction. Even with a standard parking-ticket form within a single jurisdiction, what is or isn't captured on the ticket most likely varies. For example, the officer issuing the citation might not fill in the time, or might omit the vehicle make and model, opting to enter just the license-plate details. The location might be the combination of two streets (such as the intersection of Fourth and Lexington) or just a fixed address (such as 19993 Main Street). But the rough semantics of what's gathered are similar.

A ticket's data points can be modeled in a relational database, but the details get a bit hairy. For example, how do you capture an intersection effectively in a relational database? And in cases where no cross street exists, does the database then have a blank field for the second address (assuming you modeled it in such a way as to capture distinct street names in individual columns)?

In these cases, the abstractness of a relational database might be a bit much. The information required is already there in the form of a document (the ticket). Why not just model the data as a document, which doesn't necessarily fit into a rigid relational model but roughly follows the semantics of a high-level model? That's where CouchDB comes into play. It allows you to model these types of domains in a flexible manner — as a self-contained document that contains no schema but, instead, a roughly similar blueprint to other documents.

MapReduce: MapReduce, which was pioneered by Google, is a conceptual framework for processing huge data sets (see Related topics). It is highly optimized for distributable problem solving using a large number of computers. MapReduce is the combination of two functions: map and reduce. The map function is designed to take a large input and divide it into smaller pieces (and hand that data off to other processes that can do something with the data). The reduce function is intended to bring all the individual answers from map into one final output.

With CouchDB, you can search for documents, properties of documents, and even relate documents just as in the relational world. You do this using views, not SQL. Views are essentially functions that you write (in JavaScript) in the style of MapReduce; that is, you end up writing a map function and a reduce function. These functions work together to filter or extract data from your documents or exploit relationships among them quite efficiently. In fact, CouchDB is smart enough to run these functions only once, provided the underlying documents don't change, which makes views quite fast.
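To make the map-then-reduce flow concrete, here is a minimal sketch that imitates it in plain JavaScript, outside CouchDB. The runMap and reduceCounts helpers and the in-memory tickets array are assumptions made up for illustration. CouchDB itself supplies emit as a global function and uses a richer reduce signature, so treat this as a model of the idea rather than CouchDB's actual machinery.

```javascript
// Sample ticket documents; the field names mirror the parking-ticket idea,
// but this in-memory array is only a stand-in for a CouchDB database.
const tickets = [
  { _id: "1", officer: "Kristen Ree", date: "2009/01/31" },
  { _id: "2", officer: "Anthony Richards", date: "2009/02/01" },
  { _id: "3", officer: "Anthony Richards", date: "2009/01/30" }
];

// Hypothetical helper: runs a map function over every document and collects
// whatever it emits as { id, key, value } rows. Here emit is passed in as a
// second argument; in CouchDB it is available globally instead.
function runMap(docs, mapFn) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

// Map step: emit one row per ticket, keyed by officer, with a count of 1.
function mapByOfficer(doc, emit) {
  emit(doc.officer, 1);
}

// Simplified reduce step: collapse rows that share a key into one total.
function reduceCounts(rows) {
  const totals = {};
  rows.forEach(function (row) {
    totals[row.key] = (totals[row.key] || 0) + row.value;
  });
  return totals;
}

const ticketCounts = reduceCounts(runMap(tickets, mapByOfficer));
console.log(ticketCounts);
```

The map step decides which key each document contributes to, and the reduce step combines everything that landed under the same key. That division of labor is the part that carries over to real views.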

What's particularly interesting about CouchDB is its design. CouchDB embodies the basic (and highly successful) concepts of the Web itself. It exposes a completely RESTful API that permits the creation, querying, updating, and removal of documents, views, and databases. This makes CouchDB quite easy to pick up and work with. You don't need drivers or other platforms to jump-start development: a browser is essentially it. That being said, a host of libraries are available that make working with CouchDB even easier — but under the covers, they simply exploit RESTful concepts via HTTP.

CouchDB, much like the Web itself, was built to be scalable. It's written in Erlang, a concurrent programming language that supports building distributed, fault-tolerant, nonstop applications (see Related topics). The language (now available as open source) was developed by Ericsson and has been widely leveraged in telecommunications environments.

Installing CouchDB, cloud style

Installation of CouchDB varies depending on your operating system. If you are on Windows®, you need to install Cygwin, the Microsoft C compiler, and a slew of other related dependencies. If you are on a Mac, you need to use MacPorts. If, however, you are running on a Linux® platform, such as Ubuntu, the installation couldn't be easier. But not everyone has an Ubuntu instance handy. Or do you?

Of course you have an Ubuntu instance handy! Amazon's EC2 is a relatively inexpensive way to use Ubuntu on demand. Thus, with a bit of EC2 magic, you'll have CouchDB up and running in no time; when you're done, you can power it down, so to speak.

First, you'll need to find an EC2 AMI that'll work as a base instance. I ended up using AMI ami-ccf615a5, an instance of Ubuntu 9.04, which was the latest version available at the time of this writing. (By the time you read this, 9.10 will be available along with, most likely, a newer AMI.) Using either Eclipse or the AWS Management Console, launch an instance of ami-ccf615a5. Be sure to set a security policy that permits access via SSH. (Although CouchDB uses HTTP, you'll communicate with it through an SSH tunnel for simplicity's sake.) You'll also need to use a key pair. (If you need guidance, refer to the previous two articles in this series, "You can borrow EC2" and "Easy EC2.")

Once you've launched an EC2 instance of Ubuntu 9.04, you need to ssh to it. (Remember the instance might take a minute or so to boot up fully, so be patient.) For example, I can open up a terminal and ssh to the newly created instance like so:

aglover#> ssh -i .ec2/agkey.pem root@ec2-174-129-157-167.compute-1.amazonaws.com

The DNS name of my AMI was ec2-174-129-157-167.compute-1.amazonaws.com, and I'm referencing a key pair named agkey. Your DNS name and key pair will undoubtedly be different.

At the command prompt on the clouded Ubuntu instance, type:

apt-get update

Then type:

aptitude install couchdb

These commands automatically install CouchDB. However, note that they won't install the latest version. You need to install CouchDB from source if you want the very latest version (see Related topics).

Once the commands finish executing, you can check to see if CouchDB is running by issuing a ps -eaf command. Pipe the ps output to egrep to look for a few processes running with couchdb in their path. You should see something along the lines of the output shown in Listing 1:

Listing 1. CouchDB is running (lines broken to fit article page width)

couchdb   1820     1  0 00:54 ?  00:00:00 /bin/sh -e /usr/bin/couchdb -c /etc/couchdb/couch.ini -b -r 5 -p /var/run/couchdb.pid -o /
couchdb   1827  1820  0 00:54 ?  00:00:00 /bin/sh -e /usr/bin/couchdb -c /etc/couchdb/couch.ini -b -r 5 -p /var/run/couchdb.pid -o /
couchdb   1828  1827  0 00:54 ?  00:00:00 /usr/lib/erlang/erts-5.6.5/bin/beam -Bd -- -root /usr/lib/erlang -progname erl -- -home /v
couchdb   1836  1828  0 00:54 ?  00:00:00 heart -pid 1828 -ht 11

Next, back on your local machine, you'll set up an SSH tunnel that lets you access the CouchDB instance running on the cloud, as if it were residing on your own machine. To do so, open up a new terminal session on your local machine and type:

ssh -i <your key> -L 5498:localhost:5984 root@<your AMI DNS>

Finally, open up a browser on your local machine. In the location bar, type http://127.0.0.1:5498/ . You should see a nice welcome message in JSON, like this one:

{"couchdb":"Welcome","version":"0.8.0-incubating"}

Now that it appears things are working, you're ready to put CouchDB through its paces.

Working RESTfully with Groovy's RESTClient

REST: Representational state transfer (REST) is a style of designing loosely coupled Web applications that rely on named resources, in the form of Uniform Resource Locators (URLs), Uniform Resource Identifiers (URIs), and Uniform Resource Names (URNs), for instance, rather than messages. Ingeniously, REST piggybacks on the already validated and successful infrastructure of the Web: HTTP. That is, REST leverages aspects of the HTTP protocol such as GET and POST requests. These requests map quite nicely to standard business-application needs such as create, read, update, and delete (CRUD).

Because CouchDB exposes data via a RESTful HTTP interface, working with CouchDB (as you've already seen via your browser) is quite easy. Pretty much everything you want to do can be done via HTTP.
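As a rough illustration of how CRUD operations line up with HTTP verbs when the resource is a CouchDB document, the sketch below builds a description of the request for each operation. The describeRequest function and its operation names are hypothetical, invented for this example. It sends nothing over the wire and is not part of CouchDB or HTTPBuilder.

```javascript
// Hypothetical helper: describes the HTTP request that corresponds to each
// CRUD-style operation on a single document in a CouchDB database.
function describeRequest(operation, database, docId) {
  // Creating and updating a document at a known ID both use PUT.
  const verbs = { create: "PUT", read: "GET", update: "PUT", remove: "DELETE" };
  const method = verbs[operation];
  if (!method) {
    throw new Error("Unknown operation: " + operation);
  }
  // The document is a named resource: its URL path is /database/documentId.
  return {
    method: method,
    path: "/" + database + "/" + encodeURIComponent(docId)
  };
}

console.log(describeRequest("read", "parking_tickets", "1234334325"));
console.log(describeRequest("remove", "parking_tickets", "1234334325"));
```

The point of the sketch is that the URL names the thing and the verb names the action, which is why a browser (which issues GETs) is already enough to read data.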

You can choose among plenty of tools for interacting with HTTP. When working with RESTful interfaces, one of my favorites is the RESTClient extension to Groovy's HTTPBuilder (see Related topics). HTTPBuilder — a wrapper for the Apache Commons Project's popular HTTPClient — adds some slick Groovy-ness to the syntax of HTTP POSTs, GETs, PUTs, and DELETEs. Because HTTPBuilder is built with and leverages Groovy, writing scripts that leverage RESTful concepts (such as communicating with CouchDB) couldn't be easier.

Grape makes easy even quicker

In keeping with the general themes of Java development 2.0 (quick, easy, and free or cheap), Groovy's handy Grape (Groovy Advanced Packaging Engine or Groovy Adaptable Packaging Engine) feature is particularly relevant when it comes to interacting with a library like HTTPBuilder (see Related topics). Grape is a dependency manager that allows Groovy scripts and classes to autoconfigure their particular dependencies at run time. This makes using various open source libraries a breeze, because you don't need to download a series of JAR files just to start coding. For example, with Grape, you can write a Groovy script to use HTTPBuilder without having HTTPBuilder's required JARs beforehand. With Grape, they'll be downloaded (via Apache Ivy) at run time (or compile time).

You leverage Grape via annotations and method calls. You can, for example, decorate a method or class declaration with a @Grab annotation. In this annotation, you specify some relevant metadata regarding the main dependency. (Through the magic of Ivy, all transitive dependencies will be figured out too.) At run time or compile time (whichever is first), Grape downloads these dependencies and ensures they're in your classpath. If the dependencies are already downloaded (from a previous run, for instance), Grape still ensures the proper JAR files are in your classpath.

RESTing easy on CouchDB with Groovy

Before you can create any documents in CouchDB, you must create a database. To create a parking-tickets database, issue an HTTP PUT via HTTPBuilder's slick domain-specific language (DSL) using its RESTClient, as shown in Listing 2. (All the Groovy code for this article's examples is available for download.)

Listing 2. Creating a CouchDB database

import static groovyx.net.http.ContentType.JSON
import groovyx.net.http.RESTClient
// Grape fetches HTTPBuilder (and its transitive dependencies) on first use
@Grab(group='org.codehaus.groovy.modules.http-builder', module='http-builder', version='0.5.0-RC2')
def getRESTClient(){
  // Port 5498 is the local end of the SSH tunnel to CouchDB
  return new RESTClient("http://localhost:5498/")
}
def client = getRESTClient()
// A PUT to /parking_tickets creates the database
def response = client.put(path: "parking_tickets",
    requestContentType: JSON, contentType: JSON)
assert response.data.ok == true : "response from server wasn't ok"

CouchDB should return the response {"ok":true}. As you can see in Listing 2, in HTTPBuilder it's simple to parse JSON and ensure that the ok element's value is indeed true.

Next, it's time to create some documents in keeping with the parking-tickets theme. To model a parking ticket, remember that a number of aspects are associated with a ticket. Also keep in mind that because these are actual forms that officers complete, some fields might not be filled out or even follow a prescribed pattern — think intersection vs. exact location.

Using HTTPBuilder, you can create a document in CouchDB via an HTTP PUT (just as I did in Listing 2 to create the database). Because CouchDB works with JSON documents, you must follow JSON's name-value format. You do this by creating a map-like data structure in Groovy (which HTTPBuilder will transform into valid JSON). Listing 3 shows how:

Listing 3. Creating a CouchDB document via RESTClient

// The document ID (1234334325) is the last segment of the path
response = client.put(path: "parking_tickets/1234334325",
    contentType: JSON, requestContentType: JSON,
    // HTTPBuilder turns this Groovy map into the JSON request body
    body: [officer: "Kristen Ree",
           location: "199 Baldwin Dr",
           vehicle_plate: "Maryland 77777",
           offense: "Parked in no parking zone",
           date: "2009/01/31"])
assert response.data.ok == true : "response from server wasn't ok"
assert response.data.id == "1234334325" : "the returned ID didn't match"

A few things are going on in Listing 3. First, when issuing a PUT for a CouchDB document, you must assign a UUID. CouchDB can assign these for you, or you can manage them yourself. In Listing 3, I've just made one up (1234334325); this UUID is consequently appended to the URL. If that UUID is available, CouchDB will assign the document you PUT to it. In the body aspect of my put call, note how each name has an associated value, almost like a normal map. For instance, the issuing officer's name is Kristen Ree, and the location of the ticket is 199 Baldwin Dr.

Listing 4 creates another parking ticket in CouchDB via the same technique:

Listing 4. Another parking ticket

def id = new Date().time
response = client.put(path: "parking_tickets/${id}",
    contentType: JSON, requestContentType: JSON,
    body: [officer: "Anthony Richards",
           location: "Walmart Parking lot",
           vehicle_plate: "Delaware 4433-OP",
           offense: "Parked in non-parking space",
           date: "2009/02/01"])
assert response.data.ok == true : "response from server wasn't ok"
assert response.data.id == "${id}" : "the returned ID didn't match"

Every time I issue a PUT via RESTClient, I assert that the JSON response contains a true value for ok, and I verify that my intended id value is present. Note how in Listing 4, rather than making up the UUID, I'm now using the current time. It's not a foolproof technique, but it'll suffice for simple interactions.

When you successfully create a new document in CouchDB, it responds with JSON containing the UUID and a revision ID. For example, this response represents the JSON that I'm validating in Listing 4:

{"ok":true,"id":"12339892938945","rev":"12351463"}

Your id and rev values will undoubtedly be different. Note that I can grab the id value by issuing a call such as response.data.id.
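The same check can be sketched in plain JavaScript to show what the assertions are doing with the parsed response. The summarizeCreation helper is a hypothetical name used only for this illustration, and the raw string is the sample response shown above rather than output captured from a live server.

```javascript
// The sample creation response from the article, as a raw JSON string.
const rawResponse = '{"ok":true,"id":"12339892938945","rev":"12351463"}';

// Hypothetical helper: verifies the ok flag and pulls out the two values
// needed to refer to the document later (its ID and its revision).
function summarizeCreation(result) {
  if (result.ok !== true) {
    throw new Error("response from server wasn't ok");
  }
  return { id: result.id, rev: result.rev };
}

const created = summarizeCreation(JSON.parse(rawResponse));
console.log(created.id);   // the document ID that was PUT
console.log(created.rev);  // the revision CouchDB assigned
```

Holding on to both values is worthwhile, because the revision identifies which version of the document a later request is talking about.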

In CouchDB, documents are tracked via revisions, so you can go back to a previous document version (via the revision ID), much as you can in CVS or Subversion.

Views in CouchDB

Now that I've created a few parking tickets (or documents, in CouchDB speak), it's time to create a view in CouchDB. Remember, views are just MapReduce functions in action; thus, you must define them. In many cases, you don't need the reduce function; the map function can handle most things for you. It does just what its name suggests: it maps each document to the properties you'd like to filter on or find.

I've defined two tickets: one issued by Officer Ree and another issued by Officer Richards. To find all the tickets issued by Officer Ree, for example, you write a map function that filters the officer property accordingly. You then pass the results to CouchDB's emit function.

Using CouchDB's admin interface: Futon

You can define views via CouchDB's RESTful API or via CouchDB's administrative interface, dubbed Futon. Futon is just a Web application available at http://localhost:5498/_utils/. Go there now, and (assuming you've created the database and documents along with me) you should see a simple interface for the parking_tickets database, as shown in Figure 1:

Figure 1. The Futon interface

If you select the parking_tickets database, you can then see a drop-down list on the far right (dubbed Select view:). You start defining a custom view by selecting Custom query..., as shown in Figure 2:

Figure 2. Futon's view-selection interface

Now Futon presents an interface that allows you to define both a map function and a reduce function. (You might need to click the View code link.) In the Map text box, define the simple map function shown in Listing 5:

Listing 5. A simple map function in CouchDB

function(doc) {
  if(doc.officer == "Kristen Ree"){
    emit(null, doc);
  }
}

As you can see, the map function in Listing 5 is defined in JavaScript. All it does is filter the documents in the CouchDB database by a document's officer property. Specifically, the function passes a document to emit only if the officer's name is Kristen Ree. Figure 3 shows where I've defined this function in Futon:

Figure 3. Creating a MapReduce function

Next, you're asked to provide a design document name (enter by_name) and a view name (enter officer_ree). These names will serve as a means to build a URL for invoking this view later (that is, http://localhost:5498/parking_tickets/_view/by_name/officer_ree).

You can now use this view via HTTPBuilder, as shown in Listing 6:

Listing 6. Invoking your new view

response = client.get(path: "parking_tickets/_view/by_name/officer_ree",
    contentType: JSON, requestContentType: JSON)
assert response.data.total_rows == 1
response.data.rows.each{
  assert it.value.officer == "Kristen Ree"
}

This view correctly returns a JSON response containing only one document: the ticket issued by Officer Ree on January 31. The response object in Listing 6 hides the raw HTTP response by parsing the JSON accordingly. You can view the raw JSON response by calling the toString method on the data property of the response object. The raw response looks like Listing 7:

Listing 7. The view's raw result

{"total_rows":1,"offset":0,"rows":[
  {"id":"1234334325","key":null,
   "value":{"_id":"1234334325","_rev":"4205717256","officer":"Kristen Ree",
            "location":"199 Baldwin Dr","vehicle_plate":"Maryland 77777",
            "offense":"Parked in no parking zone","date":"2009/01/31"}}]}

As you can see from the raw JSON document returned, HTTPBuilder's ability to parse JSON effortlessly is quite handy, because it enables an object graph-like mechanism for assessing various attributes and their corresponding values.
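For readers who want to see that object-graph navigation spelled out, the following sketch parses the raw result from Listing 7 with JavaScript's built-in JSON.parse and walks the same path the Groovy assertions take. The officersIn helper is a made-up name for this illustration.

```javascript
// The raw view result from Listing 7, as a single JSON string.
const rawView = '{"total_rows":1,"offset":0,"rows":[' +
  '{"id":"1234334325","key":null,' +
  '"value":{"_id":"1234334325","_rev":"4205717256","officer":"Kristen Ree",' +
  '"location":"199 Baldwin Dr","vehicle_plate":"Maryland 77777",' +
  '"offense":"Parked in no parking zone","date":"2009/01/31"}}]}';

// Hypothetical helper: collects the officer name from every returned row.
// Each row wraps the emitted document in its "value" property.
function officersIn(view) {
  return view.rows.map(function (row) {
    return row.value.officer;
  });
}

const parsedView = JSON.parse(rawView);
console.log(parsedView.total_rows);   // metadata that sits beside the rows
console.log(officersIn(parsedView));  // values reached through rows[i].value
```

The structure is the same in both languages: metadata at the top level, then a rows array whose entries carry the id, the emitted key, and the emitted value.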

For demonstration purposes, I'll add some more documents to the database. In order to keep working through the examples, you should do the same using the code download.

CouchDB's emit function works as an organizer of sorts. A view's rows come back ordered by the key passed as emit's first argument, so if you don't put a restriction in your map function (as I did in Listing 5), the view effectively sorts the passed-in documents. For example, if you want to obtain all tickets by date (think SQL's ORDER BY clause here), you can just emit each document's date field as the key, as shown in Listing 8:

Listing 8. A simpler map function

function(doc) {
  emit(doc.date, doc);
}

Listing 9 issues an HTTP GET against this view (which I've given a design document name of dates and a view name of by_date).

Listing 9. Another view invoked

response = client.get(path: "parking_tickets/_view/dates/by_date",
    contentType: JSON, requestContentType: JSON)
assert response.data.total_rows == 4

The query in Listing 9 returns all the documents in the parking_tickets database sorted by date. The assert statement simply verifies that the total_rows property is equal to 4. This is a key point. Views return results as well as a bit of metadata (such as the total number of rows in the view); thus, it helps to see the raw response before you start parsing away. Listing 10 shows the raw results:

Listing 10. Raw JSON documents sorted by date

{"total_rows":4,"offset":0,"rows":[
  {"id":"85d4dbf45747e45406e5695b4b5796fe","key":"2009/01/30",
   "value":{"_id":"85d4dbf45747e45406e5695b4b5796fe","_rev":"1318766781",
            "officer":"Anthony Richards",
            "location":"54th and Main","vehicle_plate":"Virginia FCD-4444",
            "offense":"Parked in no parking zone","date":"2009/01/30"}},
  {"id":"1234334325","key":"2009/01/31",
   "value":{"_id":"1234334325","_rev":"4205717256",
            "officer":"Kristen Ree",
            "location":"199 Baldwin Dr","vehicle_plate":"Maryland 77777",
            "offense":"Parked in no parking zone","date":"2009/01/31"}},
  {"id":"12345","key":"2009/01/31",
   "value":{"_id":"12345","_rev":"1479261876",
            "officer":"Anthony Richards","location":"1893 Main St",
            "vehicle_plate":"Maryland 4433-OP",
            "offense":"Parked in no parking zone","date":"2009/01/31"}},
  {"id":"12339892938945","key":"2009/02/01",
   "value":{"_id":"12339892938945","_rev":"12351463","officer":"Anthony Richards",
            "location":"Walmart Parking lot","vehicle_plate":"Maine 4433-OP",
            "offense":"Parked in non-parking space","date":"2009/02/01"}}]}
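The ordering in Listing 10 can be imitated outside CouchDB with a short sketch. The rowsByDate function below is a hypothetical stand-in that emits each document's date as the key and then sorts the rows. It assumes plain string comparison and breaks ties on the document ID, which reproduces the order shown here. CouchDB's own collation rules are more elaborate, so this is an approximation.

```javascript
// IDs and dates taken from Listing 10, deliberately listed out of order.
const ticketDocs = [
  { _id: "12339892938945", date: "2009/02/01" },
  { _id: "1234334325", date: "2009/01/31" },
  { _id: "85d4dbf45747e45406e5695b4b5796fe", date: "2009/01/30" },
  { _id: "12345", date: "2009/01/31" }
];

// Hypothetical stand-in for a date-keyed view: one row per document, keyed by
// date, sorted by key and then by document ID (an assumed tie-break rule).
function rowsByDate(docs) {
  const rows = docs.map(function (doc) {
    return { id: doc._id, key: doc.date };
  });
  rows.sort(function (a, b) {
    if (a.key !== b.key) {
      return a.key < b.key ? -1 : 1;
    }
    if (a.id === b.id) {
      return 0;
    }
    return a.id < b.id ? -1 : 1;
  });
  return rows;
}

rowsByDate(ticketDocs).forEach(function (row) {
  console.log(row.key + "  " + row.id);
});
```

Because the dates are written year first with zero-padded months and days, sorting them as strings also sorts them chronologically. A format such as 1/31/2009 would not have that property.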

What's interesting about defining views like this is that you can then pass in a key: the value you'd like the emit function's first argument to match. For example, the view defined in Listing 8 essentially sorts by date. If you'd like to restrict it to a specific date, then pass that date into the view query. For instance, just for fun, enter this URL in your browser's location box:

http://localhost:5498/parking_tickets/_view/dates/by_date?key="2009/01/31"

This view then returns only the tickets issued on January 31. You should see a bunch of JSON-looking text in your browser window similar to what you can see in Listing 11. Notice that using your browser as your query tool is an especially easy way to view the raw JSON response of an HTTP request.

Listing 11. Only two tickets issued on January 31

{"total_rows":4,"offset":1,"rows":[
  {"id":"1234334325","key":"2009/01/31",
   "value":{"_id":"1234334325","_rev":"4205717256","officer":"Kristen Ree",
            "location":"199 Baldwin Dr","vehicle_plate":"Maryland 77777",
            "offense":"Parked in no parking zone","date":"2009/01/31"}},
  {"id":"12345","key":"2009/01/31",
   "value":{"_id":"12345","_rev":"1479261876","officer":"Anthony Richards",
            "location":"1893 Main St","vehicle_plate":"Maryland 4433-OP",
            "offense":"Parked in handicap zone without permit",
            "date":"2009/01/31"}}]}
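A browser is forgiving about the quotes and slashes in that key, but code that builds the query URL usually needs to encode them. The sketch below shows one way to do that. The viewUrl function is a hypothetical helper, and the path layout (database, then _view, then design document and view name) follows the URLs used in this article for the CouchDB release installed here. Other releases may lay out view URLs differently.

```javascript
// Hypothetical helper: builds a view URL in the layout this article uses and,
// when a key is given, appends it as a JSON-encoded, URL-escaped parameter.
function viewUrl(base, database, designDoc, viewName, key) {
  let url = base + "/" + database + "/_view/" + designDoc + "/" + viewName;
  if (key !== undefined) {
    // The key is sent as JSON, so a string key keeps its double quotes.
    url += "?key=" + encodeURIComponent(JSON.stringify(key));
  }
  return url;
}

console.log(viewUrl("http://localhost:5498", "parking_tickets",
                    "dates", "by_date", "2009/01/31"));
// prints http://localhost:5498/parking_tickets/_view/dates/by_date?key=%222009%2F01%2F31%22
```

The encoded form is the same request as the one typed into the browser: %22 stands for each double quote and %2F for each slash.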

Views can be as specific as you'd like. For instance, with a little JavaScript string manipulation, I can write one that finds tickets issued anywhere on Main Street, as shown in Listing 12:

Listing 12. Another view with some string magic

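function(doc) {
  // indexOf returns 0 when the location starts with "main", so test for >= 0;
  // the doc.location check skips documents that have no location at all
  if(doc.location && doc.location.toLowerCase().indexOf('main') >= 0){
    emit(doc.location, doc);
  }
}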

As you can see from Listing 12, if the location element of any document contains main, then the document is passed to the emit function. Keep in mind that this search is rather wide. If a document's location contains a string such as Germaine Street, it'll be returned too. For the small population of tickets I've defined, the view would return the results shown in Listing 13:

Listing 13. Results filtered by Main Street

{"total_rows":2,"offset":0,"rows":[
  {"id":"123433432asdefasdf4325","key":"4th and Main",
   "value":{"_id":"123433432asdefasdf4325","_rev":"498239926",
            "officer":"Chris Smith","location":"4th and Main",
            "vehicle_plate":"VA FGA-JD33",
            "offense":"Parked in no parking zone","date":"2009/02/01"}},
  {"id":"123433432223e432325","key":"54 and Main",
   "value":{"_id":"123433432223e432325","_rev":"841089995",
            "officer":"Kristen Ree","location":"54 and Main Street",
            "vehicle_plate":"Maryland 77777",
            "offense":"Parked in no parking zone","date":"2009/02/02"}}]}
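How wide the substring search in Listing 12 really is can be checked outside CouchDB. The two predicates below are hypothetical helpers written for this comparison: the first tests for the substring anywhere in the location, and the second is a stricter alternative (an assumption about what a tighter view might use) that matches main only as a whole word.

```javascript
// Substring test: true when "main" appears anywhere in the location,
// including at the very start (where indexOf returns 0).
function mentionsMain(location) {
  return typeof location === "string" &&
    location.toLowerCase().indexOf("main") !== -1;
}

// Stricter alternative: \b requires a word boundary on both sides, so
// "Germaine" no longer matches. The i flag makes the test case-insensitive.
function mentionsMainWord(location) {
  return typeof location === "string" && /\bmain\b/i.test(location);
}

["4th and Main", "Main Street", "Germaine Street", "199 Baldwin Dr"]
  .forEach(function (location) {
    console.log(location + ": substring=" + mentionsMain(location) +
                ", whole word=" + mentionsMainWord(location));
  });
```

Either test could sit inside a map function in place of the inline check. Which one is appropriate depends on whether a few false matches are acceptable for the data at hand.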

Note that the JSON response contains a key element, which describes why a particular document was emitted. This level of information is quite helpful. Also note how all along, the data found in the various tickets I've defined is somewhat inconsistent: some locations are precise, others aren't. Although this data could be stored in a relational database, it fits well with the document-oriented model too, don't you think? Plus, with the power of Groovy and HTTPBuilder's ability to parse JSON effortlessly, it's quite easy to get at the data (much easier than with raw JDBC).

CouchDB as the database for the Web

CouchDB is especially interesting because it's so easy to get going with it. Relational databases are easy too, but what's nice about this database is how you can embrace its API if you already have some familiarity with, say, using a Web browser. What's more, because of CouchDB's RESTful API, you can communicate with it via cool frameworks like HTTPBuilder's RESTClient. You aren't limited to HTTPBuilder either; a number of Java libraries try to make working with CouchDB easier. One that is especially promising is jcouchdb (see Related topics), which shields you from the RESTful-ness and JSON-ness of it all and allows you to work programmatically in the Java language with documents and views.

Stay tuned for next month's column, where I'll return to Google App Engine. True to the spirit of open innovation, new frameworks are popping up that facilitate Google App Engine development and deployment. You'll see how one of them makes Java development 2.0 on Google's cloud platform even easier.

Downloadable resources

Related topics