We often describe Couchbase Server as a document database. That's a good description: Couchbase performs superbly as a JSON document store.

When modelling our data for Couchbase Server, though, it can help to think of it more like a key-value store.

Key-value thinking versus document thinking

So, what's the practical difference between key-value stores and document stores? Mostly, it's about how you query the data.

Key-value databases tend to store data as opaque units, giving you one index: the key itself. You store and retrieve data as discrete chunks using only the key. That's usually fast because it's uncomplicated, but it's also a pretty blunt tool.

Document stores, on the other hand, give you additional indexes on the data inside the document, so you have much greater freedom in how you query that data. Of course, querying in any database system comes with a resource cost.
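To make that trade-off concrete, here is a minimal sketch that uses a plain Python dict as a hypothetical stand-in for a bucket (it is not the Couchbase SDK). A key-value read goes straight to one document. Finding documents by a value inside the JSON means inspecting every document unless an index on that field exists, and such an index has to be built and maintained, which is where the resource cost comes from.

```python
import json

# Hypothetical stand-in for a bucket: each key maps to a JSON string that the
# store itself treats as an opaque value.
bucket = {
    "1001": json.dumps({"name": "Lily", "city": "Leeds"}),
    "1002": json.dumps({"name": "Sam", "city": "York"}),
    "1003": json.dumps({"name": "Ana", "city": "Leeds"}),
}


def get_by_key(key):
    """Key-value access: a single direct look-up by key, nothing else."""
    return bucket.get(key)


def find_by_city_scan(city):
    """Querying inside the documents with no index: parse and inspect every one."""
    return sorted(key for key, raw in bucket.items() if json.loads(raw).get("city") == city)


def build_city_index():
    """A secondary index on "city": faster queries, but it costs work to build and keep current."""
    index = {}
    for key, raw in bucket.items():
        index.setdefault(json.loads(raw)["city"], []).append(key)
    return index


city_index = build_city_index()
print(find_by_city_scan("Leeds"))   # ['1001', '1003']
print(sorted(city_index["Leeds"]))  # ['1001', '1003']
```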

When designing our data model, we need to choose which approach to take. To do that we need to understand the trade-offs.

The trade-offs

Couchbase is equally at home giving you key-value access and document query-style access to your data.

Both approaches have their advantages and their trade-offs:

| Key-value access advantages | Document-style access advantages |
|---|---|
| Super-fast: sub-millisecond responses | Flexibility: easily create new indexes any time |
| Immediately consistent within the cluster | Insight: create views that deal with changing data and can provide analytical insight rather than just another index |
| Tiny resource impact | See inside the document: create indexes on KV pairs inside your JSON |

With Couchbase views today, and N1QL soon, querying in Couchbase gives you enormous flexibility. Once N1QL, and the indexing to support it, is generally available, much of this post may well be obsolete.

However, right now we can keep all the benefits of Couchbase's key-value model – immediate consistency, cached sub-millisecond response times, a linear scaling profile and so on – without giving up much query flexibility. The way we do that is to create manual secondary indexes.

It's all about the look-up

Imagine we're storing user profiles in Couchbase. The first choice we need to make is how we key them.

Let's say our users log in to our system using their email address. That means that, at a minimum, when someone logs in we know that one thing about them. If we key our user profiles by email address, then we get a log-in process something like this:

1. The user enters her email address (e.g. lily@example.com) and password into the log-in form.
2. We GET the document from Couchbase that has the key lily@example.com.
3. We verify the password against the hash in the user profile.
4. If successful, we complete the log-in and Lily goes about her business.
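Here is a minimal sketch of that flow, assuming a plain dict as a hypothetical in-memory stand-in for the bucket (not the Couchbase SDK). The password hashing uses the standard library's PBKDF2; the hypothetical `sign_up` and `log_in` helpers and the iteration count are illustrative only.

```python
import hashlib
import hmac
import os


def hash_password(password, salt_hex):
    # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), 100_000).hex()


# Hypothetical in-memory stand-in for a Couchbase bucket (not the real SDK).
bucket = {}


def sign_up(email, password):
    salt = os.urandom(16).hex()
    # The profile document is keyed directly by the user's email address.
    bucket[email] = {"email": email, "salt": salt,
                     "password_hash": hash_password(password, salt)}


def log_in(email, password):
    profile = bucket.get(email)  # a single GET, using the email address as the key
    if profile is None:
        return False
    candidate = hash_password(password, profile["salt"])
    # Compare the hashes in constant time rather than with ==.
    return hmac.compare_digest(candidate, profile["password_hash"])


sign_up("lily@example.com", "correct horse")
print(log_in("lily@example.com", "correct horse"))  # True
```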

Great, that was easy.

After a while, Lily changes her email address and wants to update it in our system. Naturally, she'll want to use her updated email address to log in.

We have three options for how to handle this:

create an entirely new user profile document, keyed by the new email address, and destroy the old one

create a redirect document

from the beginning, use look-up documents to create a manual secondary index.

The first option seems a little inelegant. For one thing, if we had been referring to the document elsewhere in our system, those references would now be dead-ends.
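A small sketch of that dead-end problem, again with a dict as a hypothetical stand-in for the bucket. The `order-42` document and the `change_email_by_rekeying` helper are hypothetical, invented only to show another document that refers to the profile by its key.

```python
# Hypothetical stand-in for a bucket; documents are plain dicts.
bucket = {
    "lily@example.com": {"name": "Lily", "email": "lily@example.com"},
    # Another document that refers to the profile by its key.
    "order-42": {"item": "teapot", "customer": "lily@example.com"},
}


def change_email_by_rekeying(old_email, new_email):
    """Option one: copy the profile under the new key, then delete the old document."""
    profile = bucket.pop(old_email)
    profile["email"] = new_email
    bucket[new_email] = profile


change_email_by_rekeying("lily@example.com", "lily@newdomain.com")

# The order still points at the old key, which no longer exists.
customer_key = bucket["order-42"]["customer"]
print(customer_key in bucket)  # False: the reference is now a dead-end
```

Fixing this would mean finding and rewriting every document that holds the old key, which is exactly the kind of work a stable key avoids.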

Redirects

We could, instead, take the second option and create a new document keyed by the new email address, whose contents would simply be the old email address. Of course, we'd also need to store the new email address inside the user profile document itself, so that the profile records Lily's current address rather than relying on the redirect document alone to connect that address to her.

Now our login process would look like this:

1. The user enters her email address (e.g. lily@newdomain.com) and password into the log-in form.
2. We GET the document from Couchbase that has the key lily@newdomain.com.
3. We see that the document is another email address (lily@example.com), rather than a full user profile.
4. We GET the document lily@example.com.
5. We verify the password against the hash in the user profile.
6. If successful, we complete the log-in and Lily goes about her business.
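The redirect flow might be sketched like this, assuming a dict as a hypothetical in-memory stand-in for the bucket (not the Couchbase SDK) and treating a plain string value as a redirect. The type check inside `log_in` is the complication: a GET on an email address can now return two different kinds of document.

```python
import hashlib
import hmac
import os


def hash_password(password, salt_hex):
    # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), 100_000).hex()


salt = os.urandom(16).hex()
# Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK).
bucket = {
    # The original profile, still keyed by the old address, now holding the new one.
    "lily@example.com": {"email": "lily@newdomain.com", "salt": salt,
                         "password_hash": hash_password("correct horse", salt)},
    # Redirect document: keyed by the new address, its value is just the old address.
    "lily@newdomain.com": "lily@example.com",
}


def log_in(email, password):
    doc = bucket.get(email)
    # A GET on an email address may return a redirect (a plain string)
    # or a full profile (a dict), so we have to check which one we got.
    if isinstance(doc, str):
        doc = bucket.get(doc)  # follow the redirect with a second GET
    if not isinstance(doc, dict):
        return False
    return hmac.compare_digest(hash_password(password, doc["salt"]), doc["password_hash"])


print(log_in("lily@newdomain.com", "correct horse"))  # True
```

This sketch follows only one redirect. If every redirect points at the profile's original key, a second address change still costs just one extra GET.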

This could work, but it adds a complication: we no longer know what we'll receive when we do a GET on an email address.

Instead, we could start out using look-up documents from the very beginning.

Using a manual secondary index

It's quite likely that some of our users will change their email address during their account's lifetime. So it makes sense for us to plan for that right from the beginning.

Rather than key our users' profile documents with their email addresses, we should key them with something else unique that'll remain constant. As we're looking for something unchanging, the key itself should ideally be unrelated to anything about the users themselves. Couchbase Server gives us an easy way to handle this: atomic counters.

An atomic counter can be incremented or decremented by a given amount, and each call returns the resulting number. If we increment the counter by one each time we create a new user profile, that gives us a unique and unchanging key.
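The property that matters is that concurrent callers never receive the same number. This sketch illustrates it with a lock-protected integer as a hypothetical stand-in for the server-side counter; the `Counter` class is invented for illustration and does not model every detail of Couchbase's own counter operation.

```python
import threading


class Counter:
    """Hypothetical in-memory stand-in for a Couchbase atomic counter document."""

    def __init__(self, initial=0):
        self._value = initial
        self._lock = threading.Lock()

    def increment(self, delta=1):
        # The read-modify-write happens under a lock, so concurrent callers
        # never receive the same value.
        with self._lock:
            self._value += delta
            return self._value

    def decrement(self, delta=1):
        with self._lock:
            self._value -= delta
            return self._value


user_ids = Counter(initial=1000)
issued = []


def create_users(n):
    for _ in range(n):
        issued.append(user_ids.increment(1))


# Eight threads creating users at the same time still get distinct IDs.
threads = [threading.Thread(target=create_users, args=(100,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(issued), len(set(issued)))  # 800 800
```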

Let's look at how it works:

1. Our user completes the sign-up process.
2. We increment the counter and it gives us back 1001.
3. We create our user profile document using the key 1001.

Now, changes to the user's profile don't affect the key. There's a problem, though: unless we want our users to memorise a numeric username such as 1001, we have to find another way of matching a friendly username with the user's profile.

That's where our manual secondary index comes in. Let's add another step to our sign-up process:

We create a look-up document keyed on the user's email address, with a value of 1001.
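Putting the sign-up steps together, a minimal sketch might look like the following. `InMemoryBucket`, its `insert` and `counter` methods, and the `user-counter` key are all hypothetical stand-ins, not the Couchbase SDK. One ordering choice is ours rather than the post's: the look-up document is written before the profile, so an already-registered address is rejected before any profile exists.

```python
import threading


class InMemoryBucket:
    """Hypothetical in-memory stand-in for a bucket; not the Couchbase SDK."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._docs.get(key)

    def insert(self, key, value):
        # Refuses to overwrite an existing document.
        with self._lock:
            if key in self._docs:
                raise KeyError(f"document already exists: {key}")
            self._docs[key] = value

    def counter(self, key, delta, start):
        # Atomically add delta to a numeric document (starting from `start`
        # if it does not exist yet) and return the new value.
        with self._lock:
            self._docs[key] = self._docs.get(key, start) + delta
            return self._docs[key]


def sign_up(bucket, email, password_hash):
    # Increment the counter to get a unique, unchanging user ID.
    user_id = str(bucket.counter("user-counter", delta=1, start=1000))
    # Look-up document: email address -> user ID (fails if the address is taken).
    bucket.insert(email, user_id)
    # Profile document, keyed by the numeric ID.
    bucket.insert(user_id, {"email": email, "password_hash": password_hash})
    return user_id


bucket = InMemoryBucket()
print(sign_up(bucket, "lily@example.com", "<password hash>"))  # 1001
```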

Now, our log-in process looks like this:

1. The user enters her email address (e.g. lily@newdomain.com) and password into the log-in form.
2. We GET the document from Couchbase that has the key lily@newdomain.com.
3. We see the value of that document is 1001, so we GET the user profile document with the key 1001.
4. We verify the password against the hash in the user profile.
5. If successful, we complete the log-in and Lily goes about her business.
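The two-GET log-in can be sketched as follows, again with a dict as a hypothetical in-memory stand-in for the bucket (not the Couchbase SDK). Unlike the redirect approach, a GET on an email address always returns the same kind of value: a user ID.

```python
import hashlib
import hmac
import os


def hash_password(password, salt_hex):
    # PBKDF2-HMAC-SHA256 from the standard library; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), bytes.fromhex(salt_hex), 100_000).hex()


salt = os.urandom(16).hex()
# Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK).
bucket = {
    "lily@newdomain.com": "1001",            # look-up document: email -> user ID
    "1001": {"email": "lily@newdomain.com",  # profile, keyed by the counter value
             "salt": salt,
             "password_hash": hash_password("correct horse", salt)},
}


def log_in(email, password):
    user_id = bucket.get(email)    # first GET: the look-up document
    if user_id is None:
        return None
    profile = bucket.get(user_id)  # second GET: the profile itself
    if profile is None:
        return None
    candidate = hash_password(password, profile["salt"])
    if not hmac.compare_digest(candidate, profile["password_hash"]):
        return None
    return user_id


print(log_in("lily@newdomain.com", "correct horse"))  # 1001
```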

With many database systems, that second read could introduce unacceptable lag, but with Couchbase the additional look-up should be sub-millisecond. That opens up a whole range of other manual indexes we could introduce: Twitter handles, phone numbers, cities and so on.
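Because the profile key never changes, extra look-up documents and Lily's earlier email change both become small writes. In this sketch the helpers `add_lookup`, `find_user` and `change_email` are hypothetical, and so is the `kind:value` key prefix, which is just one assumed way of stopping, say, a phone number from colliding with a user ID. Key naming deserves more care than this.

```python
# Hypothetical in-memory stand-in for a bucket (not the Couchbase SDK).
bucket = {
    "1001": {"email": "lily@example.com", "twitter": "@lily", "phone": "+441632960001"},
}


def add_lookup(kind, value, user_id):
    # Hypothetical key scheme: prefix each look-up key with its kind.
    bucket[f"{kind}:{value}"] = user_id


def find_user(kind, value):
    user_id = bucket.get(f"{kind}:{value}")
    return None if user_id is None else bucket.get(user_id)


def change_email(user_id, new_email):
    profile = bucket[user_id]
    add_lookup("email", new_email, user_id)        # write the new look-up document
    bucket.pop(f"email:{profile['email']}", None)  # then retire the old one
    profile["email"] = new_email                   # the profile key stays the same


for kind in ("email", "twitter", "phone"):
    add_lookup(kind, bucket["1001"][kind], "1001")

change_email("1001", "lily@newdomain.com")
print(find_user("twitter", "@lily")["email"])  # lily@newdomain.com
```

Writing the new look-up document before deleting the old one means there is no moment when neither address works; a fuller version would also check that the new address isn't already registered to someone else.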

Of course, it introduces a little more work in the application layer, but in return we get the flexibility of secondary indexes while retaining all the speed and scalability that made us choose Couchbase Server in the first place.

Next time I'll be looking at key naming.