Spiegel was originally designed to provide scalable replication and change listening for Quizster, a photo-based feedback and submission system. Spiegel is now an open-source project available to everyone.

So, why do we need Spiegel? The short answer is that without it, you’ll have a hard time scaling your CouchDB replications and change listening as your user base grows. Let’s take a closer look.

Replication Challenges in CouchDB

Scalable Replication

The _replicator database in CouchDB is a powerful tool, but in many cases it does not scale well. Consider an example where users post blog entries. Let’s assume that we want to use PouchDB to sync data between the client and CouchDB. Let’s also assume a design with one database per user and an all_blog_posts database that stores the blog posts from all users. Having a database per user allows us to restrict access so that only the owner of a post can edit it. In this design, we’d want to replicate all our user databases to the all_blog_posts database.

At first glance, the obvious choice would be to use the _replicator database to perform these replications. The big gotcha is that each continuous replication via the _replicator database requires its own dedicated database connection. Therefore, if we had 10,000 users, we would need 10,000 concurrent database connections for these replications, even though at any given time there may be at most 100 users making changes to their posts. With Spiegel, we can prevent this greedy use of resources by replicating a database only when a change occurs.
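To make the connection cost concrete, here is a minimal sketch of the documents the naive design would write to the _replicator database. The `source`, `target`, and `continuous` fields are standard in CouchDB replication documents. The `user_<n>` naming scheme, the `build_replicator_docs` helper, and the use of bare database names instead of full URLs are assumptions made for illustration.

```python
def build_replicator_docs(user_count, target="all_blog_posts"):
    """Build one continuous _replicator document per user database."""
    docs = []
    for n in range(user_count):
        source = f"user_{n}"  # hypothetical DB-per-user naming scheme
        docs.append({
            "_id": f"{source}_to_{target}",
            "source": source,
            "target": target,
            "continuous": True,  # a continuous job stays open indefinitely
        })
    return docs


docs = build_replicator_docs(10_000)
always_on = sum(1 for doc in docs if doc["continuous"])
active_users = 100  # the upper bound on simultaneous editors used above

print(f"continuous replications held open: {always_on}")
print(f"replications needed if run only on change: at most {active_users}")
```

Every one of the 10,000 documents describes a replication that stays open, regardless of how few users are actually editing at that moment.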

Real-Time Replication Between Clusters

While CouchDB 2 has built-in clustering, one limitation is that this clustering isn’t designed to be used across regions or data centers. Spiegel tracks changes in real time and schedules replications only for databases that have changed. You can therefore use Spiegel to efficiently keep clusters in different regions of the world in sync.

Scalable Change Listening

Let’s assume that we have a routine that we want to run whenever there are changes. For example, we might want to calculate metrics using a series of views and then store these metrics in a database doc for quick retrieval later. We’d need to write a lot of boilerplate code to listen to the _changes feeds of many databases, handle fault tolerance, and scale as the number of databases grows. Instead, we can define a custom REST API endpoint that calculates these metrics and then a Spiegel on_change rule that calls this endpoint whenever there are applicable changes.
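The idea of an on_change rule can be sketched as a pattern over database names paired with an endpoint to call. This is only an illustration: the field names in the rule below, the endpoint URL, and both helper functions are assumptions, so check the Spiegel documentation for the actual rule format. The sketch only works out which requests would be made and sends nothing over the network.

```python
import re

# Hypothetical rule: field names and URL are assumptions for illustration,
# not a verified copy of Spiegel's on_change document format.
on_change_rule = {
    "type": "on_change",
    "db_name": r"^user_\d+$",  # regular expression over database names
    "url": "https://api.example.com/recalculate-metrics",
}


def rule_matches(rule, db_name):
    """Return True when the rule's pattern matches the database name."""
    return re.search(rule["db_name"], db_name) is not None


def requests_for_changes(rule, changed_dbs):
    """List the (url, params) pairs that would be requested for the changes."""
    return [
        (rule["url"], {"db_name": db_name})
        for db_name in changed_dbs
        if rule_matches(rule, db_name)
    ]


changed = ["user_1", "_users", "user_42", "all_blog_posts"]
calls = requests_for_changes(on_change_rule, changed)
for url, params in calls:
    print(url, params)
```

Only `user_1` and `user_42` match the pattern, so the metrics endpoint would be called twice, and the application code never has to open a _changes feed itself.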

How Spiegel Scales

Spiegel consists of three types of processes: the update-listener, the change-listener, and the replicator. The update-listener listens to the _global_changes database and then schedules on_change rules and replications accordingly. The change-listener runs on_change rules for all matching changes. The replicator performs replications.
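The division of labor between scheduling and replicating can be sketched in memory. This is not Spiegel’s implementation: the `schedule` and `replicate_in_batches` functions, the list of plain database names standing in for update events, and the `replication_map` lookup are all assumptions made to show why work is proportional to the databases that changed and not to the total number of databases.

```python
def schedule(db_updates, replication_map):
    """Update-listener role: collapse raw update events into a work queue.

    Repeated updates to the same database produce a single queue entry, and
    databases without a configured replication are ignored.
    """
    dirty = {}  # dicts preserve insertion order, so first-seen order is kept
    for db_name in db_updates:
        if db_name in replication_map:
            dirty[db_name] = replication_map[db_name]
    return list(dirty.items())


def replicate_in_batches(queue, batch_size):
    """Replicator role: yield the queued (source, target) pairs in batches."""
    for start in range(0, len(queue), batch_size):
        yield queue[start:start + batch_size]


# 10,000 user databases are configured, but only three of them change.
replication_map = {f"user_{n}": "all_blog_posts" for n in range(1, 10_001)}
updates = ["user_1", "user_2", "user_1", "_users", "user_3", "user_2"]

queue = schedule(updates, replication_map)
batches = list(replicate_in_batches(queue, batch_size=2))
print(batches)
```

Six update events become three scheduled replications, processed in two batches. The other 9,997 configured databases cost nothing until they change.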

Diagram of Spiegel’s update-listener, change-listener, and replicator processes.

You can run as many update-listeners, change-listeners, and replicators as your CouchDB cluster can handle. In addition, you can fine-tune settings such as concurrency and batch sizes so that you don’t exhaust your CouchDB resources. In most cases, you’ll want to run at least two of each of these processes for redundancy. In general, if you need to listen to more changes or respond to these changes faster, add a change-listener. Similarly, if you need to perform more replications or replicate faster, add a replicator.

There is an official Docker image that you can use to run the different Spiegel processes, and you can use Docker Swarm or Kubernetes to easily scale your instances.

In a recent passion talk at Offline Camp Oregon, I shared more on Spiegel’s scalable replication and efficient change listening:

Geoff Cox presents “Scalable CouchDB Replication and Change Listening with Spiegel” at Offline Camp Oregon, November 2017

To get started using Spiegel, explore the repo and check out my step-by-step tutorial:

Happy replicating!

About the Author

Geoff Cox is the Co-Founder of Quizster, a photo-based submission and feedback system. Quizster uses a full stack of JS and runs CouchDB and PouchDB at the data layer.