The NoSQL Partition Tolerance Myth

I've been reading a lot of NoSQL marketing material lately, and it's impossible to avoid noticing how many NoSQL data stores talk about their "partition tolerance." I want to bust three common myths about this buzzword:

What the NoSQL industry is peddling under the guise of partition tolerance is not the ability to build applications that can survive partitions. They're repackaging and marketing a very specific, and odd, behavior known as partition obliviousness .

. Partition obliviousness is not a universally desirable property. It doesn't actually help you write partition tolerant applications. It often hinders the construction of robust applications. You probably don't want it.

Partition obliviousness is not difficult to achieve, nor is it something to be proud of.

Myth #1. In distributed systems literature, partition tolerance refers to a system's overall ability to live up to its specification in the presence of network partitions. So, if I'm implementing an online bank, and my datacenter network gets divided down the middle into two halves, my bank should continue to correctly debit your account and credit mine. If it's not doing this, then the bank suffers downtime, which is bad for business.

Now, it's actually very difficult to build partition tolerant services. Think about what a bank could possibly do to facilitate a transfer if my record is on this side of the chasm and yours is on the other; namely, not all that much. In such cases, it is almost always better to build services that degrade gracefully under partitions. In this case, we would want failure detection and carry out those transfers where the accounts are both on the same side of the partition, while denying or deferring transfers that cross the chasm. If we do perform this failure detection and defer operations that cannot be satisfied because data is not available, sure, we lose some business, but at least we have a mostly working bank. We're not bankrupt, we didn't mess anything up, we're still a legitimate bank and we'll live to transact another day.

But what most NoSQL systems offer is a peculiar behavior that is not partition tolerant, but partition oblivious instead. If I were to build a bank based on Dynamo, the granddaddy of all first-generation NoSQL data stores, it would silently split into two halves, like a lobotomized patient. Each of those two halves will then happily start serving data as if nothing happened, even though the worst possible kind of calamity just struck the datacenter. In this scenario, the hypothetical backend for Banko Dynamo would not only not provide any indication of failure, but allow a customer to create as many new accounts as there are partitions, one in each. You decide how great a feature this is and whether you want it. Perhaps you like manually reconciling diverged objects. Or perhaps you're too big to fail and Uncle Sam will make your customers whole.

Myth #2. Partition obliviousness is a very odd and quirky property, one that most people would not want if it were branded properly. Imagine that we're doing a startup to topple the Great Satan known as Ticketmaster. We've built a buzzword-complete website with all the latest technologies, backed by a first generation NoSQL store that provides partition oblivious operation.

Now assume that we have one of these partitions that the NoSQL salesmen are constantly talking about. The kind of partition that divides my datacenter down the middle and yet both halves are getting requests from the outside world. Do you really want to sell tickets from both halves of your system? By definition, there is no way you can guarantee uniqueness of those tickets. There will be customers holding identical tickets with identical seat numbers. If this is the kind of behavior you want, then great, because that's exactly what you'll get with partition obliviousness.

NoSQL salesmen will tell you that they implement "healing." This refers to a simple process where, after the partition heals, they check to see if the objects diverged in the two halves. And if they did, the first-generation NoSQL stores usually take the ultimate punt by presenting all versions of the divergent objects to the application, and let the application resolve the mess. So, after the partition heals, and you figure out that you sold the same seat to Mr. Pink and Mrs. White, now what? They both printed their tickets long ago, donned their concert outfits and are at the coat check, moments away from a very unpleasant realization that they need to squeeze into the same seat. If instead of Ticketmaster, you were implementing an EBay clone, who won the auction and bought the last Furby? If you're running analytics, what happened to all your data from, say, the southern states, and how do you think the transparently missing data will torque your business purchasing decisions? You might be able to explain to your boss why your analytics ordered no grits this month, perhaps, but try explaining the quirky semantics for your brand of "partition tolerance" to crying kids who wanted that Furby. The data store may have tolerated that partition, but your application didn't; things got messed up.

Of course, not every application resembles these e-commerce and analytics applications. If your NoSQL store only holds soft, inconsequential data, or the objects cannot possibly ever diverge, or the data store is write only, then it's absolutely not a problem to have multiple copies of the object evolve separately in each partition. If your service is embarrassingly parallel or embarrassingly replicable, first generation NoSQL stores are perfectly fine. But if your data is that soft and inconsequential, why not just use memcached? It's wicked fast, far faster than Mongo.

Myth #3. A lot of NoSQL developers pretend that being partition oblivious is a difficult thing to implement. This is false. It's easy to make a program oblivious to a particular event; namely, you write no code to handle that event. If, by some mistake, you had an intern write some clever code for detecting partitions, you actively delete it. There, your system is now oblivious. Like every geek, I was oblivious to a lot of signals throughout high school; it came to me naturally, I had to undergo no special oblivion training, and, most importantly, and I cannot emphasize this enough, especially in hidsight about all the fun opportunities I missed: missing those signals wasn't a virtue.

The thing that greatly helps first generation NoSQL data stores, the thing that enables them to package partition obliviousness as if it were equivalent to partition tolerance, is that they provide a very weak service guarantee in the first place. These systems cannot guarantee that, on a good day, your GET will return the latest PUT. In fact, eventual consistency means that a GET can return any previous value, including the Does Not Exist response from the very initial state of the system. Therefore, it's not a violation of their interface if, during and after a partition, they exhibit total amnesia.

And that brings us back to the beginning, to the definition of partition tolerance. These systems are so weak that they can rebrand their oblivious behavior as partition tolerant -- it's not as if they are ever violating their own weak specification by returning bogus responses, for they gave you no guarantee in the first place. Most importantly, the data store itself may operate through partitions, but that does not mean it enables you to build applications on top that can do the same. If your application needs to uphold a strong service guarantee, these weak data stores will not necessarily help you provide that guarantee through a partition.

Luckily, we know how to build stronger systems that provide better guarantees, and there are now second-generation NoSQL systems that do not suffer from these problems.

--

Share on Twitter Share on Twitter

Share on Facebook Share on Facebook

Share on Linkedin Share on Linkedin

Share on Reddit Share on Reddit

Share on E-Mail Share on E-Mail

Please enable JavaScript to view the comments powered by Disqus.

Disqus