Some misconceptions about the CAP Theorem

Posted on September 29, 2010 by Tommy McGuire

When designing distributed web services, there are three properties that are commonly desired: consistency, availability, and partition tolerance. It is impossible to achieve all three. In this note, we prove this conjecture in the asynchronous network model, and then discuss solutions to this dilemma in the partially synchronous model.

...as far as I can tell, there is no practical difference between CA systems [consistent and available, but not tolerant of partitions] and CP systems [consistent and tolerant of network partitions, but not available]. As noted above, CP systems give up availability only when there is a network partition. CA systems are “not tolerant of network partitions”. But what if there is a network partition? What does “not tolerant” mean? In practice, it means that they lose availability if there is a partition. Hence CP and CA are essentially identical. So in reality, there are only two types of systems: CP/CA and AP.

In an available but not partition-tolerant system, Y would be allowed to process its request because it can reach X to obtain the lock. Z’s request would be blocked because X is unreachable.

In a partition-tolerant but not available system, Z would be allowed to process its request because it is part of the quorum group (X’s lock will be broken). Y’s request would be blocked because it is not part of the quorum group.

In a system that is both available and partition-tolerant, both requests would be allowed to progress. Y would return current data as possibly modified by X, while Z would return possibly stale data. Consistency is obviously sacrificed in this case. Note that maintaining consistency is possible in the other two kinds of systems (which is not to say that it’s easy).

3.2.2 Atomic, Available [G&L's terms for CA]

If there are no partitions, it is clearly possible to provide atomic [consistent], available data. In fact, the centralized algorithm described in Section 3.2.1 [Atomic, Partition Tolerant; i.e. CP] meets these requirements.

You're a daisy if you do

...there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.

...every request received by a non-failing node in the system must result in a response.

...the network will be allowed to lose arbitrarily many messages sent from one node to another....(And any pattern of message loss can be modeled as a temporary partition separating the communicating nodes at the exact instant the message is lost.)

The atomicity requirement therefore implies that every response will be atomic, even though arbitrary messages sent as part of the algorithm might not be delivered. The availability requirement implies that every node receiving a request from a client must respond, even though arbitrary messages that are sent may be lost.

...there is no clock, and nodes must make decisions based only on the messages received and local computation;

...every node has a clock, and all clocks increase at the same rate....Further, assume that every message is either delivered within a given, known time: tmsg, or it is lost.

Skin that smoke wagon

Corollary 1.1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

Availability, in all fair executions,

Atomic consistency, in fair executions in which no messages are lost.

I'm your huckleberry

Theorem 1 It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties:

Availability

Atomic consistency in all fair executions (including those in which messages are lost).

Theorem 2 It is impossible in the partially synchronous network model to implement a read/write data object that guarantees the following properties:

Availability

Atomic consistency in all executions (even those in which messages are lost),

In this note, we have shown that it is impossible to reliably provide atomic, consistent data when there are partitions in the network. It is feasible, however, to achieve any two of the three properties: consistency, availability, and partition tolerance.

You may indeed, if you get lucky.

Maybe poker's just not your game. I know: let's have a spelling contest

I feel some trepidation in writing this section, since in the previous I have presented only informal proof sketches myself. However, I feel it needs to be said, even if I do not currently have the tools to follow my own advice.

With the exception of corollary 1.1, I do not have any specific issues with the proofs from Gilbert and Lynch. In fact, the proof sketches I have presented above are not materially different from their proofs. However, the primary general problem I have with their paper, and the causative factor I see with corollary 1.1, is that none of their proofs are in any way formal. Let me emphasize that, for anyone who has ever referred to "Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services" as a formal proof of the CAP theorem:





This paper is in no way formal.

There are no formal proofs here.

The parrot is dead.





If Gilbert and Lynch had attempted a formal proof, the weakness of corollary 1.1 would have been evident. In my opinion, the attempt at formalizing the proofs of the two theorems would have resulted in definitions of consistent, available, and partition tolerant much clearer than those actually presented. And further, the overall failure to demonstrate the CAP theorem, rather than a subset of it, should have been clear.





It appears my hypocrisy knows no bounds





Now that I have presented what I believe to be a better proof of the CAP theorem, what have I gained? One question anyone might ask is, if you weaken one or more of the two guarantees, can you get back some of the third? I (and presumably everyone else) believe so. And with theorems 1, 2, and 3, I have some guidance in doing so.





Theorem 5: \(\tilde{C}P \wedge \tilde{A}\)





Assume that a distributed storage protocol over nodes \(M_0,\ldots,M_{n-1}\) guarantees write serial consistency and partition tolerance. Such a protocol can guarantee read availability.





Define write serial consistency by associating a witness with the value maintained by the distributed data store such that the witness begins at 0 with the value at its initial state and is incremented by 1 every time the value is written. A read request results in the most recent value and current witness value. A write request results in an ack and a witness value \(w+1\) where \(w\) is the witness associated with the immediately previous operation.





Define read availability such that if a node \(M_i\) receives a read request, it sends a response within some bounded period of time. No guarantee is made about write requests.





You called down the thunder

Doc Holliday: What did you ever want? : What did you ever want?

Wyatt Earp: Just to live a normal life. : Just to live a normal life.

Doc Holliday: There's no normal life, Wyatt, it's just life. Get on with it. : There's no normal life, Wyatt, it's just life. Get on with it.

Wyatt Earp: Don't know how. : Don't know how.

Comments

Please enable JavaScript to view the comments powered by Disqus.

I finally broke down and read Eric Brewer's original " Towards Robust Distributed Systems " slides from PODC 2000 and Seth Gilbert and Nancy Lynch's " Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services " from the ACM SIGACT News, 2002. I can now understand some of the hubbub around the use of this theorem. Unfortunately, I believe there are some misconceptions with the content of the paper.for people reading this blog, I apologize if I drop into full-on academic-ese. It's an occupational hazard; dogs, fleas, that sort of thing. Also, Gilbert and Lynch use the term "algorithm", while I prefer "network protocol". I have used both terms interchangeably.In their abstract, Gilbert and Lynch describe their contribution:Ultimately, the conjecture and theorem assert thatof the properties of consistency, availability, and partition tolerance can be achieved in a system, which leads to the question of what those combinations mean. Daniel Abadi describes one difficulty with this theorem Jeff Darcy presents an interesting analysis of the CAP theorem in terms of quorum-based distributed systems (where node X holds the current version of the data), in which he says,Unfortunately, in this analysis, CA and CP differ in which of the requests Y or Z is blocked; i.e.the system becomes unavailable.Indeed, in their paper Gilbert and Lynch admit to some similarity between CA and CP:Dan Weinreb wrote an excellent discussion of the definitions used by Gilbert and Lynch but, after reading the paper, I think the difficulties go deeper. However, before I can examine those difficulties in the formal contents of the paper, I must start by noting some of the definitions used by Gilbert and Lynch.One difficulty many readers seem to have with the paper lies in the specific definitions of, andused by Gilbert and Lynch., the C in CAP, is defined in section 2.1 by Atomic Data Objects, asOne note for anyone applying the CAP theorem to data storage systems, such as relational databases, NoSQL, ACID, BASE, pre- and post-relational data stores, object stores, etc., etc.: this,consistency criteria is significantly stronger than SQL'sconsistency, or the commonly available, weaker criteria like read-committed and so forth. If you are trying to argue that, for example, an off-the-shelfsatisfy your availability requirements, then this proof does not directly help your argument., the A, is defined in section 2.2 asAvailability, in this paper, has a very specific, very technical meaning. In particular, a request which is not received by anode does not impact availability.Finally, a networkis defined in section 2.3 as, the P, is strangely defined in terms of consistency and availability:The proofs in the paper are based on two protocol models:, in whichand, in whichBefore I look at the more important parts of the paper, I would like to examine perhaps the strongest, albeit minor, result from the paper,The proof of the corollary proceeds by contradiction and by noting that an asynchronous algorithm has no way of determining whether a message has been lost; therefore an algorithm which made those guarantees under an assumption that no messages were lost would also make those guarantees if messages were lost, contradicting theorem 1 of the paper.This claim is very surprising in light of the fact that, assuming no messages are lost, it is relatively simple to exhibit a protocol whichguarantee availability and atomic consistency.The proof of corollary 1.1 begins by assuming an algorithm A which makes those guarantees in all non-partitioned executions (in other words, no messages are lost). By theorem 1, there must be some execution α in which some response is not atomic (and presumably from which some message is lost), from which a prefix α' is extracted, up to the point where the non-atomic response is received (presumably including the message loss, which would cause the error). The execution prefix α' is extended to a complete execution α'' in which all messages are delivered eventually (including the message which was lost in execution α; in α'' this message was merely delayed). The execution α'' therefore contradicts the guarantees.The problem with this proof, aside from its dizzying structure, seems to be that it is missing an analysis of a case: it proceeds by assuming that if A produces an erroneous execution, it does so by producing a non-atomic response. However, the protocol which I would exhibit (guaranteeing consistency and availability in non-failing executions) would violate availability upon a failing execution, not consistency. It would just halt, precluding the identification of the prefix α' and the construction of the contradictory execution α''.The paper has two primary theorems,and a corresponding theorem 2, for partially synchronous networksThe proofs of both of these theorems presented by Gilbert and Lynch are very similar: an algorithm A is assumed to exist which satisfies all of the properties. Then, an execution of A is constructed that includes an inconsistent response, which contradicts the original assumptions. In both cases, the network consists of two disjoint, non-empty sets of nodes, G1 and G2, which are further partitioned so that no communication is possible between G1 and G2. The constructed execution involves a write to G1 followed by a read from G2, which cannot be consistent with the write.Unlike corollary 1.1, I do not see any particular problem with these proofs as far as they go, but I do believe the result has been overstated. In the conclusion, Gilbert and Lynch say,The first sentence adequately describes the theorems in the paper, but it does not imply the second. Specifically, both of the proofs assume availability as a priority requirement and make use of a partition that leaves the system unable to maintain consistency.I find these proofs to be unsatisfactory for a number of reasons. The definitions and proofs, as given, are excessively complex and result in frequent misunderstandings. Further, the goal as stated in the abstract and conclusion is not satisfied. I believe the goal can be satisfied and that doing so is important. Finally, I believe a clearer presentation would open up several further avenues into the capabilities of distributed systems.I intend to provide an alternative proof (or, to be more correct, a proof sketch) of the CAP theorem. This proof will proceed in three parts: first, a demonstration that a consistent and available protocol cannot withstand partitions; second, that a consistent and partition tolerant protocol cannot guarantee availability; third, that an available and partition tolerant protocol cannot guarantee consistency. Before I can get to the proofs, I must revisit the definitions.The execution model is the same as Gilbert and Lynch'smodel: all nodes have a clock which may not show the same time as any other node, but which advances at the same rate. This model is more general than the asynchronous model and is more useful.A distributed storage system is made up of, labeled \(M_0\) to \(M_{n-1}\), where \(n > 0\). This storage system maintains a single value, an integer.There are two possible messages which can be sent to a node \(M_i\):, requesting the current value, and, updating the current value. There are two possible response messages of \(M_i\) to a request:containing the current value in response to a read request, and, indicating that the write request has updated the value. I explicitly do not include an error response; such a response would be exactly identical to no response at all in terms of violating availability. (In practice, such a response would be a requirement.)If a protocol guarantees, then if a request message is received by any node \(M_i\), the result of that operation is based on the latest value immediately prior to that operation. A read request results in the most recent value; a write request overwrites the most recent value with the new value. In order to demonstrate this consistency, I associate a witness with the value maintained by the distributed data store such that the witness begins at 0 with the value at its initial state, and is incremented by 1 every time the value is read or written. As a result, a read request results in the most recent value and a witness value \(w+1\) where \(w\) is the witness associated with the immediately previous operation; likewise, a write request results in an ack and a witness value \(w+1\) where \(w\) is the witness associated with the immediately previous operation. I am maintaining the Gilbert and Lynch's requirement for serial consistency.If a protocol guarantees, then if a node \(M_i\) receives a request, it sends a response within some bounded period of time; a read request results in a value and a write request results in an ack. An unbounded delay results in an availability failure, as would some hypothetical error response. The actual bound must be protocol dependent. What is explicitly disallowed is a delay until the partition is healed.If a protocol guarantees, then the loss of one or more messagesThe proof of this theorem follows the same lines as the proofs of Gilbert and Lynch.Assume that the total cluster of nodes is divided into two, non-overlapping, non-empty sets \(M_0,\ldots,M_i\) and \(M_{i+1}, \ldots ,M_{n-1}\), that the current witness value is \(w\), and that all messages from members of the first set do not reach any member of the second set, and vice versa. Further, assume that a request is received by \(M_j\), one of \(M_0, \ldots, M_i\), which responds appropriately with a witness \(w+1\) and that the protocol is otherwise quiescent. If the next request is received by \(M_k\), one of \(M_{i+1}, \ldots, M_{n-1}\), after \(M_j\) responds to the first request, then \(M_k\) should respond appropriately with a witness \(w+2\) by virtue of its availability and consistency. However, it cannot do so consistently since it does not have the most recent version of the value and does not know that the witness has been incremented. As a result, this protocol cannot tolerate the partition without invalidating some other guarantee.Assume that the total cluster of nodes is divided into two, non-overlapping, non-empty sets \(M_0, \ldots, M_i\) and \(M_{i+1}, \ldots, M_{n-1}\), that this cluster is otherwise quiescent, and that all messages from members of the first set do not reach any member of the second set, and vice versa. If some node \(M_j\) in \(M_0,\ldots,M_i\) receives a request and some node \(M_k\) in \(M_{i+1},\ldots,M_{n-1}\) receives another request then one of the responses should have a witness of \(w+1\) and the other should have a witness of \(w+2\). However, because there is no communication between the two sets, they will not be able to agree on the witnesses; the set which should assign witness \(w+2\) can have no idea that witness \(w+1\) has been assigned. In order to avoid inconsistency, one or the other of \(M_j\) and \(M_k\), or both, must not generate any response, therefore violating availability.Assume that the total cluster of nodes is divided into two, non-overlapping, non-empty sets \(M_0,\ldots,M_i\) and \(M_{i+1},\ldots,M_{n-1}\), and that all messages from members of the first set do not reach any member of the second set, and vice versa. If some node \(M_j\) in \(M_0,\ldots,M_i\) receives a request and some node \(M_k\) in \(M_{i+1},\ldots,M_{n-1}\) receives another request then one of the responses should have a witness of \(w+1\) and the other should have a witness of \(w+2\). However, because there is no communication between the two sets, they will not be able to agree on the witnesses; the set which should assign witness \(w+2\) can have no idea that witness \(w+1\) has been assigned. In order for the cluster to preserve availability, both \(M_k\) and \(M_j\) must generate responses in some bounded time while the duration of the partition may not be bounded. In these responses, both witnesses will have some arbitrary value (such as both having \(w+1\)). In order to preserve availability, therefore, this protocol must violate consistency.Theorem 4 follows from theorems 1, 2, and 3.Suppose that the total cluster of nodes is divided into two, non-overlapping, non-empty sets \(M_0,\ldots,M_i\) and \(M_{i+1},\ldots,M_{n-1}\), that the current witness value is \(w\), and that all messages from members of the first set do not reach any member of the second set, and vice versa. Further, suppose that a request is received by \(M_j\), one of \(M_0,\ldots,M_i\). If the request is a write, assume that an error is generated, it is delayed until the partition is healed, or that it is simply ignored. If the request is a read, assume that \(M_j\) responds with the value and witness \(w\). This protocol can support all three of write serial consistency, read availability, and partition tolerance.In this post I have pointed out some issues that I believe exist in Gilbert and Lynch's paper and presented what I believe to be improvements. I have also demonstrated that, while it is impossible to achieve all three of (serial) consistency, (absolute) availability, and (unconditional) partition tolerance, it is possible to achieve two. Also, by weakening one or both, it is possible to partially approach the third.I highly recommend http://danweinreb.org/blog/what-does-the-proof-of-the-cap-theorem-mean which has a similar theme. People think Gilbert and Lynch proved Brewer's theorem, but they really proved something subtly different and generally less applicable to the contexts in which CAP is mentioned.Jeff Darcy2010-10-01T09:59:21.725-05:00'In short, it is stupid to map one model to another model without justification. Mathematics is a great way to do this. What mathematical tools do we have at our disposal?Likewise, this question can be asked (and has been asked, and is being asked) at the language-level. Hewitt's ActorScript ideas are important here. So is the blame calculus, since static languages interfacing with dynamic ones is really just a way to define partitions. Wadler has linked to an explanation of the CAP Theorem on his blog before, but the description he picked wasn't that great.I also think most people misunderstand the differences between agent-based computing and Basic Stuff Required For the Critical Path of a Business(TM). Computing with agents is great, and can be used to reduce costs and/or service customers faster/better/cheaper. But it is not a replacement for underlying social traditions and the expectations of members of society. Peter Norvig said in Coders At Work that many of Google's unit tests don't have concrete internal pass/fail checks, and the test is more on a continuum. I can imagine that for a database search engine where half a second increase in latency drops traffic by one-fifth! Better to give suboptimal search results than to have users use Bing instead; all that matters is the results are accurate enough in relation to some function that measures AdWords money generation for Google. The only flaw in such unit tests is they fundamentally can't model second-order effects.Building programming models based on societal traditios accumulated since the stone age is why Hewitt detoured so heavily into theories of organizations, because he believed mathematical models should reflect the real world -- he was especially critical of petri nets modeling due to its inability to model real-world processes.Jeff, I had missed that particular post by Dan Weinreb, although I have a browser tab open on another one of his posts. I'll add a link to it.I am tempted to say, "if I'd seen that, I wouldn't have spent my three-day weekend writing my post," but that is not precisely true. The real irritant for me was corollary 1.1, which was like changing from fifth gear to first at 60mph on my old Kawasaki: I didn't necessarily know what, but I knew something was very wrong. Only after that did I really dig in and notice that the paper did not actually prove what it claims to have proven.It is true that the CAP theorem is both subtly different from what most people would believe and also not directly applicable to some of the contexts in which it is used, but I would go further and say that Gilbert and Lynch have only partially proved the CAP theorem, even given their somewhat convoluted definitions.Tommy McGuire2010-10-05T17:23:02.418-05:00'Z-Bo, it's good to hear from you!I'm afraid I do not really understand your comments. Could you elaborate?Tommy McGuire1. Phil Wadler has been developing a distributed programming language called Links that allows programmers to treat n-tier applications as a single tier, and then dynamically split the application across those tiers. He has a number of papers on this, coauthored with his Ph.D. students, such as The Essence of Form Abstraction. One of the claims Form Abstraction via applicative functors makes is that if your programs are well-typed then your forms have proper encapsulation. At the same time, he has also developed the Blame Calculus with Jeremy Siek. Traditionally, one might view the problems BC solves strictly in terms of static and dynamic types. However, these are generally a continuum in any real-world system. When you have partition barriers, you have coordination that must be resolved. You can bake this coordination into the type system and create a static type system for distributed computing. You can also create a dynamic type system and provide a well-defined protocol for how types are dynamically sealed. Saying well-typed systems can't be blamed is really just saying that the type system is insulated from the outside world. Many real-world systems need to model situations where the types of entities can change dynamically, and engineers currently have two approaches for dealing with this problem:(a) Create an auditable document that says what is going to be changed and have everybody sign off on it, to keep track of what is changed. Modern "Fully Integrated Data Environments" can automate the generation of the audit document and handle correctness of the upgrade modulo some ad-hoc scripts, but they cost a lot of money since they're complicated systems that only commercial vendors would build. Changes still have to be made in the application code. For systems in the wild, CRMs and ERPs companies buy might require consultants of the CRM/ERP solution to come in and tell them if a change is even possible (and even then they are just guessing). You can track usage of schema elements via network profiling, but that doesn't explicitly tell you "this code is only called once a year". Some processes can run for over a year without a change.(b) Develop something like Ruby on Rails with lots of reflection and dynamic typing. This doesn't really address the issues raised by (a) directly since RoR is a not a robust dynamically reflective system. But the spirit of the idea is captured if we decouple the main ideas in RoR. For one, RoR could define schemas external to the application itself rather than directly reflect on classes. Classes could download the schema from an external source, and the applications knowledge of the schema would have to explicitly model the barrier between the application and the database, and the fact the database schema could change. This sort of dynamic dependency management is non-existent in real-world systems today, because we are stuck manipulating symbolic references. RoR has no way to tombstone itself and re-birth itself like a Phoenix.John "Z-Bo" Zabroski2010-10-062. Hewitt has spent most of his academic career advocating the Actor Model of computation. His argument was the Actor Model was a natural model of computation and was expressive enough to capture all computation. This therefore includes schematic evolution of a system. Actor systems have to be constructible that can survive problems like schema (type) mapping failures. Actor systems must also be able to solve problems like modeling problems with non-serializable subgoals. Hewitt's views on these problems are different from Wadler's.They are two sides of the same coin. One looks at the problem in terms of type systems, the other in terms of (concurrent, distributed) logic programming.W.r.t. Google: Google has released reports that half a second response latency drops their traffic by a fifth. If that half a second bottleneck is spent inside the search engine trying to find the "perfect" result set, then it is pointless from a business perspective. It is much better from the perspective of a statistician to create a model for AdWords independent of the model for latency/traffic and try to derive a trade-off function that can dynamically decide how good the results are vs. how much money inferior results might generate via AdWords clicks anyway.Corporations, on the other hand, have regulatory compliances they must adhere to for certain services. If they provide you with a draft of hospital charges for inpatient surgery, and it is approved by your insurance, they can't just randomly change that. The results of "Why did we charge this patient that?" have to be deterministic and point to a very specific, final answer.The principle trade-off for large enterprises is that they have many stovepipe applications that are dependent on one another, and enterprises lose money when a point of service or call center person has to look up why a customer's account is not showing the fact they paid a bill already. That 20 minute walk to the filing cabinet or whatever costs money, and it adds up. So decisions regarding CAP have to be driven by business requirements. Languages should allow people to reason about these trade-offs at a high-level.John "Z-Bo" Zabroski2010-10-06