# The CAP FAQ

Version 1.0, May 9th 2013
By: Henry Robinson / henry.robinson@gmail.com / @henryr
<pre><a href="http://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>

## 0. What is this document?

No subject appears to be more controversial to distributed systems
engineers than the oft-quoted, oft-misunderstood CAP theorem. The
purpose of this FAQ is to explain what is known about CAP, so as to
help those new to the theorem get up to speed quickly, and to settle
some common misconceptions or points of disagreement.

Of course, there's every possibility I've made superficial or
completely thorough mistakes here. Corrections and comments are
welcome: <a href="mailto:henry.robinson+cap@gmail.com">let me have
them</a>.

There are some questions I still intend to answer. For example:

* *What's the relationship between CAP and performance?*
* *What does CAP mean to me as an engineer?*
* *What's the relationship between CAP and ACID?*

Please suggest more.

## 1. Where did the CAP Theorem come from?

Dr. Eric Brewer gave a keynote speech at the Principles of Distributed
Computing conference in 2000 called 'Towards Robust Distributed
Systems' [1]. In it he posed his 'CAP Theorem' - at the time unproven - which
illustrated the tensions between being correct and being always
available in distributed systems.

Two years later, Seth Gilbert and Professor Nancy Lynch - researchers
in distributed systems at MIT - formalised and proved the conjecture
in their paper “Brewer's conjecture and the feasibility of consistent,
available, partition-tolerant web services” [2].

[1] http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
[2] http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf

## 2. What does the CAP Theorem actually say?

The CAP Theorem says that it is impossible to build an implementation
of read-write storage in an _asynchronous network_ that satisfies all
of the following three properties:

* Availability - will a request made to the data store always eventually complete?
* Consistency - will all executions of reads and writes seen by all nodes be _atomic_ or _linearizably_ consistent?
* Partition tolerance - the network is allowed to drop any messages.

The next few items define any unfamiliar terms.

More informally, the CAP theorem tells us that we can't build a
database that both responds to every request and returns the results
that you would expect every time. It's an _impossibility_ result - it
tells us that something we might want to do is actually provably out
of reach. It's important now because it is directly applicable to the
many, many distributed systems which have been and are being built in
the last few years, but it is not a death knell: it does _not_ mean
that we cannot build useful systems while working within these
constraints.

The devil is in the details, however. Before you start crying 'yes, but
what about...', make sure you understand the following about exactly
what the CAP theorem does and does not allow.

## 3. What is 'read-write storage'?

CAP specifically concerns itself with a theoretical
construct called a _register_. A register is a data structure with two
operations:

* set(X) sets the value of the register to X
* get() returns the last value set in the register

A key-value store can be modelled as a collection of registers. Even
though registers appear very simple, they capture the essence of what
many distributed systems want to do - write data and read it back.
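
To make the register construct concrete, here is a minimal sketch in
Python (a single-process illustration with class names of my choosing,
not anything from the Gilbert and Lynch paper) of a register, and of a
key-value store modelled as a collection of registers:

```python
class Register:
    """A register: a single value with set(X) and get() operations."""

    def __init__(self, initial=None):
        self._value = initial

    def set(self, x):
        self._value = x

    def get(self):
        # On one machine this trivially returns the last value set; CAP
        # is about how hard it is to preserve exactly this behaviour
        # once the register is replicated over an unreliable network.
        return self._value


class KeyValueStore:
    """A key-value store modelled as a collection of named registers."""

    def __init__(self):
        self._registers = {}

    def set(self, key, x):
        self._registers.setdefault(key, Register()).set(x)

    def get(self, key):
        return self._registers.setdefault(key, Register()).get()
```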

## 4. What does _atomic_ (or _linearizable_) mean?

Atomic, or linearizable, consistency is a guarantee about what values
it's ok to return when a client performs get() operations. The idea is
that the register appears to all clients as though it ran on just one
machine, and responded to operations in the order they arrive.

Consider an execution consisting of the total set of operations
performed by all clients, potentially concurrently. The results of
those operations must, under atomic consistency, be equivalent to a
single serial (in order, one after the other) execution of all
operations.

This guarantee is very strong. It rules out, amongst other guarantees,
_eventual consistency_, which allows a delay before a write becomes
visible. So under EC, you might have:

set(10), get() = 5

But this execution is invalid under atomic consistency.

Atomic consistency also ensures that external communication about the
value of a register is respected. That is, if I read X and tell you
about it, you can go to the register and read X for yourself. It's
possible under slightly weaker guarantees (_serializability_ for
example) for that not to be true. In the following we write A:<set or
get> to mean that client A executes that operation.

B:set(5), A:set(10), A:get() = 10, B:get() = 10

This is an atomic history. But the following is not:

B:set(5), A:set(10), A:get() = 10, B:get() = 5

even though it is equivalent to the following serial history:

B:set(5), B:get() = 5, A:set(10), A:get() = 10

In the second example, if A tells B about the value of the register
(10) after it does its get(), B will falsely believe that some
third party has written 5 between A:get() and B:get(). If external
communication isn't allowed, B cannot know about A:set, and so sees a
consistent view of the register state; it's as if B:get really did
happen before A:set.

Wikipedia [1] has more information. Maurice Herlihy and Jeannette
Wing's original paper from 1990 is available at [2].

[1] http://en.wikipedia.org/wiki/Linearizability
[2] http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
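
Because 'equivalent to a single serial execution' is a precise
condition, it can be checked mechanically. Here is a small sketch in
Python (an illustration of mine, not anything from the papers above)
that checks histories like the ones in this section, under the
simplifying assumption that operations do not overlap in real time - in
that case atomicity just means every get() returns the most recently
set value:

```python
# A history is a list of (client, op, value) tuples, in real-time order.
def is_atomic(history, initial=None):
    """True if every get() observes the most recent set()."""
    current = initial
    for _client, op, value in history:
        if op == "set":
            current = value
        elif op == "get" and value != current:
            return False  # a stale (or invented) value was returned
    return True

ok  = [("B", "set", 5), ("A", "set", 10), ("A", "get", 10), ("B", "get", 10)]
bad = [("B", "set", 5), ("A", "set", 10), ("A", "get", 10), ("B", "get", 5)]

assert is_atomic(ok)       # the first history above is atomic
assert not is_atomic(bad)  # the second is not, as explained above
```

When operations may genuinely overlap, a checker has to consider every
ordering of concurrent operations that respects real time, which is far
more expensive; the simple loop above covers only the non-overlapping
case.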

## 5. What does _asynchronous_ mean?

In an asynchronous network there is no bound on how long a message may
take to be delivered, and nodes have no synchronised clocks, so it is
impossible to tell the difference between a message that has been lost
and one that is merely delayed.

## 6. What does _partition_ mean?

A partition is when the network drops messages between nodes. Sometimes no
messages are delivered between two sets of nodes. This is a more
restrictive failure model. We'll call these kinds of partitions _total
partitions_.

The proof of CAP relied on a total partition. In practice,
these are arguably the most likely since all messages may flow through
one component; if that fails then message loss is usually total
between two nodes.

## 7. Why is CAP true?

The basic idea is that if a client writes to one side of a partition,
any reads that go to the other side of that partition can't possibly
know about the most recent write. Now you're faced with a choice: do
you respond to the reads with potentially stale information, or do you
wait (potentially forever) to hear from the other side of the
partition and compromise availability?

This is a proof by _construction_ - we demonstrate a single situation
where a system cannot be consistent and available. One reason that CAP
gets some press is that this constructed scenario is not completely
unrealistic. It is not uncommon for a total partition to occur if
networking equipment should fail.
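
As an illustration of the construction (a hypothetical sketch in
Python, not the formal proof), consider two replicas that can no longer
exchange messages. A write lands on one side; a read then arrives at
the other, which must choose between a stale answer and no answer:

```python
class Replica:
    """One of two replicas of a register, joined by a faulty network."""

    def __init__(self, value):
        self.value = value
        self.partitioned = False  # True: messages to the peer are lost

    def replicate_to(self, peer, value):
        if not self.partitioned:
            peer.value = value  # during a partition this never happens

left, right = Replica(5), Replica(5)
left.partitioned = right.partitioned = True

# A client writes to the left side of the partition...
left.value = 10
left.replicate_to(right, 10)  # message lost: right still holds 5

# ...and a read arrives at the right side. Pick your poison:
choose_availability = True
if choose_availability:
    answer = right.value  # responds 5 - available, but not consistent
else:
    # Block until the partition heals (which may be never):
    # consistent, but not available.
    raise TimeoutError("cannot reach peer; refusing to answer")
```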

## 8. When does a system have to give up C or A?

CAP only guarantees that there is _some_ circumstance in which a
system must give up either C or A. Let's call that circumstance a
_critical condition_. The theorem doesn't say anything about how
likely that critical condition is. Both C and A are strong guarantees:
they hold only if 100% of operations meet their requirements. A single
inconsistent read, or unavailable write, invalidates either C or
A. But until that critical condition is met, a system can be happily
consistent _and_ available and not contravene any known laws.

Since most distributed systems are long running, and may see millions
of requests in their lifetime, CAP tells us to be
cautious: there's a good chance that you'll realistically hit one of
these critical conditions, and it's prudent to understand how your
system will fail to meet either C or A.

## 9. Why do some people get annoyed when I characterise my system as CA?

Brewer's keynote, the Gilbert paper, and many other treatments place
C, A and P on an equal footing as desirable properties of an
implementation and effectively say 'choose two!'. However, this is
often considered to be a misleading presentation, since you cannot
build - or choose! - 'partition tolerance': your system either might experience
partitions or it won't.

CAP is better understood as describing the tradeoffs you
have to make when you are building a system that may suffer
partitions. In practice, this is every distributed system: there is no
100% reliable network. So (at least in the distributed context) there
is no such thing as a CA system: your system may experience partitions, and
therefore you must at some point compromise C or A.

Therefore it's arguably more instructive to rewrite the theorem as the
following:

<pre>Possibility of Partitions => Not (C and A)</pre>

i.e. if your system may experience partitions, you cannot always be C
and A.

There are some systems that won't experience partitions - single-site
databases, for example. These systems aren't generally relevant to the
contexts in which CAP is most useful. If you describe your distributed
database as 'CA', you are misunderstanding something.

## 10. What about when messages don't get lost?

A perhaps surprising result from the Gilbert paper is that no
implementation of an atomic register in an asynchronous network can be
available at all times, and consistent only when no messages are lost.

This result depends upon the asynchronous network property, the idea
being that it is impossible to tell if a message has been dropped and
therefore a node cannot wait indefinitely for a response while still
maintaining availability; however, if it responds too early it might be
inconsistent.
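
This dilemma can be made concrete with a small sketch (hypothetical, in
Python): a node answering a get() cannot distinguish a dropped update
from a slow one, so it must either block without bound or answer with
what it already has:

```python
import queue

def serve_get(local_value, incoming_updates: queue.Queue, timeout=None):
    """Answer a get() while a peer's update may be lost or just slow.

    timeout=None waits forever: consistent, but not available.
    Any finite timeout keeps us available, but risks a stale answer.
    """
    try:
        return incoming_updates.get(timeout=timeout)  # the peer's write
    except queue.Empty:
        return local_value  # answered in time, but possibly stale
```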

## 11. Is my network really asynchronous?

Arguably, yes. Different networks have vastly differing characteristics.

If

then your network may be considered _asynchronous_.

Gilbert and Lynch also proved that in a _partially-synchronous_
system, where nodes have shared but not synchronised clocks and there
is a bound on the processing time of every message, it is still
impossible to implement available atomic storage.

However, the result from #10 does _not_ hold in the
partially-synchronous model; it is possible to implement atomic
storage that is available all the time, and consistent when all
messages are delivered.

## 12. What, if any, is the relationship between FLP and CAP?

The Fischer, Lynch and Paterson theorem ('FLP') (see [1] for a link
to the paper and a proof explanation) is an extraordinary
impossibility result from nearly thirty years ago, which determined
that the problem of consensus - having all nodes agree on a common
value - is unsolvable in general in asynchronous networks where one
node might fail.

The FLP result is not directly related to CAP, although they are
similar in some respects. Both are impossibility results about
problems in distributed systems. Among the ways in which FLP differs
from CAP:

* FLP deals with _consensus_, which is a similar but different problem
to _atomic storage_.

For a bit more on this topic, consult the blog post at [2].

[1] http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
[2] http://the-paper-trail.org/blog/flp-and-cap-arent-the-same-thing/

## 13. Are C and A 'spectrums'?

It is possible to relax both consistency and availability guarantees
from the strong requirements that CAP imposes and get useful
systems. In fact, the whole point of CAP is that you
_must_ do this, and any system you have designed and built relaxes one
or both of these guarantees. The onus is on you to figure out when,
and how, this occurs.

Some systems cater to users for
whom consistency is of the utmost importance, like ZooKeeper. Other
systems, like Amazon's Dynamo, relax consistency in order to maintain
high degrees of availability.

Once you weaken any of the assumptions made in the statement or proof
of CAP, you have to start again when it comes to proving an
impossibility result.

## 14. Is a failed machine the same as a partitioned one?

Not exactly: a machine can fail without the network dropping any
messages (and the network can partition without having any machines fail).

It is possible to prove a similar result about the impossibility of
atomic storage in an asynchronous network when there are up to N-1
failures. This result has ramifications for the tradeoff between how
many nodes you write to (which is a performance concern) and how fault
tolerant you are (which is a reliability concern): waiting for more
replicas to acknowledge each write costs time, but means the write
survives the failure of more machines.

## 15. Is a slow machine the same as a partitioned one?

No: messages eventually get delivered to a slow machine, but they
never get delivered to a totally partitioned one. However, slow
machines play a significant role in making it very hard to distinguish
between lost messages (or failed machines) and a slow machine. This
difficulty is right at the heart of why CAP, FLP and other results are
true.

## 16. Have I 'got around' or 'beaten' the CAP theorem?