# The CAP FAQ

Version 1.0, May 9th 2013
By: Henry Robinson / henry.robinson@gmail.com / @henryr
<pre><a href="http://the-paper-trail.org/">http://the-paper-trail.org/</a></pre>

## 0. What is this document?

No subject appears to be more controversial to distributed systems
engineers than the oft-quoted, oft-misunderstood CAP theorem. The
purpose of this FAQ is to explain what is known about CAP, so as to
help those new to the theorem get up to speed quickly, and to settle
some common misconceptions or points of disagreement.

Of course, there's every possibility I've made superficial or
completely thorough mistakes here. Corrections and comments are
welcome: <a href="mailto:henry.robinson+cap@gmail.com">let me have
them</a>.

There are some questions I still intend to answer. For example:

* *What's the relationship between CAP and performance?*
* *What does CAP mean to me as an engineer?*
* *What's the relationship between CAP and ACID?*

Please suggest more.

## 1. Where did the CAP Theorem come from?

Dr. Eric Brewer gave a keynote speech at the Principles of Distributed
Computing conference in 2000 called 'Towards Robust Distributed
Systems' [1]. In it he posed his 'CAP Theorem' - at the time unproven - which
illustrated the tensions between being correct and being always
available in distributed systems.

Two years later, Seth Gilbert and Professor Nancy Lynch - researchers
in distributed systems at MIT - formalised and proved the conjecture
in their paper “Brewer's conjecture and the feasibility of consistent,
available, partition-tolerant web services” [2].

[1] http://www.cs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
[2] http://lpd.epfl.ch/sgilbert/pubs/BrewersConjecture-SigAct.pdf

## 2. What does the CAP Theorem actually say?

The CAP Theorem says that it is impossible to build an implementation
of read-write storage in an _asynchronous network_ that satisfies all
of the following three properties:

* Availability - will a request made to the data store always eventually complete?
* Consistency - will all executions of reads and writes seen by all nodes be _atomic_ or _linearizably_ consistent?
* Partition tolerance - the network is allowed to drop any messages.

The next few items define any unfamiliar terms.

More informally, the CAP theorem tells us that we can't build a
database that both responds to every request and returns the results
that you would expect every time. It's an _impossibility_ result - it
tells us that something we might want to do is actually provably out
of reach. It's important now because it is directly applicable to the
many, many distributed systems which have been and are being built in
the last few years, but it is not a death knell: it does _not_ mean
that we cannot build useful systems while working within these
constraints.

The devil is in the details, however. Before you start crying 'yes, but
what about...', make sure you understand the following about exactly
what the CAP theorem does and does not allow.

## 3. What is 'read-write storage'?

CAP specifically concerns itself with a theoretical
construct called a _register_. A register is a data structure with two
operations:

* set(X) sets the value of the register to X
* get() returns the last value set in the register

A key-value store can be modelled as a collection of registers. Even
though registers appear very simple, they capture the essence of what
many distributed systems want to do - write data and read it back.
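
To make the register construct concrete, here is a minimal sketch in
Python (a single-process illustration with class names of my choosing,
not anything from the Gilbert and Lynch paper) of a register, and of a
key-value store modelled as a collection of registers:

```python
class Register:
    """A register: a single value with set(X) and get() operations."""

    def __init__(self, initial=None):
        self._value = initial

    def set(self, x):
        self._value = x

    def get(self):
        # On one machine this trivially returns the last value set; CAP
        # is about how hard it is to preserve exactly this behaviour
        # once the register is replicated over an unreliable network.
        return self._value


class KeyValueStore:
    """A key-value store modelled as a collection of named registers."""

    def __init__(self):
        self._registers = {}

    def set(self, key, x):
        self._registers.setdefault(key, Register()).set(x)

    def get(self, key):
        return self._registers.setdefault(key, Register()).get()
```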

## 4. What does _atomic_ (or _linearizable_) mean?

Atomic, or linearizable, consistency is a guarantee about what values
it's ok to return when a client performs get() operations. The idea is
that the register appears to all clients as though it ran on just one
machine, and responded to operations in the order they arrive.

Consider an execution consisting of the total set of operations
performed by all clients, potentially concurrently. The results of
those operations must, under atomic consistency, be equivalent to a
single serial (in order, one after the other) execution of all
operations.

This guarantee is very strong. It rules out, amongst other guarantees,
_eventual consistency_, which allows a delay before a write becomes
visible. So under EC, you might have:

set(10), get() = 5

But this execution is invalid under atomic consistency.

Atomic consistency also ensures that external communication about the
value of a register is respected. That is, if I read X and tell you
about it, you can go to the register and read X for yourself. It's
possible under slightly weaker guarantees (_serializability_ for
example) for that not to be true. In the following we write A:<set or
get> to mean that client A executes that operation.

B:set(5), A:set(10), A:get() = 10, B:get() = 10

This is an atomic history. But the following is not:

B:set(5), A:set(10), A:get() = 10, B:get() = 5

even though it is equivalent to the following serial history:

B:set(5), B:get() = 5, A:set(10), A:get() = 10

In the second example, if A tells B about the value of the register
(10) after it does its get(), B will falsely believe that some
third party has written 5 between A:get() and B:get(). If external
communication isn't allowed, B cannot know about A:set, and so sees a
consistent view of the register state; it's as if B:get really did
happen before A:set.

Wikipedia [1] has more information. Maurice Herlihy and Jeannette
Wing's original paper from 1990 is available at [2].

[1] http://en.wikipedia.org/wiki/Linearizability
[2] http://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
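
Because 'equivalent to a single serial execution' is a precise
condition, it can be checked mechanically. Here is a small sketch in
Python (an illustration of mine, not anything from the papers above)
that checks histories like the ones in this section, under the
simplifying assumption that operations do not overlap in real time - in
that case atomicity just means every get() returns the most recently
set value:

```python
# A history is a list of (client, op, value) tuples, in real-time order.
def is_atomic(history, initial=None):
    """True if every get() observes the most recent set()."""
    current = initial
    for _client, op, value in history:
        if op == "set":
            current = value
        elif op == "get" and value != current:
            return False  # a stale (or invented) value was returned
    return True

ok  = [("B", "set", 5), ("A", "set", 10), ("A", "get", 10), ("B", "get", 10)]
bad = [("B", "set", 5), ("A", "set", 10), ("A", "get", 10), ("B", "get", 5)]

assert is_atomic(ok)       # the first history above is atomic
assert not is_atomic(bad)  # the second is not, as explained above
```

When operations may genuinely overlap, a checker has to consider every
ordering of concurrent operations that respects real time, which is far
more expensive; the simple loop above covers only the non-overlapping
case.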

## 5. What does _asynchronous_ mean?

In an asynchronous network there is no bound on how long a message may
take to be delivered, and nodes have no synchronised clocks, so it is
impossible to tell the difference between a message that has been lost
and one that is merely delayed.

## 6. What does _partition_ mean?

A partition is when the network drops messages between nodes. Sometimes no
messages are delivered between two sets of nodes. This is a more
restrictive failure model. We'll call these kinds of partitions _total
partitions_.

The proof of CAP relied on a total partition. In practice,
these are arguably the most likely since all messages may flow through
one component; if that fails then message loss is usually total
between two nodes.

## 7. Why is CAP true?

The basic idea is that if a client writes to one side of a partition,
any reads that go to the other side of that partition can't possibly
know about the most recent write. Now you're faced with a choice: do
you respond to the reads with potentially stale information, or do you
wait (potentially forever) to hear from the other side of the
partition and compromise availability?

This is a proof by _construction_ - we demonstrate a single situation
where a system cannot be consistent and available. One reason that CAP
gets some press is that this constructed scenario is not completely
unrealistic. It is not uncommon for a total partition to occur if
networking equipment should fail.
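
As an illustration of the construction (a hypothetical sketch in
Python, not the formal proof), consider two replicas that can no longer
exchange messages. A write lands on one side; a read then arrives at
the other, which must choose between a stale answer and no answer:

```python
class Replica:
    """One of two replicas of a register, joined by a faulty network."""

    def __init__(self, value):
        self.value = value
        self.partitioned = False  # True: messages to the peer are lost

    def replicate_to(self, peer, value):
        if not self.partitioned:
            peer.value = value  # during a partition this never happens

left, right = Replica(5), Replica(5)
left.partitioned = right.partitioned = True

# A client writes to the left side of the partition...
left.value = 10
left.replicate_to(right, 10)  # message lost: right still holds 5

# ...and a read arrives at the right side. Pick your poison:
choose_availability = True
if choose_availability:
    answer = right.value  # responds 5 - available, but not consistent
else:
    # Block until the partition heals (which may be never):
    # consistent, but not available.
    raise TimeoutError("cannot reach peer; refusing to answer")
```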

## 8. When does a system have to give up C or A?

CAP only guarantees that there is _some_ circumstance in which a
system must give up either C or A. Let's call that circumstance a
_critical condition_. The theorem doesn't say anything about how
likely that critical condition is. Both C and A are strong guarantees:
they hold only if 100% of operations meet their requirements. A single
inconsistent read, or unavailable write, invalidates either C or
A. But until that critical condition is met, a system can be happily
consistent _and_ available and not contravene any known laws.

Since most distributed systems are long running, and may see millions
of requests in their lifetime, CAP tells us to be
cautious: there's a good chance that you'll realistically hit one of
these critical conditions, and it's prudent to understand how your
system will fail to meet either C or A.

## 9. Why do some people get annoyed when I characterise my system as CA?

Brewer's keynote, the Gilbert paper, and many other treatments place
C, A and P on an equal footing as desirable properties of an
implementation and effectively say 'choose two!'. However, this is
often considered to be a misleading presentation, since you cannot
build - or choose! - 'partition tolerance': your system either might experience
partitions or it won't.

CAP is better understood as describing the tradeoffs you
have to make when you are building a system that may suffer
partitions. In practice, this is every distributed system: there is no
100% reliable network. So (at least in the distributed context) there
is no such thing as a CA system: your system may experience partitions, and
therefore you must at some point compromise C or A.

Therefore it's arguably more instructive to rewrite the theorem as the
following:

<pre>Possibility of Partitions => Not (C and A)</pre>

i.e. if your system may experience partitions, you cannot always be C
and A.

There are some systems that won't experience partitions - single-site
databases, for example. These systems aren't generally relevant to the
contexts in which CAP is most useful. If you describe your distributed
database as 'CA', you are misunderstanding something.

## 10. What about when messages don't get lost?

A perhaps surprising result from the Gilbert paper is that no
implementation of an atomic register in an asynchronous network can be
available at all times, and consistent only when no messages are lost.

This result depends upon the asynchronous network property, the idea
being that it is impossible to tell if a message has been dropped and
therefore a node cannot wait indefinitely for a response while still
maintaining availability; however, if it responds too early it might be
inconsistent.
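
This dilemma can be made concrete with a small sketch (hypothetical, in
Python): a node answering a get() cannot distinguish a dropped update
from a slow one, so it must either block without bound or answer with
what it already has:

```python
import queue

def serve_get(local_value, incoming_updates: queue.Queue, timeout=None):
    """Answer a get() while a peer's update may be lost or just slow.

    timeout=None waits forever: consistent, but not available.
    Any finite timeout keeps us available, but risks a stale answer.
    """
    try:
        return incoming_updates.get(timeout=timeout)  # the peer's write
    except queue.Empty:
        return local_value  # answered in time, but possibly stale
```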

## 11. Is my network really asynchronous?

Arguably, yes. Different networks have vastly differing characteristics.

If

then your network may be considered _asynchronous_.

Gilbert and Lynch also proved that in a _partially-synchronous_
system, where nodes have shared but not synchronised clocks and there
is a bound on the processing time of every message, it is still
impossible to implement available atomic storage.

However, the result from #10 does _not_ hold in the
partially-synchronous model; it is possible to implement atomic
storage that is available all the time, and consistent when all
messages are delivered.

## 12. What, if any, is the relationship between FLP and CAP?

The Fischer, Lynch and Paterson theorem ('FLP') (see [1] for a link
to the paper and a proof explanation) is an extraordinary
impossibility result from nearly thirty years ago, which determined
that the problem of consensus - having all nodes agree on a common
value - is unsolvable in general in asynchronous networks where one
node might fail.

The FLP result is not directly related to CAP, although they are
similar in some respects. Both are impossibility results about
problems in distributed systems. Among the ways in which FLP differs
from CAP:

* FLP deals with _consensus_, which is a similar but different problem
to _atomic storage_.

For a bit more on this topic, consult the blog post at [2].

[1] http://the-paper-trail.org/blog/a-brief-tour-of-flp-impossibility/
[2] http://the-paper-trail.org/blog/flp-and-cap-arent-the-same-thing/

## 13. Are C and A 'spectrums'?

It is possible to relax both consistency and availability guarantees
from the strong requirements that CAP imposes and get useful
systems. In fact, the whole point of CAP is that you
_must_ do this, and any system you have designed and built relaxes one
or both of these guarantees. The onus is on you to figure out when,
and how, this occurs.

Some systems cater to users for
whom consistency is of the utmost importance, like ZooKeeper. Other
systems, like Amazon's Dynamo, relax consistency in order to maintain
high degrees of availability.

Once you weaken any of the assumptions made in the statement or proof
of CAP, you have to start again when it comes to proving an
impossibility result.

## 14. Is a failed machine the same as a partitioned one?

Not exactly: a machine can fail without the network dropping any
messages (and the network can partition without having any machines fail).

It is possible to prove a similar result about the impossibility of
atomic storage in an asynchronous network when there are up to N-1
failures. This result has ramifications for the tradeoff between how
many nodes you write to (which is a performance concern) and how fault
tolerant you are (which is a reliability concern): waiting for more
replicas to acknowledge each write costs time, but means the write
survives the failure of more machines.

## 15. Is a slow machine the same as a partitioned one?

No: messages eventually get delivered to a slow machine, but they
never get delivered to a totally partitioned one. However, slow
machines play a significant role in making it very hard to distinguish
between lost messages (or failed machines) and a slow machine. This
difficulty is right at the heart of why CAP, FLP and other results are
true.

## 16. Have I 'got around' or 'beaten' the CAP theorem?