Skip to content

Commit ecd0c05

Browse files
spec: add Consensus English spec (#89)
* Move cometbft/knowledge-base#18 here. * Add diagram, create the round state machine spec * Update Specs/English/messsages-to-events.md Signed-off-by: Josef Widder <josef@informal.systems> * added abstract and combined into one file (#122) * Show the async getValue and validity changes --------- Signed-off-by: Josef Widder <josef@informal.systems> Co-authored-by: Josef Widder <44643235+josef-widder@users.noreply.github.com>
1 parent 32498ca commit ecd0c05

File tree

2 files changed

+378
-0
lines changed

2 files changed

+378
-0
lines changed

Specs/English/README.md

Lines changed: 378 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,378 @@
1+
# Malachite Documentation
2+
3+
Malachite is an implementation of the Tendermint consensus algorithm in Rust.
4+
It comes together with an executable specification in Quint. We will use
5+
model-based testing to make sure that the implementation corresponds to the
6+
specification.
7+
8+
Tendermint consensus works by a set of validator nodes exchanging messages over a
9+
network, and the local consensus instances act on the incoming messages if
10+
certain conditions are met (e.g., if a threshold number of specific messages is
11+
received a state transition should happen). The
12+
architecture of Malachite separates
13+
14+
- message counting in a *vote book keeper*,
15+
- creating consensus inputs in a *driver*, e.g., if a threshold is reached
16+
- doing the state transition depending on the consensus input in the *state machine*
17+
18+
A detailed executable specification of these functionalities are given in Quint.
19+
In this (English) document we discuss some underlying principles, namely,
20+
21+
- [Message handling](#messages-to-events): How to treat incoming messages. Which messages to store,
22+
and on what conditions to generate consensus inputs.
23+
24+
- [Round state machine](#consensus-protocol---round-state-machine): How to change state depending on the
25+
current state and a consensus input.
26+
27+
28+
29+
## Messages to Events
30+
31+
The consensus state-machine operates on complex `Event`s that reflect the
32+
reception of one or multiple `Message`s, combined with state elements and the
33+
interaction with other modules.
34+
35+
This document overviews how messages should be handled at different stages of
36+
the protocol.
37+
38+
It is assume that a process is at round `r` of height `h` of consensus, or in
39+
short, at round `(h, r)`.
40+
41+
42+
### Different heights
43+
44+
Messages with heights `h'` with either `h' < h` (past) or `h' > h` (future).
45+
46+
The pseudocode description of the algorithm ignores messages from different
47+
heights.
48+
If we take the same approach in this specification, we have to specify
49+
separately modules responsible to handle those messages.
50+
51+
52+
- Past heights (`h' < h`): the consensus state machine is not affected by such
53+
messages. However, their reception might indicate that a peer is lagging
54+
behind in the protocol, and need to be synchronized.
55+
- In CometBFT's implementation we handle message from the previous height
56+
(`h' = h - 1`) for the `LastCommit` vote set. This only happens during the
57+
first step of the first round (`r = 0`) of a height.
58+
- Future heights (`h' > h`): the consensus state machine is not able to process
59+
message from future heights in a proper way, as the validator set for them is
60+
not known. However, once the process reaches this height `h'`, those messages
61+
are _required_ for proper operation. There are two options here:
62+
1. Buffer a limited amount of such messages
63+
2. Assume that the communication subsystem (p2p) is able to retrieve (ask for
64+
retransmission) of them when the process reaches height `h'`.
65+
Notice that this option implies that processes keep a minimal set of
66+
consensus messages that enables peers lagging behind to decide a past height.
67+
68+
### Previous rounds
69+
70+
Messages from rounds `(h, r')` with `r' < r`: same height `h` but previous round `r'`.
71+
72+
The consensus state machine requires receiving and processing messages from
73+
previous rounds:
74+
75+
- `PREVOTE` messages can produce a Proof of Lock (POL) `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩`
76+
needed for accepting `PROPOSAL(h, r, v, vr)` message from the current round,
77+
where `vr == r' < r` (L28).
78+
- `PRECOMMIT` messages can produce a Precommit quorum `2f + 1 ⟨PRECOMMIT, h, r', id(v)⟩`
79+
that leads to the decision of `v` at round `r'` (L49).
80+
- `PROPOSAL` messages can be required to match a produced Precommit quorum (L49).
81+
- Associated full value messages are required to produce the `⟨PROPOSAL, h, r', v, *⟩` event
82+
83+
The production of the enumerated events from previous rounds should be
84+
identical to the production of events from messages from the [current round](#current-round).
85+
86+
### Future rounds
87+
88+
Messages from rounds `(h, r')` with `r' > r`: same height `h` but future round `r'`.
89+
90+
#### Round skipping
91+
92+
The consensus state machine requires receiving and processing messages from
93+
future rounds for enabling the _round skipping_ mechanism, defined as follows
94+
in the pseudocode:
95+
96+
```
97+
55: upon f + 1 ⟨∗, hp, round, ∗, ∗⟩ with round > roundp do
98+
56: StartRound(round)
99+
```
100+
101+
The current interpretation of this rule is that messages from a round `r' > r`
102+
are received from `f + 1` voting-power equivalent distinct senders.
103+
This means, that at least `1` correct process is at round `r'`.
104+
105+
While this threshold does not need to be adopted (it can be configurable),
106+
messages from a future round should initially have their unique senders counted.
107+
Once the round skip threshold of processes is reached, the corresponding event
108+
should be produced.
109+
110+
#### Limits
111+
112+
The same reasoning applied for messages from [future heights](#different-heights)
113+
applies for messages from future rounds.
114+
115+
Messages from future rounds are _required_ for the proper operation of the
116+
consensus state machine once the process reaches their round `r'`.
117+
There are two options, which can in particular be combined:
118+
119+
1. Buffer a limited amount of such messages, or messages from a limited amount
120+
of future rounds `r'`
121+
- In CometBFT's implementation, only messages from round `r' = r + 1` are tracked.
122+
2. Assume that the communication subsystem (p2p) is able to retrieve (ask for
123+
retransmission) of messages from future rounds when the process reaches round `r'`.
124+
Since messages from [previous rounds](#previous-rounds) are stored by
125+
default, peers that have reached the future round `r'` should be able to
126+
retransmit them.
127+
128+
### Current round
129+
130+
Messages matching the current round `(h, r)` of a process produce most of the
131+
relevant events for the consensus state machine.
132+
133+
### Counting votes
134+
135+
Messages `⟨PREVOTE, h, r, *⟩` and `⟨PRECOMMIT, h, r, *⟩` are generically called votes.
136+
They refer to a round step `(h, r, s)` of consensus, where `s` is defined by
137+
the vote type, either `PREVOTE` or `PRECOMMIT`.
138+
139+
The processing of _individual_ vote messages doesn't produce events relevant for
140+
the consensus state machine.
141+
But when the number of unique vote messages referring to a given round step
142+
`(h, r, s)` reaches a given _threshold_, relevant events are produced;
143+
the produced event depends on the value carried by such votes.
144+
145+
General assumptions regarding vote messages:
146+
147+
- Vote messages are produced, signed and broadcast by a validator, which is its
148+
*sender*
149+
- To define whether a vote message for round step `(h, r, s)` is valid, the
150+
validator set for height `h` must be known.
151+
The validator set can change over heights, but it is the same within a height.
152+
- To each validator in the validator set of a height `h` is associated a *voting power*
153+
- Thresholds are computed from the voting power associated to the
154+
sender of each vote message
155+
- A vote message carries a value: either a reference to a proposed value
156+
`id(v)`, or the special `nil` value
157+
- For practical effects, it should be considered that the size of vote
158+
messages is constant
159+
- Correct validators produce at most one vote message per round step: either
160+
for a `id(v)` or for `nil`
161+
- Byzantine validators may equivocate and produce multiple distinct vote
162+
messages for the same round step. Equivocating vote messages differ on the
163+
value they carry: for `nil`, `id(v)`, `id(v')`, etc.
164+
- This possibility constitutes an attack vector. A process must thus restrict
165+
the number of distinct vote messages from the same sender and referring to
166+
the same round step that can be stored.
167+
168+
#### `f + 1` threshold
169+
170+
This threshold represents that vote messages referring to a round step were
171+
received from a enough number of unique senders, so that it is guaranteed that
172+
_at least one_ of the senders is a _correct_ validator.
173+
174+
The rationale here is that the cumulative voting power of Byzantine validators
175+
cannot exceed `f`, so that at least one of the considered vote messages must
176+
have been produced by a correct validator.
177+
178+
#### `2f + 1` threshold
179+
180+
This threshold represents that vote messages referring to a round step were
181+
received from a enough number of unique senders, so that it is guaranteed that
182+
_the majority_ of the senders are _correct_ validators.
183+
184+
The rationale here is that the cumulative voting power of Byzantine validators
185+
cannot exceed `f`, so that the subset of considered vote messages that must
186+
have been produced by correct validators have a cumulative voting power of at
187+
least `f + 1`, which is strictly greater than `f`.
188+
189+
## Consensus protocol - round state machine
190+
191+
This document provides an overview of the Tendermint consensus protocol and follows ["The latest gossip on BFT consensus"](#References) and the English and Quint specifications located in the [Specs](../../Specs) directory.
192+
193+
The consensus state-machine operates on complex `Event`s that reflect the
194+
reception of one or multiple `Message`s, combined with state elements and the
195+
interaction with other modules.
196+
197+
### Round state-machine
198+
199+
The state machine represents the operation of consensus at a single `Height(h)` and `Round(r)`.
200+
The diagram below offers a visual representation of the state machine. It shows the input events, using green for simple inputs (e.g. timeouts, porposal)
201+
and red for the complex events (e.g. `ProposalAndPolkaCurrent` is sent to the state machine when a valid proposal and a polka of prevotes have been received).
202+
The actions are shown in italics (blue) and the output messages are shown in blue.
203+
204+
![Consensus SM Diagram](assets/sm_diagram.jpeg)
205+
206+
The set of states can be summarized as:
207+
208+
- `NewRound`
209+
- Initial state
210+
- Can be used to store messages early received for this round
211+
- In the algorithm when `roundp < r`, where `roundp` is the node's current round
212+
- InProgress (`Propose`, `Prevote`, `Precommit`)
213+
- Actual consensus single-round execution
214+
- In the algorithm when `roundp == r`
215+
- `Commit`
216+
- Final state for a successful round
217+
218+
#### Exit transitions
219+
The table below summarizes the major state transitions in the `Round(r)` state machine.
220+
The transactions from state `InProgress` consider that node can be at any of
221+
the `Propose`, `Prevote`, `Precommit` states.
222+
The `Ref` column refers to the line of the pseudocode where the events can be found.
223+
224+
| From | To | Ev Name | Event Details | Action | Ref |
225+
| ---------- |------------|------------------------------|-------------------------------------------------------------------|-----------------------------------| --- |
226+
| InProgress | InProgress | PrecommitAny | `2f + 1 ⟨PRECOMMIT, h, r, *⟩` <br> for the first time | schedule `TimeoutPrecommit(h, r)` | L47 |
227+
| InProgress | NewRound | TimeoutPrecommit | `TimeoutPrecommit(h, r)` | `next_round(r+1)` | L65 |
228+
| InProgress | NewRound | SkipRound(r') | `f + 1 ⟨*, h, r', *, *⟩` with `r' > r` | `next_round(r')` | L55 |
229+
| InProgress | Commit | ProposalAndPrecommitValue(v) | `⟨PROPOSAL, h, r', v, *⟩` <br> `2f + 1 ⟨PRECOMMIT, h, r', id(v)⟩` | `commit(v)` | L49 |
230+
231+
#### InProgress round
232+
233+
The table below summarizes the state transitions within the `InProgress` state
234+
of the `Round(r)` state machine.
235+
The following state transitions represent the core of the consensus algorithm.
236+
The `Ref` column refers to the line of the pseudocode where the events can be found.
237+
238+
| From | To | Event | Details | Actions and Return | Ref |
239+
|-----------|-----------|----------------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|-----|
240+
| NewRound | Propose | NewRound(proposer) | `StartRound` with `proposer(h, r) = p` | **async `getValue()` and schedule `TimeoutPropose(h, r)`** | L19 |
241+
| NewRound | Propose | NewRound(non-proposer) | `StartRound` with `proposer(h, r) != p` (optional restriction) | schedule `TimeoutPropose(h, r)` | L21 |
242+
| **Propose** | **Propose** | **ProposeValue(v)** | `getValue()` returned | broadcast `⟨PROPOSAL, h, r, v, validRound⟩` | L19 |
243+
| Propose | Prevote | Proposal(v, -1) | `⟨PROPOSAL, h, r, v, −1⟩` | broadcast `⟨PREVOTE, h, r, {id(v), nil}⟩` | L23 |
244+
| Propose | Prevote | **InvalidProposal**(v, -1) | `⟨PROPOSAL, h, r, v, −1⟩` | broadcast `⟨PREVOTE, h, r, nil⟩` | L32 |
245+
| Propose | Prevote | ProposalAndPolkaPrevious(v, vr) | `⟨PROPOSAL, h, r, v, vr⟩` <br> `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩` with `vr < r` | broadcast `⟨PREVOTE, h, r, {id(v), nil}⟩` | L30 |
246+
| Propose | Prevote | **InvalidProposalAndPolkaPrevious**(v, vr) | `⟨PROPOSAL, h, r, v, vr⟩` <br> `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩` with `vr < r` | broadcast `⟨PREVOTE, h, r, nil⟩` | L32 |
247+
| Propose | Prevote | TimeoutPropose | `TimeoutPropose(h, r)` | broadcast `⟨PREVOTE, h, r, nil⟩` | L57 |
248+
| Prevote | Prevote | PolkaAny | `2f + 1 ⟨PREVOTE, h, r, *⟩` <br> for the first time | schedule `TimeoutPrevote(h, r)⟩` | L34 |
249+
| Prevote | Precommit | ProposalAndPolkaCurrent(v) | `⟨PROPOSAL, h, r, v, ∗⟩` <br> `2f + 1 ⟨PREVOTE, h, r, id(v)⟩` <br> for the first time | update `lockedValue, lockedRound, validValue, validRound`,<br /> broadcast `⟨PRECOMMIT, h, r, id(v)⟩` | L36 |
250+
| Prevote | Precommit | PolkaNil | `2f + 1 ⟨PREVOTE, h, r, nil⟩` | broadcast `⟨PRECOMMIT, h, r, nil⟩` | L44 |
251+
| Prevote | Precommit | TimeoutPrevote | `TimeoutPrevote(h, r)` | broadcast `⟨PRECOMMIT, h, r, nil⟩` | L61 |
252+
| Precommit | Precommit | PolkaValue(v) | `⟨PROPOSAL, h, r, v, ∗⟩` <br> `2f + 1 ⟨PREVOTE, h, r, id(v)⟩` <br> for the first time | update `validValue, validRound` | L36 |
253+
254+
The ordinary operation of a round of consensus consists on the sequence of
255+
round steps `Propose`, `Prevote`, and `Precommit`, represented in the table.
256+
The conditions for concluding a round of consensus, therefore for leaving the
257+
`InProgress` state, are presented in the previous subsection.
258+
259+
##### Validity Checks
260+
The pseudocode of the algorithm includes validity checks for the messages. These checks have been moved out of the state machine and are now performed by the `driver` module.
261+
For this reason:
262+
- `L22` is covered by `Proposal(v, -1) and `InvalidProposal(v, -1)`
263+
- `L28` is covered by `ProposalAndPolkaPrevious(v, vr)` and `InvalidProposalAndPolkaPrevious(v, vr)`
264+
- `L36` and `L49` are only called with valid proposal
265+
266+
TODO - show the full algorithm with all the changes
267+
268+
##### Asynchronous getValue() and ProposeValue(v)
269+
The original algorithm is modified to allow for asynchronous `getValue()`. The details are described below.
270+
271+
<table>
272+
<tr>
273+
<th>arXiv paper</th>
274+
<th>Async getValue()</th>
275+
</tr>
276+
277+
<tr >
278+
<td>
279+
280+
```
281+
function StartRound(round) {
282+
round_p ← round
283+
step_p ← propose
284+
if proposer(h_p, round_p) = p {
285+
if validValue_p != nil {
286+
proposal ← validValue_p
287+
288+
289+
290+
} else {
291+
proposal ← getValue()
292+
293+
}
294+
295+
296+
broadcast ⟨PROPOSAL, h_p, round_p, proposal, validRound_p⟩
297+
} else {
298+
schedule OnTimeoutPropose(h_p,round_p) to
299+
be executed after timeoutPropose(round_p)
300+
}
301+
}
302+
```
303+
304+
</td>
305+
306+
<td>
307+
308+
```
309+
function StartRound(round) {
310+
round_p ← round
311+
step_p ← propose
312+
if proposer(h_p, round_p) = p {
313+
if validValue_p != nil {
314+
proposal ← validValue_p
315+
316+
broadcast ⟨PROPOSAL, h_p, round_p, proposal, validRound_p⟩
317+
318+
} else {
319+
getValue() // async
320+
schedule OnTimeoutPropose(h_p,round_p) to
321+
be executed after timeoutPropose(round_p)
322+
}
323+
324+
325+
} else {
326+
schedule OnTimeoutPropose(h_p,round_p) to
327+
be executed after timeoutPropose(round_p)
328+
}
329+
}
330+
```
331+
332+
</td>
333+
</tr>
334+
</table>
335+
336+
- New Rule added
337+
338+
<table>
339+
<tr>
340+
<th>arXiv paper</th>
341+
<th>Async getValue()</th>
342+
</tr>
343+
344+
<tr>
345+
<td>
346+
347+
```
348+
```
349+
350+
</td>
351+
352+
<td>
353+
354+
```
355+
upon PROPOSEVALUE (h_p, round_p, v) {
356+
proposal ← v
357+
broadcast ⟨PROPOSAL, h_p, round_p, proposal, -1⟩
358+
}
359+
```
360+
361+
</td>
362+
</tr>
363+
</table>
364+
365+
366+
#### Notes
367+
Most of the state transitions represented in the previous tables consider message and
368+
events referring to the node's current round `r`.
369+
In the pseudocode this current round of a node is referred as `round_p`.
370+
371+
There are however exceptions that have to be handled properly:
372+
- the transition `L28` requires the node to have access to `PREVOTE` messages from a previous round `r' < r`.
373+
- the transition `L49` requires the node to have access to `PRECOMMIT` messages from different round `r' != r`.
374+
- the transition `L55` requires the node to have access to all messages from a future round `r' > r`.
375+
376+
## References
377+
378+
* ["The latest gossip on BFT consensus"](https://arxiv.org/pdf/1807.04938.pdf), by _Buchman, Kwon, Milosevic_. 2018.

Specs/English/assets/sm_diagram.jpeg

143 KB
Loading

0 commit comments

Comments
 (0)