|
| 1 | +# Malachite Documentation |
| 2 | + |
| 3 | +Malachite is an implementation of the Tendermint consensus algorithm in Rust. |
| 4 | +It comes together with an executable specification in Quint. We will use |
| 5 | +model-based testing to make sure that the implementation corresponds to the |
| 6 | +specification. |
| 7 | + |
| 8 | +Tendermint consensus works by a set of validator nodes exchanging messages over a |
| 9 | +network, and the local consensus instances act on the incoming messages if |
| 10 | +certain conditions are met (e.g., if a threshold number of specific messages is |
| 11 | +received a state transition should happen). The |
| 12 | +architecture of Malachite separates |
| 13 | + |
| 14 | +- message counting in a *vote book keeper*, |
| 15 | +- creating consensus inputs in a *driver*, e.g., if a threshold is reached |
| 16 | +- doing the state transition depending on the consensus input in the *state machine* |
| 17 | + |
| 18 | +A detailed executable specification of these functionalities are given in Quint. |
| 19 | +In this (English) document we discuss some underlying principles, namely, |
| 20 | + |
| 21 | +- [Message handling](#messages-to-events): How to treat incoming messages. Which messages to store, |
| 22 | +and on what conditions to generate consensus inputs. |
| 23 | + |
| 24 | +- [Round state machine](#consensus-protocol---round-state-machine): How to change state depending on the |
| 25 | +current state and a consensus input. |
| 26 | + |
| 27 | + |
| 28 | + |
| 29 | +## Messages to Events |
| 30 | + |
| 31 | +The consensus state-machine operates on complex `Event`s that reflect the |
| 32 | +reception of one or multiple `Message`s, combined with state elements and the |
| 33 | +interaction with other modules. |
| 34 | + |
| 35 | +This document overviews how messages should be handled at different stages of |
| 36 | +the protocol. |
| 37 | + |
| 38 | +It is assume that a process is at round `r` of height `h` of consensus, or in |
| 39 | +short, at round `(h, r)`. |
| 40 | + |
| 41 | + |
| 42 | +### Different heights |
| 43 | + |
| 44 | +Messages with heights `h'` with either `h' < h` (past) or `h' > h` (future). |
| 45 | + |
| 46 | +The pseudocode description of the algorithm ignores messages from different |
| 47 | +heights. |
| 48 | +If we take the same approach in this specification, we have to specify |
| 49 | +separately modules responsible to handle those messages. |
| 50 | + |
| 51 | + |
| 52 | +- Past heights (`h' < h`): the consensus state machine is not affected by such |
| 53 | + messages. However, their reception might indicate that a peer is lagging |
| 54 | + behind in the protocol, and need to be synchronized. |
| 55 | + - In CometBFT's implementation we handle message from the previous height |
| 56 | + (`h' = h - 1`) for the `LastCommit` vote set. This only happens during the |
| 57 | + first step of the first round (`r = 0`) of a height. |
| 58 | +- Future heights (`h' > h`): the consensus state machine is not able to process |
| 59 | + message from future heights in a proper way, as the validator set for them is |
| 60 | + not known. However, once the process reaches this height `h'`, those messages |
| 61 | + are _required_ for proper operation. There are two options here: |
| 62 | + 1. Buffer a limited amount of such messages |
| 63 | + 2. Assume that the communication subsystem (p2p) is able to retrieve (ask for |
| 64 | + retransmission) of them when the process reaches height `h'`. |
| 65 | + Notice that this option implies that processes keep a minimal set of |
| 66 | + consensus messages that enables peers lagging behind to decide a past height. |
| 67 | + |
| 68 | +### Previous rounds |
| 69 | + |
| 70 | +Messages from rounds `(h, r')` with `r' < r`: same height `h` but previous round `r'`. |
| 71 | + |
| 72 | +The consensus state machine requires receiving and processing messages from |
| 73 | +previous rounds: |
| 74 | + |
| 75 | +- `PREVOTE` messages can produce a Proof of Lock (POL) `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩` |
| 76 | + needed for accepting `PROPOSAL(h, r, v, vr)` message from the current round, |
| 77 | + where `vr == r' < r` (L28). |
| 78 | +- `PRECOMMIT` messages can produce a Precommit quorum `2f + 1 ⟨PRECOMMIT, h, r', id(v)⟩` |
| 79 | + that leads to the decision of `v` at round `r'` (L49). |
| 80 | +- `PROPOSAL` messages can be required to match a produced Precommit quorum (L49). |
| 81 | + - Associated full value messages are required to produce the `⟨PROPOSAL, h, r', v, *⟩` event |
| 82 | + |
| 83 | +The production of the enumerated events from previous rounds should be |
| 84 | +identical to the production of events from messages from the [current round](#current-round). |
| 85 | + |
| 86 | +### Future rounds |
| 87 | + |
| 88 | +Messages from rounds `(h, r')` with `r' > r`: same height `h` but future round `r'`. |
| 89 | + |
| 90 | +#### Round skipping |
| 91 | + |
| 92 | +The consensus state machine requires receiving and processing messages from |
| 93 | +future rounds for enabling the _round skipping_ mechanism, defined as follows |
| 94 | +in the pseudocode: |
| 95 | + |
| 96 | +``` |
| 97 | +55: upon f + 1 ⟨∗, hp, round, ∗, ∗⟩ with round > roundp do |
| 98 | +56: StartRound(round) |
| 99 | +``` |
| 100 | + |
| 101 | +The current interpretation of this rule is that messages from a round `r' > r` |
| 102 | +are received from `f + 1` voting-power equivalent distinct senders. |
| 103 | +This means, that at least `1` correct process is at round `r'`. |
| 104 | + |
| 105 | +While this threshold does not need to be adopted (it can be configurable), |
| 106 | +messages from a future round should initially have their unique senders counted. |
| 107 | +Once the round skip threshold of processes is reached, the corresponding event |
| 108 | +should be produced. |
| 109 | + |
| 110 | +#### Limits |
| 111 | + |
| 112 | +The same reasoning applied for messages from [future heights](#different-heights) |
| 113 | +applies for messages from future rounds. |
| 114 | + |
| 115 | +Messages from future rounds are _required_ for the proper operation of the |
| 116 | +consensus state machine once the process reaches their round `r'`. |
| 117 | +There are two options, which can in particular be combined: |
| 118 | + |
| 119 | +1. Buffer a limited amount of such messages, or messages from a limited amount |
| 120 | + of future rounds `r'` |
| 121 | + - In CometBFT's implementation, only messages from round `r' = r + 1` are tracked. |
| 122 | +2. Assume that the communication subsystem (p2p) is able to retrieve (ask for |
| 123 | + retransmission) of messages from future rounds when the process reaches round `r'`. |
| 124 | + Since messages from [previous rounds](#previous-rounds) are stored by |
| 125 | + default, peers that have reached the future round `r'` should be able to |
| 126 | + retransmit them. |
| 127 | + |
| 128 | +### Current round |
| 129 | + |
| 130 | +Messages matching the current round `(h, r)` of a process produce most of the |
| 131 | +relevant events for the consensus state machine. |
| 132 | + |
| 133 | +### Counting votes |
| 134 | + |
| 135 | +Messages `⟨PREVOTE, h, r, *⟩` and `⟨PRECOMMIT, h, r, *⟩` are generically called votes. |
| 136 | +They refer to a round step `(h, r, s)` of consensus, where `s` is defined by |
| 137 | +the vote type, either `PREVOTE` or `PRECOMMIT`. |
| 138 | + |
| 139 | +The processing of _individual_ vote messages doesn't produce events relevant for |
| 140 | +the consensus state machine. |
| 141 | +But when the number of unique vote messages referring to a given round step |
| 142 | +`(h, r, s)` reaches a given _threshold_, relevant events are produced; |
| 143 | +the produced event depends on the value carried by such votes. |
| 144 | + |
| 145 | +General assumptions regarding vote messages: |
| 146 | + |
| 147 | +- Vote messages are produced, signed and broadcast by a validator, which is its |
| 148 | + *sender* |
| 149 | + - To define whether a vote message for round step `(h, r, s)` is valid, the |
| 150 | + validator set for height `h` must be known. |
| 151 | + The validator set can change over heights, but it is the same within a height. |
| 152 | +- To each validator in the validator set of a height `h` is associated a *voting power* |
| 153 | + - Thresholds are computed from the voting power associated to the |
| 154 | + sender of each vote message |
| 155 | +- A vote message carries a value: either a reference to a proposed value |
| 156 | + `id(v)`, or the special `nil` value |
| 157 | + - For practical effects, it should be considered that the size of vote |
| 158 | + messages is constant |
| 159 | +- Correct validators produce at most one vote message per round step: either |
| 160 | + for a `id(v)` or for `nil` |
| 161 | +- Byzantine validators may equivocate and produce multiple distinct vote |
| 162 | + messages for the same round step. Equivocating vote messages differ on the |
| 163 | + value they carry: for `nil`, `id(v)`, `id(v')`, etc. |
| 164 | + - This possibility constitutes an attack vector. A process must thus restrict |
| 165 | + the number of distinct vote messages from the same sender and referring to |
| 166 | + the same round step that can be stored. |
| 167 | + |
| 168 | +#### `f + 1` threshold |
| 169 | + |
| 170 | +This threshold represents that vote messages referring to a round step were |
| 171 | +received from a enough number of unique senders, so that it is guaranteed that |
| 172 | +_at least one_ of the senders is a _correct_ validator. |
| 173 | + |
| 174 | +The rationale here is that the cumulative voting power of Byzantine validators |
| 175 | +cannot exceed `f`, so that at least one of the considered vote messages must |
| 176 | +have been produced by a correct validator. |
| 177 | + |
| 178 | +#### `2f + 1` threshold |
| 179 | + |
| 180 | +This threshold represents that vote messages referring to a round step were |
| 181 | +received from a enough number of unique senders, so that it is guaranteed that |
| 182 | +_the majority_ of the senders are _correct_ validators. |
| 183 | + |
| 184 | +The rationale here is that the cumulative voting power of Byzantine validators |
| 185 | +cannot exceed `f`, so that the subset of considered vote messages that must |
| 186 | +have been produced by correct validators have a cumulative voting power of at |
| 187 | +least `f + 1`, which is strictly greater than `f`. |
| 188 | + |
| 189 | +## Consensus protocol - round state machine |
| 190 | + |
| 191 | +This document provides an overview of the Tendermint consensus protocol and follows ["The latest gossip on BFT consensus"](#References) and the English and Quint specifications located in the [Specs](../../Specs) directory. |
| 192 | + |
| 193 | +The consensus state-machine operates on complex `Event`s that reflect the |
| 194 | +reception of one or multiple `Message`s, combined with state elements and the |
| 195 | +interaction with other modules. |
| 196 | + |
| 197 | +### Round state-machine |
| 198 | + |
| 199 | +The state machine represents the operation of consensus at a single `Height(h)` and `Round(r)`. |
| 200 | +The diagram below offers a visual representation of the state machine. It shows the input events, using green for simple inputs (e.g. timeouts, porposal) |
| 201 | +and red for the complex events (e.g. `ProposalAndPolkaCurrent` is sent to the state machine when a valid proposal and a polka of prevotes have been received). |
| 202 | +The actions are shown in italics (blue) and the output messages are shown in blue. |
| 203 | + |
| 204 | + |
| 205 | + |
| 206 | +The set of states can be summarized as: |
| 207 | + |
| 208 | +- `NewRound` |
| 209 | + - Initial state |
| 210 | + - Can be used to store messages early received for this round |
| 211 | + - In the algorithm when `roundp < r`, where `roundp` is the node's current round |
| 212 | +- InProgress (`Propose`, `Prevote`, `Precommit`) |
| 213 | + - Actual consensus single-round execution |
| 214 | + - In the algorithm when `roundp == r` |
| 215 | +- `Commit` |
| 216 | + - Final state for a successful round |
| 217 | + |
| 218 | +#### Exit transitions |
| 219 | +The table below summarizes the major state transitions in the `Round(r)` state machine. |
| 220 | +The transactions from state `InProgress` consider that node can be at any of |
| 221 | +the `Propose`, `Prevote`, `Precommit` states. |
| 222 | +The `Ref` column refers to the line of the pseudocode where the events can be found. |
| 223 | + |
| 224 | +| From | To | Ev Name | Event Details | Action | Ref | |
| 225 | +| ---------- |------------|------------------------------|-------------------------------------------------------------------|-----------------------------------| --- | |
| 226 | +| InProgress | InProgress | PrecommitAny | `2f + 1 ⟨PRECOMMIT, h, r, *⟩` <br> for the first time | schedule `TimeoutPrecommit(h, r)` | L47 | |
| 227 | +| InProgress | NewRound | TimeoutPrecommit | `TimeoutPrecommit(h, r)` | `next_round(r+1)` | L65 | |
| 228 | +| InProgress | NewRound | SkipRound(r') | `f + 1 ⟨*, h, r', *, *⟩` with `r' > r` | `next_round(r')` | L55 | |
| 229 | +| InProgress | Commit | ProposalAndPrecommitValue(v) | `⟨PROPOSAL, h, r', v, *⟩` <br> `2f + 1 ⟨PRECOMMIT, h, r', id(v)⟩` | `commit(v)` | L49 | |
| 230 | + |
| 231 | +#### InProgress round |
| 232 | + |
| 233 | +The table below summarizes the state transitions within the `InProgress` state |
| 234 | +of the `Round(r)` state machine. |
| 235 | +The following state transitions represent the core of the consensus algorithm. |
| 236 | +The `Ref` column refers to the line of the pseudocode where the events can be found. |
| 237 | + |
| 238 | +| From | To | Event | Details | Actions and Return | Ref | |
| 239 | +|-----------|-----------|----------------------------------------|----------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|-----| |
| 240 | +| NewRound | Propose | NewRound(proposer) | `StartRound` with `proposer(h, r) = p` | **async `getValue()` and schedule `TimeoutPropose(h, r)`** | L19 | |
| 241 | +| NewRound | Propose | NewRound(non-proposer) | `StartRound` with `proposer(h, r) != p` (optional restriction) | schedule `TimeoutPropose(h, r)` | L21 | |
| 242 | +| **Propose** | **Propose** | **ProposeValue(v)** | `getValue()` returned | broadcast `⟨PROPOSAL, h, r, v, validRound⟩` | L19 | |
| 243 | +| Propose | Prevote | Proposal(v, -1) | `⟨PROPOSAL, h, r, v, −1⟩` | broadcast `⟨PREVOTE, h, r, {id(v), nil}⟩` | L23 | |
| 244 | +| Propose | Prevote | **InvalidProposal**(v, -1) | `⟨PROPOSAL, h, r, v, −1⟩` | broadcast `⟨PREVOTE, h, r, nil⟩` | L32 | |
| 245 | +| Propose | Prevote | ProposalAndPolkaPrevious(v, vr) | `⟨PROPOSAL, h, r, v, vr⟩` <br> `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩` with `vr < r` | broadcast `⟨PREVOTE, h, r, {id(v), nil}⟩` | L30 | |
| 246 | +| Propose | Prevote | **InvalidProposalAndPolkaPrevious**(v, vr) | `⟨PROPOSAL, h, r, v, vr⟩` <br> `2f + 1 ⟨PREVOTE, h, vr, id(v)⟩` with `vr < r` | broadcast `⟨PREVOTE, h, r, nil⟩` | L32 | |
| 247 | +| Propose | Prevote | TimeoutPropose | `TimeoutPropose(h, r)` | broadcast `⟨PREVOTE, h, r, nil⟩` | L57 | |
| 248 | +| Prevote | Prevote | PolkaAny | `2f + 1 ⟨PREVOTE, h, r, *⟩` <br> for the first time | schedule `TimeoutPrevote(h, r)⟩` | L34 | |
| 249 | +| Prevote | Precommit | ProposalAndPolkaCurrent(v) | `⟨PROPOSAL, h, r, v, ∗⟩` <br> `2f + 1 ⟨PREVOTE, h, r, id(v)⟩` <br> for the first time | update `lockedValue, lockedRound, validValue, validRound`,<br /> broadcast `⟨PRECOMMIT, h, r, id(v)⟩` | L36 | |
| 250 | +| Prevote | Precommit | PolkaNil | `2f + 1 ⟨PREVOTE, h, r, nil⟩` | broadcast `⟨PRECOMMIT, h, r, nil⟩` | L44 | |
| 251 | +| Prevote | Precommit | TimeoutPrevote | `TimeoutPrevote(h, r)` | broadcast `⟨PRECOMMIT, h, r, nil⟩` | L61 | |
| 252 | +| Precommit | Precommit | PolkaValue(v) | `⟨PROPOSAL, h, r, v, ∗⟩` <br> `2f + 1 ⟨PREVOTE, h, r, id(v)⟩` <br> for the first time | update `validValue, validRound` | L36 | |
| 253 | + |
| 254 | +The ordinary operation of a round of consensus consists on the sequence of |
| 255 | +round steps `Propose`, `Prevote`, and `Precommit`, represented in the table. |
| 256 | +The conditions for concluding a round of consensus, therefore for leaving the |
| 257 | +`InProgress` state, are presented in the previous subsection. |
| 258 | + |
| 259 | +##### Validity Checks |
| 260 | +The pseudocode of the algorithm includes validity checks for the messages. These checks have been moved out of the state machine and are now performed by the `driver` module. |
| 261 | +For this reason: |
| 262 | +- `L22` is covered by `Proposal(v, -1) and `InvalidProposal(v, -1)` |
| 263 | +- `L28` is covered by `ProposalAndPolkaPrevious(v, vr)` and `InvalidProposalAndPolkaPrevious(v, vr)` |
| 264 | +- `L36` and `L49` are only called with valid proposal |
| 265 | + |
| 266 | +TODO - show the full algorithm with all the changes |
| 267 | + |
| 268 | +##### Asynchronous getValue() and ProposeValue(v) |
| 269 | +The original algorithm is modified to allow for asynchronous `getValue()`. The details are described below. |
| 270 | + |
| 271 | +<table> |
| 272 | +<tr> |
| 273 | +<th>arXiv paper</th> |
| 274 | +<th>Async getValue()</th> |
| 275 | +</tr> |
| 276 | + |
| 277 | +<tr > |
| 278 | +<td> |
| 279 | + |
| 280 | +``` |
| 281 | +function StartRound(round) { |
| 282 | + round_p ← round |
| 283 | + step_p ← propose |
| 284 | + if proposer(h_p, round_p) = p { |
| 285 | + if validValue_p != nil { |
| 286 | + proposal ← validValue_p |
| 287 | +
|
| 288 | +
|
| 289 | +
|
| 290 | + } else { |
| 291 | + proposal ← getValue() |
| 292 | +
|
| 293 | + } |
| 294 | +
|
| 295 | +
|
| 296 | + broadcast ⟨PROPOSAL, h_p, round_p, proposal, validRound_p⟩ |
| 297 | + } else { |
| 298 | + schedule OnTimeoutPropose(h_p,round_p) to |
| 299 | + be executed after timeoutPropose(round_p) |
| 300 | + } |
| 301 | +} |
| 302 | +``` |
| 303 | + |
| 304 | +</td> |
| 305 | + |
| 306 | +<td> |
| 307 | + |
| 308 | +``` |
| 309 | +function StartRound(round) { |
| 310 | + round_p ← round |
| 311 | + step_p ← propose |
| 312 | + if proposer(h_p, round_p) = p { |
| 313 | + if validValue_p != nil { |
| 314 | + proposal ← validValue_p |
| 315 | +
|
| 316 | + broadcast ⟨PROPOSAL, h_p, round_p, proposal, validRound_p⟩ |
| 317 | +
|
| 318 | + } else { |
| 319 | + getValue() // async |
| 320 | + schedule OnTimeoutPropose(h_p,round_p) to |
| 321 | + be executed after timeoutPropose(round_p) |
| 322 | + } |
| 323 | +
|
| 324 | +
|
| 325 | + } else { |
| 326 | + schedule OnTimeoutPropose(h_p,round_p) to |
| 327 | + be executed after timeoutPropose(round_p) |
| 328 | + } |
| 329 | +} |
| 330 | +``` |
| 331 | + |
| 332 | +</td> |
| 333 | +</tr> |
| 334 | +</table> |
| 335 | + |
| 336 | +- New Rule added |
| 337 | + |
| 338 | +<table> |
| 339 | +<tr> |
| 340 | +<th>arXiv paper</th> |
| 341 | +<th>Async getValue()</th> |
| 342 | +</tr> |
| 343 | + |
| 344 | +<tr> |
| 345 | +<td> |
| 346 | + |
| 347 | +``` |
| 348 | +``` |
| 349 | + |
| 350 | +</td> |
| 351 | + |
| 352 | +<td> |
| 353 | + |
| 354 | +``` |
| 355 | +upon PROPOSEVALUE (h_p, round_p, v) { |
| 356 | + proposal ← v |
| 357 | + broadcast ⟨PROPOSAL, h_p, round_p, proposal, -1⟩ |
| 358 | +} |
| 359 | +``` |
| 360 | + |
| 361 | +</td> |
| 362 | +</tr> |
| 363 | +</table> |
| 364 | + |
| 365 | + |
| 366 | +#### Notes |
| 367 | +Most of the state transitions represented in the previous tables consider message and |
| 368 | +events referring to the node's current round `r`. |
| 369 | +In the pseudocode this current round of a node is referred as `round_p`. |
| 370 | + |
| 371 | +There are however exceptions that have to be handled properly: |
| 372 | +- the transition `L28` requires the node to have access to `PREVOTE` messages from a previous round `r' < r`. |
| 373 | +- the transition `L49` requires the node to have access to `PRECOMMIT` messages from different round `r' != r`. |
| 374 | +- the transition `L55` requires the node to have access to all messages from a future round `r' > r`. |
| 375 | + |
| 376 | +## References |
| 377 | + |
| 378 | +* ["The latest gossip on BFT consensus"](https://arxiv.org/pdf/1807.04938.pdf), by _Buchman, Kwon, Milosevic_. 2018. |
0 commit comments