| 1 | +# `sync` package |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This package implements a client and server that allow a [MerkleDB](../merkledb/README.md) to be synced. |
| 6 | +The servers have an up-to-date version of the database, and the clients have either an out-of-date version of the database or an empty database. |
| 7 | + |
| 8 | +It's planned that these client and server implementations will eventually be compatible with Firewood. |
| 9 | + |
| 10 | +## Messages |
| 11 | + |
| 12 | +There are four message types sent between the client and server: |
| 13 | + |
| 14 | +1. `SyncGetRangeProofRequest` |
| 15 | +2. `RangeProof` |
| 16 | +3. `SyncGetChangeProofRequest` |
| 17 | +4. `SyncGetChangeProofResponse` |
| 18 | + |
| 19 | +These message types are defined in `avalanchego/proto/sync.proto`. |
| 20 | +For more information on range proofs and change proofs, see their definitions in `avalanchego/merkledb/proof.go`. |
| 21 | + |
| 22 | +### `SyncGetRangeProofRequest` |
| 23 | + |
| 24 | +This message is sent from the client to the server to request a range proof for a given key range and root hash. |
| 25 | +That is, the client says, "Give me the key-value pairs that were in this key range when the database had this root." |
| 26 | +The request includes limits on both the number of key-value pairs to return and the size of the response. |
| 27 | + |
| 28 | +### `RangeProof` |
| 29 | + |
| 30 | +This message is sent from the server to the client in response to a `SyncGetRangeProofRequest`. |
| 31 | +It contains the key-value pairs that were in the requested key range when the database had the requested root, |
| 32 | +as well as a proof that the key-value pairs are correct. |
| 33 | +If a server can't serve the entire requested key range in one response, its response will omit keys from the |
| 34 | +end of the range rather than the start. |
| 35 | +For example, if a client requests a range proof for range [`requested_start`, `requested_end`] but the server |
| 36 | +can't fit all the key-value pairs in one response, it'll send a range proof for [`requested_start`, `proof_end`] where `proof_end` < `requested_end`, |
| 37 | +as opposed to sending a range proof for [`proof_start`, `requested_end`] where `proof_start` > `requested_start`. |
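
The truncation rule above can be sketched in Go. This is only an illustration: `KeyValue` and `truncateFromEnd` are hypothetical names, and the byte accounting counts only key and value lengths, whereas a real response also carries proof nodes.

```go
package main

import "fmt"

// KeyValue is a hypothetical stand-in for one key-value pair in a proof.
type KeyValue struct {
	Key   []byte
	Value []byte
}

// truncateFromEnd returns the longest prefix of the sorted pairs that fits
// within both limits. Pairs are only ever dropped from the end of the range,
// so the result always starts at the requested start.
func truncateFromEnd(sorted []KeyValue, keyLimit, bytesLimit int) []KeyValue {
	size := 0
	for i, kv := range sorted {
		size += len(kv.Key) + len(kv.Value)
		if i >= keyLimit || size > bytesLimit {
			return sorted[:i]
		}
	}
	return sorted
}

func main() {
	kvs := []KeyValue{
		{Key: []byte("a"), Value: []byte("1")},
		{Key: []byte("b"), Value: []byte("2")},
		{Key: []byte("c"), Value: []byte("3")},
	}
	got := truncateFromEnd(kvs, 2, 1024)
	// The last returned key plays the role of proof_end.
	fmt.Printf("%d pairs, proof_end=%s\n", len(got), got[len(got)-1].Key)
}
```

Here the key limit of 2 cuts the response after `b`, so the client would still need to fetch the rest of the range.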
| 38 | + |
| 39 | +### `SyncGetChangeProofRequest` |
| 40 | + |
| 41 | +This message is sent from the client to the server to request a change proof between the given root hashes. |
| 42 | +That is, the client says, "Give me the key-value pairs that changed between the time the database had this root and that root." |
| 43 | +The request includes limits on both the number of key-value pairs to return and the size of the response. |
| 44 | + |
| 45 | +### `SyncGetChangeProofResponse` |
| 46 | + |
| 47 | +This message is sent from the server to the client in response to a `SyncGetChangeProofRequest`. |
| 48 | +If the server has sufficient history to generate a change proof, the response contains a change proof with |
| 49 | +the key-value pairs that changed between the requested roots. |
| 50 | +If the server does not have sufficient history, the response instead contains a range proof with |
| 51 | +the key-value pairs that were in the database when the database had the latter root. |
| 52 | +As with range proofs, if a client requests a change proof for range [`requested_start`, `requested_end`] but |
| 53 | +the server can't fit all the key-value pairs in one response, |
| 54 | +it'll send a change proof for [`requested_start`, `proof_end`] where `proof_end` < `requested_end`, |
| 55 | +as opposed to sending a change proof for [`proof_start`, `requested_end`] where `proof_start` > `requested_start`. |
| 56 | + |
| 57 | +## Algorithm |
| 58 | + |
| 59 | +For each proof it receives, the sync client tracks the root hash of the revision associated with the proof's key-value pairs. |
| 60 | +For example, it will store information that says something like, "I have all of the key-value pairs that |
| 61 | +are in range [`start`, `end`] for the revision with root `root_hash`" for some keys `start` and `end`. |
| 62 | +Note that `root_hash` is the root hash of the revision that the client is trying to sync to, not the |
| 63 | +root hash of its own (incomplete) database. |
| 64 | +Tracking the revision associated with each downloaded key range, as well as using data in its own |
| 65 | +(incomplete) database, allows the client to figure out which key ranges are not up to date and need to be synced. |
| 66 | +The hash of the incomplete database on a client is never sent anywhere because it does not represent a root hash of any revision. |
| 67 | + |
| 68 | +When the client is created, it is given the root hash of the revision to sync to. |
| 69 | +When it starts syncing, it requests from a server a range proof for the entire database. |
| 70 | +(To indicate that it wants no lower bound on the key range, the client doesn't provide a lower bound in the request. |
| 71 | +To indicate that it wants no upper bound, the client doesn't provide an upper bound. |
| 72 | +Thus, to request the entire database, the client omits both the lower and upper bounds in its request.) |
| 73 | +The server replies with a range proof, which the client verifies. |
| 74 | +If it's valid, the key-value pairs in the proof are written to the database. |
| 75 | +If it's not, the client drops the proof and requests the proof from another server. |
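
The verify-or-retry behavior described above might look like the following Go sketch. `Proof`, `Server`, `fetchVerified`, and `rootMatches` are hypothetical names, not this package's API, and the root comparison merely stands in for real cryptographic proof verification.

```go
package main

import (
	"errors"
	"fmt"
)

// Proof and Server are hypothetical types used only for illustration.
type Proof struct {
	Root string
	Keys []string
}

type Server interface {
	RangeProof(root string) (Proof, error)
}

// rootMatches is a stand-in for proof verification: it only compares roots.
func rootMatches(root string) func(Proof) error {
	return func(p Proof) error {
		if p.Root != root {
			return errors.New("proof is for a different root")
		}
		return nil
	}
}

// fetchVerified asks each server in turn and returns the first proof that
// passes verification. Invalid proofs are dropped and the next server is tried.
func fetchVerified(servers []Server, root string, verify func(Proof) error) (Proof, error) {
	for _, s := range servers {
		proof, err := s.RangeProof(root)
		if err != nil {
			continue // request failed; try another server
		}
		if err := verify(proof); err != nil {
			continue // invalid proof; drop it and try another server
		}
		return proof, nil
	}
	return Proof{}, errors.New("no server returned a valid proof")
}

// fakeServer always returns the same proof, regardless of the request.
type fakeServer struct{ proof Proof }

func (f fakeServer) RangeProof(string) (Proof, error) { return f.proof, nil }

func main() {
	servers := []Server{
		fakeServer{Proof{Root: "stale", Keys: []string{"x"}}},
		fakeServer{Proof{Root: "r1", Keys: []string{"a", "b"}}},
	}
	proof, err := fetchVerified(servers, "r1", rootMatches("r1"))
	fmt.Println(proof.Keys, err)
}
```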
| 76 | + |
| 77 | +A range proof sent by a server must cover a contiguous range of key-value pairs, but it might not |
| 78 | +cover the full range that was requested. |
| 79 | +For example, a client might request all the key-value pairs in [`requested_start`, `requested_end`] |
| 80 | +but only receive those in range [`requested_start`, `proof_end`] where `proof_end` < `requested_end`. |
| 81 | +There might be too many key-value pairs to include in one message, or the server may be too busy to provide any more in its response. |
| 82 | +Unless the database is very small, this means that the range proof the client receives in response to |
| 83 | +its range proof request for the entire database will not contain all of the key-value pairs in the database. |
| 84 | + |
| 85 | +If a client requests a range proof for range [`requested_start`, `requested_end`] but only receives |
| 86 | +a range proof for [`requested_start`, `proof_end`] where `proof_end` < `requested_end`, |
| 87 | +it recognizes that it must still fetch all of the keys in [`proof_end`, `requested_end`]. |
| 88 | +It repeatedly requests range proofs for chunks of the remaining key range until it has all of the |
| 89 | +key-value pairs in [`requested_start`, `requested_end`]. |
| 90 | +The client may split the remaining key range into chunks and fetch chunks of key-value pairs in parallel, possibly even from different servers. |
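
A sequential version of this loop can be sketched as follows. Everything here is hypothetical: keys are plain strings, `fetchFunc` abstracts away the network and proof verification, and `limitedServer` is a fake server that truncates responses from the end. The sketch also omits the parallel chunking mentioned above.

```go
package main

import "fmt"

// fetchFunc is a hypothetical callback that returns the sorted keys in
// [start, end], possibly truncated at the end. reachedEnd reports whether
// the response covers the range all the way to end.
type fetchFunc func(start, end string) (keys []string, reachedEnd bool)

// limitedServer fakes a server that returns at most limit keys per response.
func limitedServer(sorted []string, limit int) fetchFunc {
	return func(start, end string) ([]string, bool) {
		var inRange []string
		for _, k := range sorted {
			if k >= start && k <= end {
				inRange = append(inRange, k)
			}
		}
		if len(inRange) > limit {
			return inRange[:limit], false
		}
		return inRange, true
	}
}

// fetchAll keeps requesting [proof_end, requested_end] until the whole
// requested range has been written to the local "database" (a set of keys).
func fetchAll(start, end string, fetch fetchFunc) map[string]bool {
	db := map[string]bool{}
	for {
		keys, reachedEnd := fetch(start, end)
		for _, k := range keys {
			db[k] = true // re-writing proof_end is harmless
		}
		if reachedEnd || len(keys) == 0 {
			return db
		}
		next := keys[len(keys)-1] // proof_end of this response
		if next == start {
			return db // no progress; a real client would retry elsewhere
		}
		start = next
	}
}

func main() {
	fetch := limitedServer([]string{"a", "b", "c", "d", "e", "f"}, 2)
	db := fetchAll("a", "e", fetch)
	fmt.Println(len(db)) // keys a through e
}
```

Because the next request starts at `proof_end` itself, that key is fetched twice. Writing it again is idempotent, which keeps the sketch simple.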
| 91 | + |
| 92 | +Additional commits to the database may occur while the client is syncing. |
| 93 | +The sync client can be notified that the root hash of the database it's trying to sync to has changed. |
| 94 | +Detecting that the root hash to sync to has changed is done outside this package. |
| 95 | +For example, if the database is being used to store blockchain state then the sync client would be |
| 96 | +notified when a new block is accepted because that implies a commit to the database. |
| 97 | +If this occurs, the key-value pairs the client has learned about via range proofs may no longer be up to date. |
| 98 | + |
| 99 | +We use change proofs as an optimization to correct the out-of-date key-value pairs. |
| 100 | +When the sync client is notified that the root hash to sync to has changed, it requests a change proof |
| 101 | +from a server for a given key range. |
| 102 | +For example, if a client has the key-value pairs in range [`start`, `end`] that were in the database |
| 103 | +when it had `root_hash`, then it will request a change proof that provides all of the key-value changes |
| 104 | +in range [`start`, `end`] from the database version with root hash `root_hash` to the database version with root hash `new_root_hash`. |
| 105 | +The client verifies the change proof, and if it's valid, it applies the changes to its database. |
| 106 | +If it's not, the client drops the proof and requests the proof from another server. |
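
One way to picture the decision between request types is the sketch below. `trackedRange` and `nextRequest` are hypothetical names, keys and roots are plain strings, and the returned strings just mimic the notation used in the diagram later in this document. They are not real message encodings.

```go
package main

import "fmt"

// trackedRange is a hypothetical record of a key range and the root hash
// of the revision at which the client last downloaded it.
type trackedRange struct {
	start, end string
	root       string // empty if nothing has been downloaded for this range
}

// nextRequest describes the request the client would send for a range,
// given the root it is currently trying to sync to.
func nextRequest(r trackedRange, targetRoot string) string {
	switch r.root {
	case "":
		// Nothing downloaded yet: fetch the range at the target root.
		return fmt.Sprintf("RangeProofRequest(%s, %s..%s)", targetRoot, r.start, r.end)
	case targetRoot:
		// Already up to date: nothing to request.
		return ""
	default:
		// Downloaded at an older root: fetch only what changed since then.
		return fmt.Sprintf("ChangeProofRequest(%s, %s, %s..%s)", r.root, targetRoot, r.start, r.end)
	}
}

func main() {
	fmt.Println(nextRequest(trackedRange{end: "k75", root: "r1"}, "r2"))
	fmt.Println(nextRequest(trackedRange{start: "k75"}, "r2"))
}
```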
| 107 | + |
| 108 | +A server needs to have history in order to serve a change proof. |
| 109 | +Namely, it needs to know all of the database changes between two roots. |
| 110 | +If the server does not have sufficient history to generate a change proof, it will send a range proof for |
| 111 | +the requested range at revision `new_root_hash` instead. |
| 112 | +The client will verify and apply the range proof. (Note that change proofs are just an optimization for bandwidth and speed. |
| 113 | +A range proof for a given key range and revision has the same information as a change proof from |
| 114 | +`old_root_hash` to `new_root_hash` for the key range, assuming the client has the key-value pairs |
| 115 | +for the key range at the revision with `old_root_hash`.) |
| 116 | +Change proofs, like range proofs, might not contain all of the key-value pairs in the requested range. |
| 117 | +This is OK because, as mentioned above, the client tracks the root hash associated with each range of |
| 118 | +key-value pairs it has, so it knows which key-value pairs are out of date. |
| 119 | +Similar to range proofs, if a client requests the changes in range [`requested_start`, `requested_end`], |
| 120 | +but the server replies with all of the changes in [`requested_start`, `proof_end`] for some `proof_end` < `requested_end`, |
| 121 | +the client will repeatedly request change proofs until it has the remaining key-value pairs (namely those in [`proof_end`, `requested_end`]). |
| 122 | + |
| 123 | +Eventually, by repeatedly requesting, receiving, verifying and applying range and change proofs, |
| 124 | +the client will have all of the key-value pairs in the database. |
| 125 | +At this point, it's synced. |
| 126 | + |
| 127 | +## Diagram |
| 128 | + |
| 129 | + |
| 130 | +Suppose the revision with root hash `r1` has many keys, including `k25`, `k50`, and `k75`, which are |
| 131 | +approximately 25%, 50%, and 75% of the way into the sorted set of keys, respectively. |
| 132 | +This diagram shows an example flow between the client and server: |
| 133 | + |
| 134 | +```mermaid |
| 135 | +sequenceDiagram |
| 136 | + box Client/Server |
| 137 | + participant Server |
| 138 | + participant Client |
| 139 | + end |
| 140 | + box New Revision Notifier |
| 141 | + participant Notifier |
| 142 | + end |
| 143 | + |
| 144 | + Note right of Client: Normal sync flow |
| 145 | + Notifier->>Client: CurrentRoot(r1) |
| 146 | + Client->>Server: RangeProofRequest(r1, all) |
| 147 | + Server->>Client: RangeProofResponse(r1, ..k25) |
| 148 | + Client->>Server: RangeProofRequest(r1, k25..) |
| 149 | + Server->>Client: RangeProofResponse(r1, k25..k75) |
| 150 | + Notifier-)Client: NewRootHash(r2) |
| 151 | + Client->>Server: ChangeProofRequest(r1, r2, ..k75) |
| 152 | + Server->>Client: ChangeProofResponse(r1, r2, ..k50) |
| 153 | + Client->>Server: ChangeProofRequest(r1, r2, k50..k75) |
| 154 | + Server->>Client: ChangeProofResponse(r1, r2, k50..k75) |
| 155 | + Note right of Client: client is @r2 through (..k75) |
| 156 | + Client->>Server: RangeProofRequest(r2, k75..) |
| 157 | + Server->>Client: RangeProofResponse(r2, k75..k100) |
| 158 | +``` |
| 159 | + |
1 | 160 | ## TODOs
|
2 | 161 |
|
3 | 162 | - [ ] Handle errors on proof requests. Currently, any errors that occur server side are not sent back to the client.
|
|