Join from an existing snapshot #1532

jumaffre · 2020-08-25T16:56:50Z

The PR that puts all the previous snapshot PRs together.

It is now possible for a node to join a network from an existing snapshot, which makes the catch-up phase much quicker: the joiner only replays the entries from the last snapshot to the last index.

To do so, the operator(s) of the CCF network are responsible for copying the snapshot file(s) produced by one node (under the snapshots/ folder) to the directory of the new joiner. The new joiner will automatically try to join from the latest snapshot (based on the snapshot.idx file name), deserialising the snapshot into its store when joining (*).
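As an illustration of the operator step and of the "latest snapshot" selection described above, here is a minimal Python sketch. It is not the code from this PR: the helper names (`snapshot_index`, `latest_snapshot`, `copy_snapshots`) are hypothetical, and it assumes snapshot files are named `snapshot.<idx>` with a decimal index, which is only inferred from the "snapshot.idx" wording.

```python
import os
import shutil


def snapshot_index(file_name):
    """Return the index in a 'snapshot.<idx>' name, or None if the name does not match.

    The 'snapshot.<idx>' naming scheme is an assumption made for this sketch.
    """
    prefix, _, suffix = file_name.partition(".")
    if prefix != "snapshot" or not suffix.isdigit():
        return None
    return int(suffix)


def latest_snapshot(snapshot_dir):
    """Return the name of the snapshot file with the highest index, or None if there is none."""
    candidates = [
        (snapshot_index(name), name)
        for name in os.listdir(snapshot_dir)
        if snapshot_index(name) is not None
    ]
    # Compare indices numerically so that snapshot.10 is newer than snapshot.9.
    return max(candidates)[1] if candidates else None


def copy_snapshots(src_dir, dst_dir):
    """Copy every snapshot file from an existing node's folder to the joiner's folder."""
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        if snapshot_index(name) is not None:
            shutil.copy(os.path.join(src_dir, name), dst_dir)
```

The index is parsed as an integer rather than compared as a string, because a lexicographic comparison would wrongly rank `snapshot.9` above `snapshot.10`.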

Raft will then be forced to become a follower at a specific commit index. The snapshotter and the ledger also have to be initialised at that index so that the ledger and snapshots on the joiner node are the same as those on the other nodes in the network.
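The effect of starting at a specific index can be shown with a toy model. The sketch below is not CCF code: `ToyJoiner` and its methods are hypothetical names, the store is a plain dictionary, and the ledger is a list of `(seqno, (key, value))` entries. It only illustrates that a joiner initialised from a snapshot at seqno N skips entries up to N and replays the rest.

```python
class ToyJoiner:
    """Toy stand-in for a joining node; all names here are illustrative only."""

    def __init__(self):
        self.store = {}
        self.last_seqno = 0
        self.view = 0
        self.role = "joining"

    def resume_from_snapshot(self, snapshot_state, seqno, view):
        """Load the snapshot state and start as a follower at the snapshot's seqno."""
        self.store = dict(snapshot_state)
        self.last_seqno = seqno
        self.view = view
        self.role = "follower"

    def replay(self, ledger):
        """Apply only the entries after last_seqno; return how many were applied."""
        applied = 0
        for seqno, (key, value) in ledger:
            if seqno <= self.last_seqno:
                continue  # already covered by the snapshot
            self.store[key] = value
            self.last_seqno = seqno
            applied += 1
        return applied
```

With a five-entry ledger and a snapshot taken at seqno 3, a joiner that calls `resume_from_snapshot` first applies only entries 4 and 5 in `replay`, yet ends up with the same store as a joiner that replays all five entries from scratch.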

I've also added a new snapshot test suite to test this more thoroughly, along with all the necessary changes to the Python infra to copy the snapshots from one node over to the new node.

(*) The snapshot can only be deserialised after the ledger secrets have been passed in by the network so that the snapshot can be decrypted.

More thorough assertions on the content of ledger/snapshot files during end-to-end testing
Joiner only becomes part of network once it has seen the signed snapshot evidence

jumaffre · 2020-08-26T11:03:39Z

CMakeLists.txt

@@ -571,13 +588,14 @@ if(BUILD_TESTS)
      NAME reconfiguration_test
      PYTHON_SCRIPT ${CMAKE_SOURCE_DIR}/tests/reconfiguration.py
      CONSENSUS raft
+      ADDITIONAL_ARGS --raft-election-timeout 4000


Lowering the raft election timeout here (default is 100s) as the reconfiguration test now also waits for a new election to complete.

jumaffre · 2020-08-26T11:05:14Z

src/node/node_state.h

+              auto seqno = network.tables->current_version();
+              consensus->force_become_backup(seqno, sig->view);
+
+              reset_data(config.joining.snapshot);


Once the snapshot has been applied, we don't want to keep it around in enclave memory.

ghost · 2020-08-26T11:11:29Z

join_from_snapshot@11998 aka 20200827.8 vs master ewma over 50 builds from 11569 to 11985

src/consensus/raft/raft.h

src/host/main.cpp

src/host/snapshot.h

achamayou · 2020-08-27T07:33:43Z

src/node/node_state.h

+
+              reset_data(config.joining.snapshot);
+              LOG_INFO_FMT(
+                "Joiner successfully resumed from snapshot at seqno {} and "


This sounds slightly more final than it really is: we've started resuming from this snapshot, but until we see the evidence from the primary, we don't know if the snapshot is valid.

You're right. That's the next step for this feature so I will address that in the next PR.

src/node/snapshotter.h

tests/infra/network.py

achamayou

LGTM with some minor comments.

Co-authored-by: Amaury Chamayou <amaury@xargs.fr>

Julien Maffre and others added 30 commits July 16, 2020 18:09

Champ correct size

7d8a619

Snapshot from raft

6dd9e0b

Generate snapshots and store to disk

2b283c2

Snapshot protocol WIP

a3705e2

Merge remote-tracking branch 'upstream/master' into generate_snapshot

af0a2e2

ledger max chunk -> ledger min chunk

0b020c9

Snapshots are written to disk

2158750

Snapshotter returns snapshot version to Raft

12e0158

Fix unit tests

09428ee

Merge remote-tracking branch 'upstream/master' into generate_snapshot

0b965de

Format

bcc8b31

black

9a8aa43

SnaPshotter

f4683e0

Unsigned idx

432675c

Merge branch 'master' into generate_snapshot

e2d519c

snapshot_min_tx -> snapshot_max_tx

33b4cb1

Merge branch 'generate_snapshot' of github.com:jumaffre/CCF into gene…

40d0712

…rate_snapshot

Remove type of Tmsg when adding task

8920fe2

And the other half...

07e3b76

Merge branch 'generate_snapshot' into async_snapshot_generation

a6acae5

Snapshot generation is async

7e24cc2

Commit snapshot evidence

143f3a8

Merge remote-tracking branch 'upstream/master' into async_snapshot_ge…

ec87487

…neration

Add snapshot idx to evidence table

a07e6b0

Split snapshot generation and serialisation

e33694c

Format

426b580

Merge branch 'master' into async_snapshot_generation

e83b546

Actually remove last_snapshot_idx from Raft

e2332e3

snapsot evidence singular

ea7dcef

Merge branch 'async_snapshot_generation' of github.com:jumaffre/CCF i…

bc9b328

…nto async_snapshot_generation

Julien Maffre and others added 7 commits August 26, 2020 09:32

Format

c8df8f3

Merge branch 'master' into join_from_snapshot

56ce322

VERBOSE LOGGING (to revert)

5905657

Merge branch 'join_from_snapshot' of github.com:jumaffre/CCF into joi…

a9a2eeb

…n_from_snapshot

Fix dangling reference issue

87594cc

Fix raft unit test (view)

0d55f1e

Merge branch 'master' into join_from_snapshot

9e53fce

jumaffre commented Aug 26, 2020

Julien Maffre added 2 commits August 26, 2020 12:04

Quiet

100e21b

Merge branch 'join_from_snapshot' of github.com:jumaffre/CCF into joi…

b709168

…n_from_snapshot

jumaffre commented Aug 26, 2020

Merge branch 'master' into join_from_snapshot

4be1eed