Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add checkpoint sync #3849

Merged
merged 13 commits into from
Jun 3, 2022
Merged

Add checkpoint sync #3849

merged 13 commits into from
Jun 3, 2022

Conversation

matkt
Copy link
Contributor

@matkt matkt commented May 17, 2022

PR description

Added a new way to synchronize which is X_CHECKPOINT. This mode is experimental so use it at your own risk. This mode allows you to do like a snapsync but starting from a specific checkpoint instead of starting from the genesis.

This checkpoint will be in the genesis configuration of each network. To add the checkpoint mechanism in a network you just have to add the checkpoint section in the genesis.

Currently there is a checkpoint for ropten, goerli and mainnet.
⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️
It is important that several people check the checkpoint that I put in order to validate that it is in the canonical chain
⚠️ ⚠️⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️ ⚠️

https://github.com/hyperledger/besu/pull/3849/files#diff-0d117bd43d649651e6a1df66826434eeed0a0a99ce8c0004a9ba1a47780bd9d0

https://github.com/hyperledger/besu/pull/3849/files#diff-ef83a0feb00d54f0dc7e98b32f5dd3d8e8b86c4ab4b0cc429447501c3cbfc1ea

https://github.com/hyperledger/besu/pull/3849/files#diff-50e6198a9728b0d5ecdf18abf3f6307496ef670f49d19185510e691f7de00e1f

If a user chooses this mode and there is no checkpiont in the genesis BESU will start from block 0

Currently the blocks before the checkpoint are not downloaded but to help the network this will be done in a future PR.
We will download the blocks in background

  • Checkpoint vs Snapsync with Bonsai on Goerli (i3.2xlarge) 02/06/202
Test Snapsync Checkpoint sync
Worldstate (minutes) 43 60
BlockImport - 6720000 (minutes) 191 60 (less but waiting for worldstate)
Total sync (so the longest step) > 3 hours 60 = (1 hours)
  • Checkpoint vs Snapsync with Bonsai on Mainnet (i3.2xlarge) 02/06/2022
Test Snapsync Checkpoint sync
Worldstate (minutes) 370 326 (5 hours : 26 minutes)
BlockImport - 14700000(minutes) 2240 326(less but waiting for worldstate)
Total sync (so the longest step) > 37 hours 326 = (5 hours : 26 minutes)

Fixed Issue(s)

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if
    updates are required.

Changelog

@matkt matkt force-pushed the feature-sync-refactor branch 3 times, most recently from 2bb0643 to 20848d3 Compare May 18, 2022 13:28
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature-sync-refactor branch from 2018b2d to 2972f00 Compare May 24, 2022 13:20
matkt added 2 commits May 24, 2022 15:38
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt removed the snapsync label May 24, 2022
@matkt matkt self-assigned this May 24, 2022
matkt added 6 commits May 26, 2022 18:20
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@garyschulte
Copy link
Contributor

garyschulte commented Jun 2, 2022

I confirmed your checkpoints with both etherscan and canary nodes. ✅

6720000
  goerli checkpoint 0x2ae30061bdfc7f6dad5b07361dce436502eb0fde68645de12bae4929be619188
             total difficulty: 0x967F81
  etherscan 0x2ae30061bdfc7f6dad5b07361dce436502eb0fde68645de12bae4929be619188
  consensys goerli validator 0x2ae30061bdfc7f6dad5b07361dce436502eb0fde68645de12bae4929be619188 
                                validator total difficulty: 0x967f81
12200000
  ropsten checkpoint 0x43de216f876d897e59b9757dd24186e5b53be28bc425ca6a966335b48daaa50c
                total difficulty: 0x928D05243C1CF4
  etherscan 0x43de216f876d897e59b9757dd24186e5b53be28bc425ca6a966335b48daaa50c
  ropsten canary node 0x43de216f876d897e59b9757dd24186e5b53be28bc425ca6a966335b48daaa50c
                canary node total difficulty: 0x928d05243c1cf4
14700000 
  mainnet checkpoint 0x844d581cb00058d19f0584fb582fa2de208876ee56bbae27446a679baf4633f4
                 total difficulty 0xA2539264C62BF98CFC6
  etherscan 0x844d581cb00058d19f0584fb582fa2de208876ee56bbae27446a679baf4633f4
  bonsai full archive node 0x844d581cb00058d19f0584fb582fa2de208876ee56bbae27446a679baf4633f4               
              full archive node total difficulty 0xa2539264c62bf98cfc6

"hash": "0x2ae30061bdfc7f6dad5b07361dce436502eb0fde68645de12bae4929be619188",
"number": 6720000,
"totalDifficulty": "0x967F81",
"_comment": "must be the beginning of an epoch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if this is not a new epoch? does the difficulty adjustment algorithm break? clique vote flushing happens at the wrong time, etc?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's because this part of code need to run into an epoch block

while (true) { // Will run into an epoch block (and thus a VoteTally) to break loop.
. if we are not starting on a new epoch we will have the NoSuchElementException

to avoid changing too much code I prefer to advise starting from a new epoch for clique and like that it will work without problems


@Override
public int getCheckPointWindowSize(final BlockHeader blockHeader) {
return CliqueExtraData.decode(blockHeader).getValidators().size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems maximally safe but why is it necessary if we are specifying block number, hash and difficulty? I am guessing it is PoA related just not sure why.

Copy link
Contributor Author

@matkt matkt Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is clearly a POA specificity. If you look at this code. you will see that to validate a signer. We need to look at n block before to check that this signer has not already signed a block (where n is the number of validators at the moment of this block). This is not reset during a new epoch. So the easiest I found is to download the N blocks before in order to have the necessary information for the checkpoint and the following blocks

if (!CliqueHelpers.addressIsAllowedToProduceNextBlock(blockSigner, protocolContext, parent)) {


@Override
boolean validateBlockHeader(final EthPeer ethPeer, final BlockHeader header) {
final boolean valid = super.validateBlockHeader(ethPeer, header);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could be an interesting filter even if we are not doing checkpoint sync

Copy link
Contributor Author

@matkt matkt Jun 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you look here
https://github.com/hyperledger/besu/pull/3849/files/54115a6b35964f2d79c187cce0be938b8eb1f8af#diff-28413f2a1577cf3219e31aeb2ded43051e9f72dba0ff9a0093185f4d395277f7R23
you see that we already have this filter. I just added some specificity to this one for the checkpoint

@@ -36,7 +38,7 @@
import org.slf4j.LoggerFactory;

public class DownloadHeadersStep
implements Function<CheckpointRange, CompletableFuture<CheckpointRangeHeaders>> {
implements Function<SyncTargetRange, CompletableFuture<RangeHeaders>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 next we will have to excise fast from fastSyncDownloader. ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clearly Jiri's work will be good for this part of the code. it's starting to be complex to adapt this code to all of this synchronization options

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid. minor feedback

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt changed the title sync checkpoint mecanism poc Add checkpoint sync Jun 3, 2022
@matkt matkt marked this pull request as ready for review June 3, 2022 08:40
Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt force-pushed the feature-sync-refactor branch from 44ee8a5 to b156e45 Compare June 3, 2022 09:21
@matkt matkt requested a review from garyschulte June 3, 2022 12:50
@matkt matkt added documentation Improvements or additions to documentation doc-change-required Indicates an issue or PR that requires doc to be updated labels Jun 3, 2022
Copy link
Contributor

@NicolasMassart NicolasMassart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified the 3 checkpoints against Etherscan

@@ -20,6 +20,12 @@
"enode://d4f764a48ec2a8ecf883735776fdefe0a3949eb0ca476bd7bc8d0954a9defe8fea15ae5da7d40b5d2d59ce9524a99daedadf6da6283fca492cc80b53689fb3b3@46.4.99.122:32109",
"enode://d2b720352e8216c9efc470091aa91ddafc53e222b32780f505c817ceef69e01d5b0b0797b69db254c586f493872352f5a022b4d8479a00fc92ec55f9ad46a27e@88.99.70.182:30303"
]
},
"checkpoint": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this checkpoint, matches Etherescan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This goerli block hash 0x2ae30061bdfc7f6dad5b07361dce436502eb0fde68645de12bae4929be619188 matches Etherscan (on Goerli)

@@ -32,6 +32,11 @@
"enode://1118980bf48b0a3640bdba04e0fe78b1add18e1cd99bf22d53daac1fd9972ad650df52176e7c7d89d1114cfef2bc23a2959aa54998a46afcf7d91809f0855082@52.74.57.123:30303",
"enode://979b7fa28feeb35a4741660a16076f1943202cb72b6af70d327f053e248bab9ba81760f39d0701ef1d8f89cc1fbd2cacba0710a12cd5314d5e0c9021aa3637f9@5.1.83.226:30303"
]
},
"checkpoint": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this checkpoint, matches Etherescan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mainnet block hash 0x844d581cb00058d19f0584fb582fa2de208876ee56bbae27446a679baf4633f4 matches Etherscan (on mainnet)

@@ -21,6 +21,11 @@
"enode://30b7ab30a01c124a6cceca36863ece12c4f5fa68e3ba9b0b51407ccc002eeed3b3102d20a88f1c1d3c3154e2449317b8ef95090e77b312d5cc39354f86d5d606@52.176.7.10:30303",
"enode://865a63255b3bb68023b6bffd5095118fcc13e79dcf014fe4e47e065c350c7cc72af2e53eff895f11ba1bbb6a2b33271c1116ee870f266618eadfc2e78aa7349c@52.176.100.77:30303"
]
},
"checkpoint": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this checkpoint, matches Etherescan

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Ropsten block hash 0x43de216f876d897e59b9757dd24186e5b53be28bc425ca6a966335b48daaa50c matches Etherscan (on Ropsten network)

Copy link
Contributor

@garyschulte garyschulte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚢

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
@matkt matkt enabled auto-merge (squash) June 3, 2022 19:01
@matkt matkt merged commit 6f85e1f into hyperledger:main Jun 3, 2022
@alexandratran alexandratran removed documentation Improvements or additions to documentation doc-change-required Indicates an issue or PR that requires doc to be updated labels Jun 6, 2022
eum602 pushed a commit to lacchain/besu that referenced this pull request Nov 3, 2023
Add a new way to synchronize which is X_CHECKPOINT. This mode is experimental so use it at your own risk. This mode allows you to do like a snapsync but starting from a specific checkpoint instead of starting from the genesis.

This checkpoint will be in the genesis configuration of each network. To add the checkpoint mechanism in a network you just have to add the checkpoint section in the genesis.

Currently there is a checkpoint for ropten, goerli and mainnet.

Mainnet on i3.2xlarge <6 hours
Goerli on i3.2xlarge <1 hours

Signed-off-by: Karim TAAM <karim.t2am@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants