-
Notifications
You must be signed in to change notification settings - Fork 4.6k
choose highest incremental snapshot first #25584
choose highest incremental snapshot first #25584
Conversation
This is small issue which I found while testing PR about merge two mods - with/without incremental snapshots. |
f1433ac
to
ce63c38
Compare
@brooksprumo ping 🙄 |
Thanks, will review. |
Yeah, I think that's a symptom of nodes not using the same full snapshot interval as their known validators. I understand that if a node doesn't have any configured known validators, or uses multiple known validators with different full snapshot intervals that the node may download different/older snapshots. However I don't think this is an error; the code is working as intended, and it is a misconfiguration of the node. At the same time I acknowledge this is not a great experience; it also may be hard to know the configured full snapshot interval of your chosen known validators. This is exacerbated currently since some nodes have incremental snapshots enabled and others disabled. I'm personally not in favor of this change at this time. I'd rather see (1) the cluster get to a uniform place where all nodes have incremental snapshots enabled, (2) the cluster understands the configuration implications for the full snapshot interval, and (3) have the If (1) happens and we still see a lot of issues due to (2) being too challenging, then I think at that time we can revisit this idea. How does that sound?
This is actually how I originally added incremental snapshot support to bootstrap. Once it was decided that a node should use the same full snapshot interval as its known validators, the implementation was simplified to its current form. |
I'm sorry, but I disagree. I think it's impossible to set same configs in decentalized network. So if someone (maybe atacker) will use short intervals than others, current code will choose him as target rpc to download snapshot from. I think the only one aim of current code is get the latest snapshot. Now the problem exists. And many validators use 3rd party tool to get latest snapshot from cluster. It's sad. I have read your linked prs and issues. Current PR remove the problem with wrong choice but add nothing new ones as I see. And I have found current issue while tested #25732. When I applyed it, then I saw 700 nodes with realy latest snapshot instead of 4 nodes before (and not very fresh). I agree with you, that there are many things that can be improved here. And as validator I have some minds about. But current PR looks like fight with problem, not improving. And my voice for approve it to solve the problem. After we will contining imporve bootstrap by next steps. And I do not exclude that, perhaps, in the next steps we will change this and make it even better. |
I summon @mvines to take small part in this discussion, as I saw he took part in previous discussions around this part of code. |
Ok, I think I'm on board more now, thanks for the elaboration. I'm a bit worried about the current implementation that doesn't require all the peer snapshot hashes be the same (just either the incremental or full). I'll leave some code review comments on how I think we can handle this more simply. |
Co-authored-by: Brooks Prumo <brooks@prumo.org>
(cherry picked from commit 395dbc0) # Conflicts: # validator/src/bootstrap.rs
(cherry picked from commit 395dbc0)
Problem
I see following on the testnet.
Some nodes save full snapshots more often then every default 25K blocks (and/or don't produce incremental snapshots at all).
This leads to an error in choosing the highest snapshot.
Summary of Changes
Choose highest incremental snapshot first.
If all incremental are none, then choose highest full.
Else it does not matter which one will be full.