Skip to content
This repository has been archived by the owner on Jan 22, 2025. It is now read-only.

choose highest incremental snapshot first #25584

Merged

Conversation

diman-io
Copy link
Contributor

Problem

I see following on the testnet.
Some nodes save full snapshots more often then every default 25K blocks (and/or don't produce incremental snapshots at all).

This leads to an error in choosing the highest snapshot.

Summary of Changes

Choose highest incremental snapshot first.
If all incremental are none, then choose highest full.
Else it does not matter which one will be full.

@diman-io diman-io requested a review from brooksprumo May 26, 2022 13:48
@mergify mergify bot added the community Community contribution label May 26, 2022
@diman-io
Copy link
Contributor Author

diman-io commented May 26, 2022

This is small issue which I found while testing PR about merge two mods - with/without incremental snapshots.
So I don't want to imporove smth here just now, but I have some thoughs about.
First I want to merge two mods))
But this issue looks enough to be alone small PR.

@diman-io diman-io force-pushed the choose-highest-incremental-first branch from f1433ac to ce63c38 Compare June 2, 2022 10:36
@diman-io
Copy link
Contributor Author

diman-io commented Jun 2, 2022

@brooksprumo ping 🙄

@brooksprumo
Copy link
Contributor

@brooksprumo ping 🙄

Thanks, will review.

@brooksprumo
Copy link
Contributor

Problem

I see following on the testnet. Some nodes save full snapshots more often then every default 25K blocks (and/or don't produce incremental snapshots at all).

This leads to an error in choosing the highest snapshot.

Yeah, I think that's a symptom of nodes not using the same full snapshot interval as their known validators.

I understand that if a node doesn't have any configured known validators, or uses multiple known validators with different full snapshot intervals that the node may download different/older snapshots. However I don't think this is an error; the code is working as intended, and it is a misconfiguration of the node.

At the same time I acknowledge this is not a great experience; it also may be hard to know the configured full snapshot interval of your chosen known validators. This is exacerbated currently since some nodes have incremental snapshots enabled and others disabled.

I'm personally not in favor of this change at this time. I'd rather see (1) the cluster get to a uniform place where all nodes have incremental snapshots enabled, (2) the cluster understands the configuration implications for the full snapshot interval, and (3) have the mod with/without_incremental_snapshots merged (#20848).

If (1) happens and we still see a lot of issues due to (2) being too challenging, then I think at that time we can revisit this idea. How does that sound?

Summary of Changes

Choose highest incremental snapshot first. If all incremental are none, then choose highest full. Else it does not matter which one will be full.

This is actually how I originally added incremental snapshot support to bootstrap. Once it was decided that a node should use the same full snapshot interval as its known validators, the implementation was simplified to its current form.

@diman-io
Copy link
Contributor Author

diman-io commented Jun 2, 2022

I'm sorry, but I disagree.

I think it's impossible to set same configs in decentalized network. So if someone (maybe atacker) will use short intervals than others, current code will choose him as target rpc to download snapshot from.

I think the only one aim of current code is get the latest snapshot. Now the problem exists. And many validators use 3rd party tool to get latest snapshot from cluster. It's sad.

I have read your linked prs and issues.
I have added #25732 but it is WIP. I was going to do this a little later, after full testing. But it seems the time has come to show it at least in this form.

Current PR remove the problem with wrong choice but add nothing new ones as I see. And I have found current issue while tested #25732. When I applyed it, then I saw 700 nodes with realy latest snapshot instead of 4 nodes before (and not very fresh).

I agree with you, that there are many things that can be improved here. And as validator I have some minds about. But current PR looks like fight with problem, not improving. And my voice for approve it to solve the problem. After we will contining imporve bootstrap by next steps. And I do not exclude that, perhaps, in the next steps we will change this and make it even better.

@diman-io
Copy link
Contributor Author

diman-io commented Jun 2, 2022

I summon @mvines to take small part in this discussion, as I saw he took part in previous discussions around this part of code.

@brooksprumo
Copy link
Contributor

Ok, I think I'm on board more now, thanks for the elaboration. I'm a bit worried about the current implementation that doesn't require all the peer snapshot hashes be the same (just either the incremental or full). I'll leave some code review comments on how I think we can handle this more simply.

diman-io and others added 2 commits June 2, 2022 21:52
Co-authored-by: Brooks Prumo <brooks@prumo.org>
@brooksprumo brooksprumo self-requested a review June 2, 2022 20:50
@brooksprumo brooksprumo merged commit 395dbc0 into solana-labs:master Jun 2, 2022
mergify bot pushed a commit that referenced this pull request Jun 2, 2022
(cherry picked from commit 395dbc0)

# Conflicts:
#	validator/src/bootstrap.rs
mergify bot pushed a commit that referenced this pull request Jun 2, 2022
mergify bot added a commit that referenced this pull request Jun 2, 2022
(cherry picked from commit 395dbc0)

Co-authored-by: DimAn <diman@diman.io>
mergify bot added a commit that referenced this pull request Jun 3, 2022
* choose highest incremental snapshot first (#25584)

(cherry picked from commit 395dbc0)

# Conflicts:
#	validator/src/bootstrap.rs

* pr: fix merge conflicts

Co-authored-by: DimAn <diman@diman.io>
Co-authored-by: Brooks Prumo <brooks@solana.com>
@diman-io diman-io deleted the choose-highest-incremental-first branch May 13, 2023 21:44
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
community Community contribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants