Skip to content
This repository was archived by the owner on Jan 22, 2025. It is now read-only.

Conversation

@ryoqun
Copy link
Contributor

@ryoqun ryoqun commented Sep 19, 2020

Problem

Tower::root() can get away of Option<_>.

And there is postponed local test addition suggested from last review at #10718.

Summary of Changes

Let's simplify code and reduce combinatorial complexes for humans.

And some minor follow up fixes.

Also, adds long-awaited interesting test: #10718 (comment)

There should be no functional change.

Context

follow up #10718

@codecov
Copy link

codecov bot commented Sep 19, 2020

Codecov Report

Merging #12350 into master will decrease coverage by 0.0%.
The diff coverage is 78.5%.

@@            Coverage Diff            @@
##           master   #12350     +/-   ##
=========================================
- Coverage    81.9%    81.9%   -0.1%     
=========================================
  Files         362      362             
  Lines       85221    85236     +15     
=========================================
+ Hits        69862    69863      +1     
- Misses      15359    15373     +14     

@stale
Copy link

stale bot commented Sep 26, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale [bot only] Added to stale content; results in auto-close after a week. label Sep 26, 2020
@stale
Copy link

stale bot commented Oct 4, 2020

This stale pull request has been automatically closed. Thank you for your contributions.

@stale stale bot closed this Oct 4, 2020
@ryoqun ryoqun reopened this Oct 4, 2020
@stale stale bot removed the stale [bot only] Added to stale content; results in auto-close after a week. label Oct 4, 2020
@ryoqun ryoqun force-pushed the persistent-tower-followup branch from 3f5ecc9 to a607473 Compare October 9, 2020 08:14
@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 9, 2020

I already want #12739 ;)

error[E0277]: arrays only have std trait implementations for lengths 0..=32
   --> sdk/src/signature.rs:51:22
    |
51  |         bs58::encode(&self.0.to_bytes()).into_string()
    |                      ^^^^^^^^^^^^^^^^^^ the trait `std::array::LengthAtMost32` is not implemented for `[u8; 64]`
    |
   ::: /home/.cargo/registry/src/github.com-1ecc6299db9ec823/bs58-0.3.1/src/lib.rs:207:18
    |
207 | pub fn encode<I: AsRef<[u8]>>(input: I) -> encode::EncodeBuilder<'static, I> {
    |                  ----------- required by this bound in `bs58::encode`
    |
    = note: required because of the requirements on the impl of `std::convert::AsRef<[u8]>` for `[u8; 64]`
    = note: required because of the requirements on the impl of `std::convert::AsRef<[u8]>` for `&[u8; 64]`

// Should never consider switching to an ancestor
// of your last vote
assert!(!last_vote_ancestors.contains(&switch_slot));
// Generally, should never consider switching to an ancestor of your last vote
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should go to #12671

// meaning some inconsistency between saved tower and ledger.
// (newer snapshot, or only a saved tower is moved over to new setup?)
// compare slots not to error! just because of newer snapshots
if self.last_switch_threshold_check.is_none() && switch_slot < last_voted_slot {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

define should_warn_on_first_switch_check()

Comment on lines 36 to 37
SameFork,
FailedSwitchThreshold(u64, u64),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract this into separate pr.

);
if switch_fork_decision == SwitchForkDecision::FailedSwitchThreshold {
let a = Some((heaviest_bank.slot(), switch_fork_decision.clone()));
if tower.last_switch_threshold_check != a {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to tower

&& propagation_confirmed
&& switch_fork_decision != SwitchForkDecision::FailedSwitchThreshold
{
if let SwitchForkDecision::FailedSwitchThreshold(_, _) = switch_fork_decision {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deduplicate with the following else clause.

@mvines
Copy link
Contributor

mvines commented Oct 12, 2020

@ryoqun - should we bring this into v1.4 as well? I'm thinking why not, since v1.4 will live for a couple months

@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 12, 2020

@ryoqun - should we bring this into v1.4 as well? I'm thinking why not, since v1.4 will live for a couple months

Yeah, definitely, I'll do. :)

@ryoqun ryoqun force-pushed the persistent-tower-followup branch from d65328d to 416c5c3 Compare October 16, 2020 07:00
@ryoqun ryoqun requested a review from carllin October 16, 2020 13:57
@ryoqun ryoqun marked this pull request as ready for review October 16, 2020 13:57
@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 16, 2020

@carllin Could you review this?

// With the persisted tower:
// `A` should not be able to generate a switching proof.
//
fn do_test_optimistic_confirmation_violation_with_or_without_tower(with_tower: bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, this looked really involved/painful 🤕. Thanks for implementing that hacky suggestion for a test, hopefully this makes us feel better about the optimistic confirmation + saved tower interaction!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to be frank, writing this was very hard and needed more logging improvement resulting in #12875. ;) Well, I'm actually surprised that I could write one. xD

);
blockstore.set_dead_slot(prev_voted_slot).unwrap();

std::fs::remove_file(Tower::get_filename(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this no longer necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carllin Yeah. As I figured out this in hard way, saved tower doesn't protect us from violating opt. conf., if voted slot is marked dead, forcibly. So, as you guessed, i'm effectively making this test harder to pass (=actually violate) by not removing the saved tower and indeed this test still passes. :)

#10718 (comment):

Well, I found it's very hard to make a validator violate optimistic confirmation if and only if tower is removed.
The hard part is that marking a slot as dead makes a validator to violate the optimistic conf. even if tower is persited. And eventually, the cluster makes new roots.
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added explicit comment mentioning for the saved tower: https://github.com/solana-labs/solana/pull/12350/files#r506909261

&opt,
)
.unwrap();
std::fs::remove_file(Tower::get_filename(
Copy link
Contributor

@carllin carllin Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extract this and other similar calls to a delete_tower() function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

))
.unwrap();

let blockstore = Blockstore::open_with_access_type(
Copy link
Contributor

@carllin carllin Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extract this open -> purge logic into a separate function because the same logic is reused below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

loop {
sleep(Duration::from_millis(100));
let highest_bank = client
.get_slot_with_commitment(CommitmentConfig::recent())
Copy link
Contributor

@carllin carllin Oct 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to assert here that validator b actually voted on next_slot_on_a (maybe a variation of last_vote_in_tower() that checks tower_contains_vote()?

I think CommitmentConfig::recent() implies the validator voted, but want to double check in case that recent() guarantee ever changes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this suggestion makes sense. well, I was just lazy... ;)

))
.unwrap();

// For some reason, fork_choice always selects slot 27 over votes_on_c_fork.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun I think I see what's happening.

If you don't delete slot 27 from the ledger, then the validator A will immediately vote for 27 on restart, because it hasn't gotten the heavier fork from validator C yet. Then it will be stuck on 27 unable to switch because C doesn't have enough stake to generate a switching proof

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool. that makes sense. The hard part of writing this was that validator actually try very hard to move forward the chain and resolve any divergent. I guess I can't hold validator from voting after restart, right? I'll update this comment of for some reason with your comment. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ryoqun yeah no way to prevent the validator from voting on restart, so I think this is the correct thing!

let node_stakes = vec![31, 36, 33, 0];

// Each pubkeys are prefixed with A, B, C and D.
// D is needed to avoid NoPropagatedConfirmation erorrs
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@carllin oh, btw, NoPropagatedConfirmation can be worked around just by not staked votes by non-existing vote account. So, any validator can be tricked into the false illusion of propagation. And attacker can do this without fear of slashing, maybe? This might be actually be a bug? Should we exclude 0-lamport votes and introduce some threshold like 1 (or N) lamport(s) or 0.01% stake?

}

pub fn to_base58_string(&self) -> String {
// Remove .iter() once we're rust 1.47+
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: @t-nelson Another LengthAtMost32 artificial shackle. I'm looking forward to #12739. :)

error[E0277]: arrays only have std trait implementations for lengths 0..=32
   --> sdk/src/signature.rs:51:22
    |
51  |         bs58::encode(&self.0.to_bytes()).into_string()
    |                      ^^^^^^^^^^^^^^^^^^ the trait `std::array::LengthAtMost32` is not implemented for `[u8; 64]`
    |
   ::: /home/.cargo/registry/src/github.com-1ecc6299db9ec823/bs58-0.3.1/src/lib.rs:207:18
    |
207 | pub fn encode<I: AsRef<[u8]>>(input: I) -> encode::EncodeBuilder<'static, I> {
    |                  ----------- required by this bound in `bs58::encode`
    |
    = note: required because of the requirements on the impl of `std::convert::AsRef<[u8]>` for `[u8; 64]`
    = note: required because of the requirements on the impl of `std::convert::AsRef<[u8]>` for `&[u8; 64]`

Comment on lines +1430 to +1432
// marking this voted slot as dead makes the saved tower garbage
// effectively. That's because its stray last vote becomes stale (= no
// ancestor in bank forks).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here

@ryoqun ryoqun requested a review from carllin October 17, 2020 09:33
@ryoqun ryoqun changed the title Follow up to persistent tower Follow up to persistent tower with tests and API cleaning Oct 17, 2020
@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 17, 2020

@carllin I think this pr is ready for another round of review. :)

@ryoqun
Copy link
Contributor Author

ryoqun commented Oct 19, 2020

#12350 (comment)

@carllin oh, btw, NoPropagatedConfirmation can be worked around just by not staked votes by non-existing vote account. So, any validator can be tricked into the false illusion of propagation. And attacker can do this without fear of slashing, maybe? This might be actually be a bug? Should we exclude 0-lamport votes and introduce some threshold like 1 (or N) lamport(s) or 0.01% stake?

we'll take a look at this later. I'll first merge this as this lgtm-ed by @carllin via discord.

@ryoqun ryoqun merged commit 54517ea into solana-labs:master Oct 19, 2020
mergify bot pushed a commit that referenced this pull request Oct 19, 2020
* Follow up to persistent tower

* Ignore for now...

* Hard-code validator identities for easy reasoning

* Add a test for opt. conf violation without tower

* Fix compile with rust < 1.47

* Remove unused method

* More move of assert tweak to the asser pr

* Add comments

* Clean up

* Clean the test addressing various review comments

* Clean up a bit

(cherry picked from commit 54517ea)
mergify bot added a commit that referenced this pull request Oct 19, 2020
…12972)

* Follow up to persistent tower

* Ignore for now...

* Hard-code validator identities for easy reasoning

* Add a test for opt. conf violation without tower

* Fix compile with rust < 1.47

* Remove unused method

* More move of assert tweak to the asser pr

* Add comments

* Clean up

* Clean the test addressing various review comments

* Clean up a bit

(cherry picked from commit 54517ea)

Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
}

fn purge_slots(blockstore: &Blockstore, start_slot: Slot, slot_count: Slot) {
blockstore.purge_from_next_slots(start_slot, start_slot + slot_count);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the correct order (cf: #13065 )

CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request Dec 4, 2020
…s#12350)

* Follow up to persistent tower

* Ignore for now...

* Hard-code validator identities for easy reasoning

* Add a test for opt. conf violation without tower

* Fix compile with rust < 1.47

* Remove unused method

* More move of assert tweak to the asser pr

* Add comments

* Clean up

* Clean the test addressing various review comments

* Clean up a bit
CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request Dec 4, 2020
…s#12350)

* Follow up to persistent tower

* Ignore for now...

* Hard-code validator identities for easy reasoning

* Add a test for opt. conf violation without tower

* Fix compile with rust < 1.47

* Remove unused method

* More move of assert tweak to the asser pr

* Add comments

* Clean up

* Clean the test addressing various review comments

* Clean up a bit
CriesofCarrots pushed a commit to CriesofCarrots/solana that referenced this pull request Dec 4, 2020
…s#12350)

* Follow up to persistent tower

* Ignore for now...

* Hard-code validator identities for easy reasoning

* Add a test for opt. conf violation without tower

* Fix compile with rust < 1.47

* Remove unused method

* More move of assert tweak to the asser pr

* Add comments

* Clean up

* Clean the test addressing various review comments

* Clean up a bit
Comment on lines +1622 to +1627
// actually saved tower must have at least one vote.
let last_vote = Tower::restore(&ledger_path, &node_pubkey)
.unwrap()
.last_voted_slot()
.unwrap();
Some(last_vote)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @CriesofCarrots noticed (https://discord.com/channels/428295358100013066/560503042458517505/858178548916027402):

Also, why does this method [2] call Tower::restore twice?

this is oversight according to this local commit (fyi, I retain all edits since i cloned the repo locally for accountability...)
The root ancesty of the redundant Tower::restore calls originated from this commit and then evolved significantly, without the bug not being noticed to this day... ;)

$ git show 765ec9c462dbe4e2282426f9230b941dbf5cb8f6
commit 765ec9c462dbe4e2282426f9230b941dbf5cb8f6
Author: Ryo Onodera <ryoqun@gmail.com>
Date:   Thu Oct 8 04:13:46 2020 +0900

    save

diff --git a/local-cluster/tests/local_cluster.rs b/local-cluster/tests/local_cluster.rs
index f07d021614..128e5071c4 100644
--- a/local-cluster/tests/local_cluster.rs
+++ b/local-cluster/tests/local_cluster.rs
@@ -1717,6 +1717,8 @@ fn do_test_optimistic_confirmation_violation_with_or_without_tower(with_tower: b
     let mut bad_vote_detected = false;
     let mut retry = 100;
     loop {
+        let tower = Tower::restore(&val_a_ledger_path, &val_a_7B);
+        if tower.is_err() { continue }
         let s = Tower::restore(&val_a_ledger_path, &val_a_7B).unwrap().last_voted_slot().unwrap();
         if val_c_slots.contains(&s) {
             bad_vote_detected = true;

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants