Switch programs activation to whole-set based gating #11750
@ryoqun I have some concerns with these changes, at every bank we are checking for the existence of the account and calling message_processor to override the program's Maybe genesis programs can have a list of Also, is there any reason not to make the hot fix permanent, seems the bank is the best place for that logic? And, any reason to keep inflation in genesis-programs? Maybe we should also move that to runtime like the builtin programs are in |
Codecov Report
@@ Coverage Diff @@
## master #11750 +/- ##
========================================
Coverage 82.0% 82.0%
========================================
Files 330 330
Lines 77174 77464 +290
========================================
+ Hits 63334 63589 +255
- Misses 13840 13875 +35 |
runtime/src/bank.rs
Outdated
if parent.epoch() < new.epoch() { | ||
new.refresh_programs_and_inflation(); | ||
} |
here
I don't think we're doing at every bank. I think we're doing the checks only when crossing the epoch and when restoring from snapshots: https://github.com/solana-labs/solana/pull/11750/files#r474369115 and notice that |
Yeah, I'll move it as well. it's a bit odd for |
I think this approach will work too. But I originally avoided doing so because it'd make the activation mechanism a bit less flexible. On the other hand, this would lift the idempotency requirement, which is currently required under Also, note that I had this in mind: the most important objective here is to make it super-simple to add/tweak gating logic to avoid similar bugs in the foreseeable future. FYI, we're doing the current checks at epoch boundaries already: #11750 (comment) |
Yeah, sorry I didn't mean every bank, every epoch, but at every epoch we are re-adding these programs, kinda weird |
Can you elaborate on how the |
Yeah, I want to avoid the weird behavior as well, too. I originally thought that the behavior wouldn't occur because I wonder why there is subtle behavior differences between the two similar functions. I'll just align
It's less flexible in that you can't specify arbitrary conditions like you can with closures. So, say, we can't conditionally trigger new programs with Anyway, thanks for the input. Since I now have some more time, I'll make more changes to fix these more properly. |
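To illustrate the flexibility trade-off mentioned above, here is a minimal sketch contrasting a data-only activation list with closure-style conditions. All names here (`BankView`, `ScheduledProgram`, `ConditionalProgram`, and the helper functions) are hypothetical and are not taken from the PR; the real `Bank` carries far more state.

```rust
type Epoch = u64;

// Hypothetical, minimal view of bank state; not the real `Bank` type.
struct BankView {
    epoch: Epoch,
    is_testnet: bool,
}

// Declarative gating: a program is described by data only,
// so the only possible condition is "epoch >= activation_epoch".
struct ScheduledProgram {
    name: &'static str,
    activation_epoch: Epoch,
}

// Closure-style gating: a program carries an arbitrary predicate,
// which is more flexible but harder to audit.
struct ConditionalProgram {
    name: &'static str,
    is_active: fn(&BankView) -> bool,
}

// Returns the names of all list entries active at the bank's epoch.
fn active_scheduled(list: &[ScheduledProgram], bank: &BankView) -> Vec<&'static str> {
    list.iter()
        .filter(|p| p.activation_epoch <= bank.epoch)
        .map(|p| p.name)
        .collect()
}

// Returns the names of all entries whose predicate accepts the bank.
fn active_conditional(list: &[ConditionalProgram], bank: &BankView) -> Vec<&'static str> {
    list.iter()
        .filter(|p| (p.is_active)(bank))
        .map(|p| p.name)
        .collect()
}
```

The declarative form is easier to reason about and trivially idempotent, while the predicate form can express conditions that depend on more than the epoch.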
runtime/src/bank.rs
Outdated
*self.entered_epoch_callback.write().unwrap() = Some(entered_epoch_callback); | ||
self.apply_feature_activations(); |
Sadly, this is needed in addition to the call at finish_init.
new.update_epoch_stakes(leader_schedule_epoch); | ||
new.ancestors.insert(new.slot(), 0); | ||
new.parents().iter().enumerate().for_each(|(i, p)| { | ||
new.ancestors.insert(p.slot(), i + 1); | ||
}); | ||
if parent.epoch() < new.epoch() { |
This code was moved because get_account is used inside it.
@@ -175,6 +175,20 @@ pub type ProcessInstruction = fn(&Pubkey, &[KeyedAccount], &[u8]) -> Result<(), | |||
pub type ProcessInstructionWithContext = | |||
fn(&Pubkey, &[KeyedAccount], &[u8], &mut dyn InvokeContext) -> Result<(), InstructionError>; | |||
|
|||
// These are just type aliases for work around of Debug-ing above function pointers |
Does this mean these are temporary?
Yeah, I hope these are temporary. But it'll take some time (a year?), considering how long the upstream bug has been open: rust-lang/rust#50280
If these are concerning, I'll try another workaround or remove the workaround altogether. I just want to dbg! some types containing these for quick debugging. This workaround isn't a must; just nice-to-have.
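One alternative workaround of the kind mentioned above can be sketched as a newtype wrapper with a hand-written Debug impl, so that a containing struct can still derive Debug. This is only an illustration under assumptions: `Handler`, `DebugHandler`, `Registry`, and `echo` are hypothetical names, and this is not the type-alias approach taken in the diff.

```rust
use std::fmt;

// Hypothetical handler signature with reference parameters, standing in
// for function-pointer aliases like `ProcessInstruction`.
type Handler = fn(&[u8], &mut Vec<u8>) -> Result<(), String>;

// Newtype wrapper that provides Debug without relying on any
// Debug impl for the function pointer type itself.
struct DebugHandler(Handler);

impl fmt::Debug for DebugHandler {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print only the address; the function body is opaque anyway.
        write!(f, "Handler({:p})", self.0 as *const ())
    }
}

// A containing type can now simply derive Debug.
#[derive(Debug)]
struct Registry {
    handlers: Vec<(String, DebugHandler)>,
}

// Example handler: copies the input into the output buffer.
fn echo(input: &[u8], out: &mut Vec<u8>) -> Result<(), String> {
    out.extend_from_slice(input);
    Ok(())
}
```

With this in place, `dbg!(&registry)` on a `Registry` value prints each handler's name and address.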
Well, the current (EDIT: base commit of this pr) code doesn't populate master when booted from genesis (without snapshot) => 2 bpf
master when booted from snapshot (with snapshot) => no
Also, it seems that the |
We need more snapshot tests ;-) |
@jackcmay As for backporting this, if our ultimate goal is to land bpf_loader2 on v1.2, I think we're better off back-porting the prs first, not this one. I just noticed the amount of bpf2 changes is rather large. Back-porting in git-reversed order (= starting to backport this pr now after merging) would be quite a pain for this size of changes, for me and you. ;) If we can't backport those relevant prs individually due to a known bug (maybe this?), we can create a pr against v1.2 and cherry-pick them in git-chronological order, which will lessen the degree of merge conflicts. |
Also, I'll postpone this as a separate pr later. Inflation doesn't change on v1.2. And this will just make this pr (already moderate sized) bigger for no good reason. |
self.recheck_cross_program_support(); | ||
} | ||
|
||
fn recheck_cross_program_support(self: &mut Bank) { |
Why recheck and not just check?
No strong reason; just to convey the idempotent connotation. If this sounds odd, I'm completely fine with fn check_... instead.
Ugh, I think I've found a DoS security issue... We aren't considering
|
Anyway, I think we need sysvar-like treatment for existing accounts. |
I'm guessing now |
@ryoqun Can you elaborate more on empty loaders issue you are seeing? |
There are some cases where For v1.3, I haven't tested. For HEAD of master at the moment (ie without this pr), With this pr, All, can be tested with |
oh conflict... |
Ok, tests are in place, finally! I'll address this. |
04d8b80
to
5699303
Compare
if let Some(mut account) = self.get_account(&program_id) { | ||
already_genuine_program_exists = native_loader::check_id(&account.owner); | ||
|
||
if !already_genuine_program_exists { |
Bring on the fire and brimstone.... what if this account has a very high balance and is obviously worth a lot to someone. Seems like instead or along with we should block known addresses from being added in the first place?
Also, what if someone adds an account and assigns it to the native_loader?
Bring on the fire and brimstone....
Hehe, I like thinking hard for corner cases... :)
Also, what if someone adds an account and assigns it to the native_loader?
Yeah, I thought of it. Well, can someone assign accounts to a non-existing owner program_id (= NativeLoader1111111111111111111111111111111)?
ryoqun@ubuqun:~/work/solana/solana$ ./target/release/solana --url http://api.mainnet-beta.solana.com account BPFLoader1111111111111111111111111111111111
Public Key: BPFLoader1111111111111111111111111111111111
Balance: 0.000000001 SOL
Owner: NativeLoader1111111111111111111111111111111
Executable: true
Rent Epoch: 0
Length: 25 (0x19) bytes
0000: 73 6f 6c 61 6e 61 5f 62 70 66 5f 6c 6f 61 64 65 solana_bpf_loade
0010: 72 5f 70 72 6f 67 72 61 6d r_program
ryoqun@ubuqun:~/work/solana/solana$ ./target/release/solana --url http://api.mainnet-beta.solana.com account NativeLoader1111111111111111111111111111111
Error: AccountNotFound: pubkey=NativeLoader1111111111111111111111111111111
I just blindly assumed the original code was covering that case with this condition alone: native_loader::check_id(&account.owner).
If someone can actually do that, we can add an account.executable condition to distinguish the legitimate native loader program from a spoofed one?
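The stricter check suggested above could look roughly like the following sketch. The `Account` struct here is a hypothetical, simplified stand-in (the owner is a plain string rather than a Pubkey), and `is_genuine_native_program` is an illustrative name, not a function from the PR.

```rust
// Hypothetical, simplified account model; not the real SDK `Account`.
#[derive(Debug, Clone, PartialEq)]
struct Account {
    owner: String,
    executable: bool,
    lamports: u64,
}

// The native loader address as it appears in the CLI output above.
const NATIVE_LOADER_ID: &str = "NativeLoader1111111111111111111111111111111";

// Treats an account as a genuine native program only if it is owned by
// the native loader AND has the executable bit set. A missing account
// is never genuine.
fn is_genuine_native_program(account: Option<&Account>) -> bool {
    match account {
        Some(a) => a.owner == NATIVE_LOADER_ID && a.executable,
        None => false,
    }
}
```

Requiring both conditions means that an account merely assigned to the native loader, without the executable bit, would still be treated as spoofed in this sketch.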
what if this account has a very high balance and is obviously worth a lot to someone. Seems like instead or along with we should block known addresses from being added in the first place?
;) Yeah, this could be quite draconian. But, just to be sure, is this really protecting someone from misfortune? Well, people can send a lump of SOL to the wrong account with a one-letter typo... Also, people can send to any of the sysvars...
Anyway, I think we would need a preparatory PR to prohibit money transfers to and from addresses with some pretty long prefix or suffix? Considering that new sysvars or new BPF loaders are rare, this might be too much. Also, finding a nice base58-friendly prefix/suffix seems hard....
Or, I'm also thinking of practicing a two-phase introduction of sysvars/loaders like this:
1(a). Release a patch release which just prohibits any money transfer to a certain account at a certain epoch.
1(b). Or, just create an account with Placeholder11111 with the executable bit? (might be easiest?)
1(c). Add BPFLoader11111 to the frozen accounts.... (no, this is bad, DoS vector)
2. Release the actual meat.
self.store_account(program_id, &account); | ||
let mut already_genuine_program_exists = false; | ||
if let Some(mut account) = self.get_account(&program_id) { | ||
already_genuine_program_exists = native_loader::check_id(&account.owner); |
What does genuine mean here? It could just be is_native.
Good call. :) It's not a strongly opinionated word choice; I wanted to emphasize that a native loader's accounts can be good or bad. And this is really trying to indicate that this variable holds a flag for the good native loader program's account.
I just picked this word from these opposing word pools:
real, legitimate, genuine <=> spoofed/crafted/malicious
@jackcmay Thanks for continuous review! I think I've addressed all of review comments. Could you finish off reviewing this so that I can merge this? :) |
This is done: ebba5bd |
Besides that one last comment about the bank config, this looks great, thanks @ryoqun ! |
@jackcmay I've just merged this after somewhat extensive local-testing! |
* Implement Debug for MessageProcessor * Switch from delta-based gating to whole-set gating * Remove dbg! * Fix clippy * Clippy * Add test * add loader to stable operating mode at proper epoch * refresh_programs_and_inflation after ancestor setup * Callback via snapshot; avoid account re-add; Debug * Fix test * Fix test and fix the past history * Make callback management stricter and cleaner * Fix test * Test overwrite and frozen for native programs * Test epoch callback with genesis-programs * Add assertions for parent bank * Add tests and some minor cleaning * Remove unsteady assertion... * Fix test... * Fix DOS * Skip ensuring account by dual (whole/delta) gating * Fix frozen abi implementation... * Move compute budget constatnt init back into bank Co-authored-by: Ryo Onodera <ryoqun@gmail.com> (cherry picked from commit db4bbb3) # Conflicts: # genesis-programs/src/lib.rs
) * Switch programs activation to whole-set based gating (#11750) * Implement Debug for MessageProcessor * Switch from delta-based gating to whole-set gating * Remove dbg! * Fix clippy * Clippy * Add test * add loader to stable operating mode at proper epoch * refresh_programs_and_inflation after ancestor setup * Callback via snapshot; avoid account re-add; Debug * Fix test * Fix test and fix the past history * Make callback management stricter and cleaner * Fix test * Test overwrite and frozen for native programs * Test epoch callback with genesis-programs * Add assertions for parent bank * Add tests and some minor cleaning * Remove unsteady assertion... * Fix test... * Fix DOS * Skip ensuring account by dual (whole/delta) gating * Fix frozen abi implementation... * Move compute budget constatnt init back into bank Co-authored-by: Ryo Onodera <ryoqun@gmail.com> (cherry picked from commit db4bbb3) # Conflicts: # genesis-programs/src/lib.rs * Fix conflicts Co-authored-by: Jack May <jack@solana.com> Co-authored-by: Ryo Onodera <ryoqun@gmail.com>
|
||
bank.add_builtin_program("mock_program1", vote_id, mock_ix_processor); | ||
bank.add_builtin_program("mock_program2", stake_id, mock_ix_processor); | ||
assert!(bank.stakes.read().unwrap().vote_accounts().is_empty()); |
Well, these assertions aren't enough... https://github.com/solana-labs/solana/pull/13884/files#r535949045
Problem
get_entered_epoch_callback isn't called when restoring from snapshots, which is too confusing and error-prone. Also, we can't simply call it immediately after snapshot restoration because it expects to be called exactly once at each epoch boundary.
get_programs and get_builtins return a delta set, i.e. the new programs to add to the currently available set at the given epoch. This works nicely in the ideal world, where we're running a perfect, bug-free validator since genesis without ever restarting it.
In reality, we must rely on snapshots 99.999% of the time. When restoring from snapshots, the delta set doesn't quite work: we don't persist the current available set (namely, bank.message_processor is effectively serde(skip)).
Summary of Changes
So, just reflect reality by making these functions snapshot-friendly: they now return the whole set of available programs at the given epoch. And make them callable from finish_init(), which is called after snapshot restoration.
Also, fix a bunch of other dangerous code along the way.
Also, this is intended to be back-port friendly, so the fix is intentionally not exhaustive. Still, get_entered_epoch_callback is a bit error-prone. Specifically, it must be idempotent. (We could solve this by artificially introducing some intermediate struct like ScheduledBankFeatures or the like instead of the mind-opening way of passing &mut Bank).
Deprecates #11736
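The difference between the delta set and the whole set described in the Problem and Summary of Changes sections can be sketched as follows. This is a toy model under stated assumptions: the `SCHEDULE` table, its epochs, and the function names are hypothetical and do not reflect the real genesis-programs code.

```rust
use std::collections::BTreeSet;

type Epoch = u64;

// Hypothetical schedule: (program name, epoch at which it activates).
const SCHEDULE: &[(&str, Epoch)] = &[
    ("system", 0),
    ("vote", 0),
    ("bpf_loader", 2),
    ("bpf_loader2", 5),
];

// Delta gating: only the programs that become active exactly at `epoch`.
// Correct only if it has been applied at every epoch boundary since genesis.
fn delta_set(epoch: Epoch) -> BTreeSet<&'static str> {
    SCHEDULE
        .iter()
        .filter(|(_, at)| *at == epoch)
        .map(|(name, _)| *name)
        .collect()
}

// Whole-set gating: every program that is active at `epoch`.
// Depends only on the epoch, so it is safe to call after a snapshot restore.
fn whole_set(epoch: Epoch) -> BTreeSet<&'static str> {
    SCHEDULE
        .iter()
        .filter(|(_, at)| *at <= epoch)
        .map(|(name, _)| *name)
        .collect()
}

// What a validator running since genesis accumulates by applying each delta.
fn replay_deltas(up_to: Epoch) -> BTreeSet<&'static str> {
    (0..=up_to).flat_map(delta_set).collect()
}
```

In this model, a validator restored from a snapshot at epoch 3 that applies only `delta_set(3)` ends up with no programs at all, whereas `whole_set(3)` matches what `replay_deltas(3)` would have accumulated. Because `whole_set` is a pure function of the epoch, calling it repeatedly is also idempotent.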