-
Notifications
You must be signed in to change notification settings - Fork 200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set rent_epoch
for updated sysvars to RENT_EXEMPT_EPOCH
#3398
Conversation
df84779
to
fcb305a
Compare
73dbd3e
to
16c0927
Compare
16c0927
to
6c37fa2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is good. sysvars are not something I'm super comfortable with.
does this have to be backported to 2.1 if we hope to activate skip rewrites? |
Thinking about it a bit more. I think we may want to back ported to 2.1. Not for the reason of skip rewrite though. It is for the new sysvar epoch rewards introduced by the partitioned reward feature. If we don't ported back this change, when partitioned reward feature is acitvated, nodes in 2.1 without this PR, will have the reward sysvar account's rent_epoch =0 at creating time until it is being rent scanned. But nodes, with this PR, will have reward sysvar account's rent_epoch =max at its creating time. Hence, they will diverge ... |
runtime/src/bank.rs
Outdated
fn adjust_sysvar_rent_epoch(&self, account: &mut AccountSharedData) { | ||
account.set_rent_epoch(solana_sdk::rent_collector::RENT_EXEMPT_RENT_EPOCH); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not create this function? We only call it from one place, and we don't want to call it anywhere else. Since it is one line, I find the added function indirection inconvenient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Please also import RENT_EXEMPT_RENT_EPOCH
and use it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
runtime/src/bank.rs
Outdated
// When new sysvar comes into existence, this code ensures that the | ||
// sysvar's rent_epoch is set to RENT_EXEMPT_EPOCH (i.e. U64::MAX). This | ||
// is because sysvars are never deleted, and are always rent exempt. | ||
// Currently, it relies on rent collection code to update rent_epoch to | ||
// RENT_EXEMPT_EPOCH. In the future, we will remove rent collection | ||
// code. Therefore, we should set rent_epoch here, and don't depend on | ||
// rent collection code to update rent_epoch.. | ||
self.adjust_sysvar_rent_epoch(&mut new_account); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens with the sysvars in genesis? I believe they have rent epoch set to 0. So if there are two nodes replaying from genesis—one with this change and one without—could they end up diverging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bad sysvar's rent epoch will be updated at the first writing slot.
Yes. they may diverge.
In fact, if we are replaying from any slot that have sysvar's rent epoch not set to max, we may diverge.
The node, with this PR, the bad sysvar will be update at its first update time. The node without this PR, the bad sysvar will be updated at its rent collection time or first update time, whichever is first. These two times could be different...
Since all sysvar on mnt today have their rent epoch set to max, replying today on mainnet will be fine.
I am not sure if we need to support replaying for the case where we have sysvar rent epoch not set to max.
If we must, I think then we need a feature gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or we could back port this PR to all existing version of validator that are actively running on the network.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you have any better idea?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this change will cause a consensus issue only when we introduce a new sysvar, which is very rare. Currently, the only new pending sysvar is from epoch rewards. If we backported this PR before we activate epoch rewards feature, then I think we should be fine.
Honestly, I think there is very little risk of consensus issue if we are careful about activating the feature that introduce new sysvar... And I don't think we need to introduce a new feature gate for this change. WDYT?
runtime/src/bank/tests.rs
Outdated
// advance to the next slot to create a new bank, which is cleanable. The | ||
// previous bank is at epoch boundary. An epoch boundary bank is not | ||
// cleanable because epoch rewards sysvar is written in this slot. Before | ||
// #3398, it was cleanable because rent collection will rewrite the epoch | ||
// rewards sysvar due to rent_epoch not setting to u64::MAX. With #3398, | ||
// rent_epoch is set to u64::MAX at creating time, so epoch rewards sysvar | ||
// is not rewritten. Therefore, the slot at epoch boundary is now | ||
// uncleanable. This is why we need to advance to the next slot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow. The first slot in an epoch cannot be cleaned or should not be cleaned? If it is "cannot", then are there broader implications in AccountsBackgroundService and snapshots too (since there's nothing that prevents clean
from running on the first slot in an epoch)? If it is "should not", is that only for this test, or does that impact other code as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is "should not". And it is only because the way the test is written. More details are as follows.
Without the PR, epoch reward sysvar is created with rent_epoch = 0 at the epoch boundary slot X. And in a late slot Y, when rent collection runs, it finds that the rent_epoch is not MAX, so it write epoch reward sysvar at Y. This will make epoch reward sysvar dead in slot X. So clean can remove slot X.
However, with this PR, epoch reward sysvar is created with rent_epoch = MAX at the epoch boundary slot X. In the late slot Y, when rent collection runs, it find that rent_epoch is already at MAX, so no need to write epoch rewards sysvar at Y. Therefore, the sysvar is still alive at slot X. Clean can't remove slot X.
In this test, the main intent is to test clean to remove a dead slot when all of its account are overwritten in a later slots. Unfortunately, we pick the first slot of the epoch to do the test (a bad choice). Because of the epoch reward sysvar was implicitly created at the first slot of the epoch (which could be live or dead depends on rent collection), and the test doesn't have control over this sysvar, we should move our test of clean one slot later, so that we can just test clean with user accounts that we can control without worrying about the sysvar accounts and rent collection ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better not to refer to PR numbers in comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes. updated the comments.
f5c2877
f5c2877
to
b956c4d
Compare
adding @jstarry here to review. |
Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis. |
ce36cf6
to
b5177a1
Compare
b5177a1
to
7ac4965
Compare
rent_epoch
for updated sysvars to RENT_EXEMPT_EPOCH
So this behavior change means that new sysvars would have their rent epoch set to
Partitioned rewards is ready for activation in v2.0 actually
Since partitioned rewards will be enabled very soon on devnet and also likely be enabled mainnet in less than a month once mainnet is on v2.0, I don't think it's a good idea to merge this without a feature gate. There's no urgency in getting this issue patched, I think it can be tied to whatever feature gate we use for solana-foundation/solana-improvement-documents#175 |
Instead of the approach in this PR, I would prefer a PR that uses a feature switch to update |
Also just a note that I wrote about changing the initial rent epoch to |
Sounds good to me. I hate to introduce another feature gate. I like that we roll this into SIMD175. |
We will add this change when we implement feature SIMD175. |
Problem
When we create new sysvar, we make sure that its balance is rent exempt. However, we don't force its rent_epoch to be
RENT_EXEMPT_EPOCH
. Instead, we are relying on rent collection code to set rent_epoch.In the future, after we skip the rewrite (SIMD_0183 ), we will also disable rent collection SIMD_0175. Therefore, we can't rely on the rent collection code to set
rent_epoch
.In this PR, we set
rent_epoch
toRENT_EXEMPT_EPOCH
at the time of new sysvar update time.Note that this won't affect existing sysvars since they have already all have rent_epoch set to u64::MAX. But it is a "must-have" for disable rent collection code.
A quick scan of all sysvars on mainnet shows that all of them are rent exempt and have
rent_epoch
set toRENT_EXEMPT_EPOCH
. This is good news, which means they won't be affected by this change.However, there is a very small risk for breaking consensus -the behavior of creating sysvar changed. Without the PR, sysvar was created with rent_epoch = 0 and it set to u64:max at its rent scan time. With this PR, sysvar was created with rent_epoch=u64:max. Luckily, there is only one pending feature that introduces new sysvar - partitioned reward feature in v2.1.
If we backported this PR, to v2.1 and wait for the majority of the nodes on the cluster to have this change, before we activate partitioned reward feature, then we will be fine. And all future new sysvars will follow the new pattern with rent_epoch=max at creating time.
Summary of Changes
Fixes #