Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set rent_epoch for updated sysvars to RENT_EXEMPT_EPOCH #3398

Closed
wants to merge 6 commits into from

Conversation

HaoranYi
Copy link

@HaoranYi HaoranYi commented Oct 31, 2024

Problem

When we create new sysvar, we make sure that its balance is rent exempt. However, we don't force its rent_epoch to be RENT_EXEMPT_EPOCH. Instead, we are relying on rent collection code to set rent_epoch.

In the future, after we skip the rewrite (SIMD_0183 ), we will also disable rent collection SIMD_0175. Therefore, we can't rely on the rent collection code to set rent_epoch.

In this PR, we set rent_epoch to RENT_EXEMPT_EPOCH at the time of new sysvar update time.

Note that this won't affect existing sysvars since they have already all have rent_epoch set to u64::MAX. But it is a "must-have" for disable rent collection code.

A quick scan of all sysvars on mainnet shows that all of them are rent exempt and have rent_epoch set to RENT_EXEMPT_EPOCH. This is good news, which means they won't be affected by this change.

sysvar account: SysvarStakeHistory1111111111111111111111111 18446744073709551615 114979200
sysvar account: SysvarRecentB1ockHashes11111111111111111111 18446744073709551615 42706560
sysvar account: SysvarS1otHistory11111111111111111111111111 18446744073709551615 913326000
sysvar account: SysvarS1otHashes111111111111111111111111111 18446744073709551615 143487360
sysvar account: SysvarC1ock11111111111111111111111111111111 18446744073709551615 1169280
sysvar account: SysvarEpochSchedu1e111111111111111111111111 18446744073709551615 1120560
sysvar account: SysvarFees111111111111111111111111111111111 18446744073709551615 946560
sysvar account: SysvarLastRestartS1ot1111111111111111111111 18446744073709551615 946560
sysvar account: SysvarRent111111111111111111111111111111111 18446744073709551615 1009200
sysvar account: SysvarRewards111111111111111111111111111111 18446744073709551615 1002240

However, there is a very small risk for breaking consensus -the behavior of creating sysvar changed. Without the PR, sysvar was created with rent_epoch = 0 and it set to u64:max at its rent scan time. With this PR, sysvar was created with rent_epoch=u64:max. Luckily, there is only one pending feature that introduces new sysvar - partitioned reward feature in v2.1.
If we backported this PR, to v2.1 and wait for the majority of the nodes on the cluster to have this change, before we activate partitioned reward feature, then we will be fine. And all future new sysvars will follow the new pattern with rent_epoch=max at creating time.

Summary of Changes

  • Set syavar rent_epoch at init time
  • Fix tests

Fixes #

@HaoranYi HaoranYi changed the title syavar rent_epoch syavar rent_epoch (wip) Oct 31, 2024
@HaoranYi HaoranYi marked this pull request as draft October 31, 2024 00:52
@HaoranYi HaoranYi changed the title syavar rent_epoch (wip) Set syavar rent_epoch at init Oct 31, 2024
@HaoranYi HaoranYi changed the title Set syavar rent_epoch at init Set syavar rent_epoch at new update time Oct 31, 2024
@HaoranYi HaoranYi marked this pull request as ready for review November 5, 2024 14:28
jeffwashington
jeffwashington previously approved these changes Nov 5, 2024
Copy link

@jeffwashington jeffwashington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good. sysvars are not something I'm super comfortable with.

@jeffwashington
Copy link

does this have to be backported to 2.1 if we hope to activate skip rewrites?

@HaoranYi
Copy link
Author

HaoranYi commented Nov 5, 2024

does this have to be backported to 2.1 if we hope to activate skip rewrites?

Not required for skip rewrites

We are still doing rent scan after skipping rewrites. So we should be fine

However, this is required before we stop rent scan

Thinking about it a bit more. I think we may want to back ported to 2.1. Not for the reason of skip rewrite though. It is for the new sysvar epoch rewards introduced by the partitioned reward feature.

If we don't ported back this change, when partitioned reward feature is acitvated, nodes in 2.1 without this PR, will have the reward sysvar account's rent_epoch =0 at creating time until it is being rent scanned. But nodes, with this PR, will have reward sysvar account's rent_epoch =max at its creating time. Hence, they will diverge ...

runtime/src/bank.rs Outdated Show resolved Hide resolved
dmakarov
dmakarov previously approved these changes Nov 5, 2024
Comment on lines 6700 to 6702
fn adjust_sysvar_rent_epoch(&self, account: &mut AccountSharedData) {
account.set_rent_epoch(solana_sdk::rent_collector::RENT_EXEMPT_RENT_EPOCH);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we not create this function? We only call it from one place, and we don't want to call it anywhere else. Since it is one line, I find the added function indirection inconvenient.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Please also import RENT_EXEMPT_RENT_EPOCH and use it directly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Comment on lines 1944 to 1951
// When new sysvar comes into existence, this code ensures that the
// sysvar's rent_epoch is set to RENT_EXEMPT_EPOCH (i.e. U64::MAX). This
// is because sysvars are never deleted, and are always rent exempt.
// Currently, it relies on rent collection code to update rent_epoch to
// RENT_EXEMPT_EPOCH. In the future, we will remove rent collection
// code. Therefore, we should set rent_epoch here, and don't depend on
// rent collection code to update rent_epoch..
self.adjust_sysvar_rent_epoch(&mut new_account);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens with the sysvars in genesis? I believe they have rent epoch set to 0. So if there are two nodes replaying from genesis—one with this change and one without—could they end up diverging?

Copy link
Author

@HaoranYi HaoranYi Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The bad sysvar's rent epoch will be updated at the first writing slot.

Yes. they may diverge.

In fact, if we are replaying from any slot that have sysvar's rent epoch not set to max, we may diverge.

The node, with this PR, the bad sysvar will be update at its first update time. The node without this PR, the bad sysvar will be updated at its rent collection time or first update time, whichever is first. These two times could be different...

Since all sysvar on mnt today have their rent epoch set to max, replying today on mainnet will be fine.

I am not sure if we need to support replaying for the case where we have sysvar rent epoch not set to max.

If we must, I think then we need a feature gate.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or we could back port this PR to all existing version of validator that are actively running on the network.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you have any better idea?

Copy link
Author

@HaoranYi HaoranYi Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this change will cause a consensus issue only when we introduce a new sysvar, which is very rare. Currently, the only new pending sysvar is from epoch rewards. If we backported this PR before we activate epoch rewards feature, then I think we should be fine.

Honestly, I think there is very little risk of consensus issue if we are careful about activating the feature that introduce new sysvar... And I don't think we need to introduce a new feature gate for this change. WDYT?

Comment on lines 12375 to 12382
// advance to the next slot to create a new bank, which is cleanable. The
// previous bank is at epoch boundary. An epoch boundary bank is not
// cleanable because epoch rewards sysvar is written in this slot. Before
// #3398, it was cleanable because rent collection will rewrite the epoch
// rewards sysvar due to rent_epoch not setting to u64::MAX. With #3398,
// rent_epoch is set to u64::MAX at creating time, so epoch rewards sysvar
// is not rewritten. Therefore, the slot at epoch boundary is now
// uncleanable. This is why we need to advance to the next slot.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow. The first slot in an epoch cannot be cleaned or should not be cleaned? If it is "cannot", then are there broader implications in AccountsBackgroundService and snapshots too (since there's nothing that prevents clean from running on the first slot in an epoch)? If it is "should not", is that only for this test, or does that impact other code as well?

Copy link
Author

@HaoranYi HaoranYi Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is "should not". And it is only because the way the test is written. More details are as follows.

Without the PR, epoch reward sysvar is created with rent_epoch = 0 at the epoch boundary slot X. And in a late slot Y, when rent collection runs, it finds that the rent_epoch is not MAX, so it write epoch reward sysvar at Y. This will make epoch reward sysvar dead in slot X. So clean can remove slot X.

However, with this PR, epoch reward sysvar is created with rent_epoch = MAX at the epoch boundary slot X. In the late slot Y, when rent collection runs, it find that rent_epoch is already at MAX, so no need to write epoch rewards sysvar at Y. Therefore, the sysvar is still alive at slot X. Clean can't remove slot X.

In this test, the main intent is to test clean to remove a dead slot when all of its account are overwritten in a later slots. Unfortunately, we pick the first slot of the epoch to do the test (a bad choice). Because of the epoch reward sysvar was implicitly created at the first slot of the epoch (which could be live or dead depends on rent collection), and the test doesn't have control over this sysvar, we should move our test of clean one slot later, so that we can just test clean with user accounts that we can control without worrying about the sysvar accounts and rent collection ...

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it might be better not to refer to PR numbers in comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. updated the comments.

@HaoranYi HaoranYi dismissed stale reviews from dmakarov and jeffwashington via f5c2877 November 5, 2024 22:09
@HaoranYi
Copy link
Author

HaoranYi commented Nov 6, 2024

adding @jstarry here to review.

@HaoranYi HaoranYi added the v2.1 Backport to v2.1 branch label Nov 6, 2024
Copy link

mergify bot commented Nov 6, 2024

Backports to the beta branch are to be avoided unless absolutely necessary for fixing bugs, security issues, and perf regressions. Changes intended for backport should be structured such that a minimum effective diff can be committed separately from any refactoring, plumbing, cleanup, etc that are not strictly necessary to achieve the goal. Any of the latter should go only into master and ride the normal stabilization schedule. Exceptions include CI/metrics changes, CLI improvements and documentation updates on a case by case basis.

dmakarov
dmakarov previously approved these changes Nov 6, 2024
@jstarry jstarry changed the title Set syavar rent_epoch at new update time Set rent_epoch for updated sysvars to RENT_EXEMPT_EPOCH Nov 6, 2024
@jstarry
Copy link

jstarry commented Nov 6, 2024

So this behavior change means that new sysvars would have their rent epoch set to Epoch::MAX at creation. In the case of the epoch rewards sysvar, once the partitioned rewards feature is activated, we need either all nodes to be running this patch or all nodes to not be running this patch.

Luckily, there is only one pending feature that introduces new sysvar - partitioned reward feature in v2.1.

Partitioned rewards is ready for activation in v2.0 actually

If we backported this PR, to v2.1 and wait for the majority of the nodes on the cluster to have this change, before we activate partitioned reward feature, then we will be fine. And all future new sysvars will follow the new pattern with rent_epoch=max at creating time.

Since partitioned rewards will be enabled very soon on devnet and also likely be enabled mainnet in less than a month once mainnet is on v2.0, I don't think it's a good idea to merge this without a feature gate. There's no urgency in getting this issue patched, I think it can be tied to whatever feature gate we use for solana-foundation/solana-improvement-documents#175

@jstarry
Copy link

jstarry commented Nov 6, 2024

Instead of the approach in this PR, I would prefer a PR that uses a feature switch to update inherit_specially_retained_account_fields from INITIAL_RENT_EPOCH to RENT_EXEMPT_RENT_EPOCH. That way we cover not only new sysvars, but also new builtins and precompiles.

@jstarry
Copy link

jstarry commented Nov 6, 2024

Also just a note that I wrote about changing the initial rent epoch to RENT_EXEMPT_EPOCH for builtins, sysvars, and precompiles here in this SIMD: solana-foundation/solana-improvement-documents#175

@HaoranYi
Copy link
Author

HaoranYi commented Nov 6, 2024

Also just a note that I wrote about changing the initial rent epoch to RENT_EXEMPT_EPOCH for builtins, sysvars, and precompiles here in this SIMD: solana-foundation/solana-improvement-documents#175

Sounds good to me. I hate to introduce another feature gate. I like that we roll this into SIMD175.

@HaoranYi HaoranYi closed this Nov 6, 2024
@HaoranYi
Copy link
Author

HaoranYi commented Nov 6, 2024

We will add this change when we implement feature SIMD175.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
v2.1 Backport to v2.1 branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants