-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lease Checkpoints fix #13508
Lease Checkpoints fix #13508
Conversation
Current checkpointing mechanism is buggy. New checkpoints for any lease are scheduled only until the first leader change. Added fix for that and a test that will check it.
To extend lease checkpointing mechanism to cases when the whole etcd cluster is restarted.
79a4d26
to
fd77b27
Compare
cc @ptabor |
cc @jpbetz |
Can this cause inconsistency of checksums of the DBs computed from diffrent instances, |
b4cf39e
to
164fa1a
Compare
I have added a proper feature guarding to ensure that we avoid checksum inconsistency. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of persistTo and scheduling checkpoints on promotion look right to me. Thanks for fixing. One comment about flag docs.
5253751
to
94a3ec9
Compare
…lease-checkpoint-persist to persist lease remainingTTL To avoid inconsistant behavior during cluster upgrade we are feature gating persistance behind cluster version. This should ensure that all cluster members are upgraded to v3.6 before changing behavior. To allow backporting this fix to v3.5 we are also introducing flag --experimental-enable-lease-checkpoint-persist that will allow for smooth upgrade in v3.5 clusters with this feature enabled.
94a3ec9
to
f815d75
Compare
Addressed all the comments, @ptabor please take another look. |
f815d75
to
3425dc0
Compare
3425dc0
to
48a7aab
Compare
@serathius Do you have bandwidth to backport this PR to 3.4? :) |
Done |
This is continuation of work on #13491
Main difference is that this proposes to persist remainingTTL on checkpoint making checkpoints not depend on snapshot frequency.
I also check that renew logic should not be impacted by this change. Renew already includes logic that will reset remainingTTL if needed.