[Membership] Preserve latest IAmAliveTime across updates #9303
In subsequent PRs, we leverage the `IAmAliveTime` value for more behavior related to disaster recovery scenarios. Each silo's `IAmAliveTime` value is updated in the membership table directly by that silo, and this update does not bump the membership table's version. Therefore, we cannot use `IAmAliveTime` in any algorithm that requires consistent membership views (e.g., monitoring graph construction or directory membership).

`IAmAliveTime` updates can occasionally be ignored or regressed by membership snapshot updates, depending on when the snapshot was captured. That is undesirable, but incrementing the membership version for every `IAmAliveTime` update would greatly increase write contention on the table and cause unnecessary churn for algorithms based on the membership version, such as directory membership.

This PR improves the situation by locally preserving the latest known `IAmAliveTime` value for each silo across membership snapshots. It also allows a snapshot to be updated without incrementing its version, by determining whether it is a logical successor to the previous snapshot: either it has a higher version, or it has an equal version and at least one greater `IAmAliveTime` value.

The goal is to decrease the chance of incorrectly treating a silo as "stale" (i.e., as having missed multiple `IAmAliveTime` updates).
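The successor rule and the per-silo preservation of the latest `IAmAliveTime` can be illustrated with a minimal sketch. This is written in Python for brevity and is not the Orleans implementation (which is C#): `MembershipSnapshot`, `is_successor`, `merge_latest`, `LocalMembershipView`, and `is_stale` are hypothetical names, timestamps are plain floats, and the staleness threshold (`missed_allowed` refresh periods) is an assumed parameter chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical, simplified snapshot: the table version plus the
# IAmAliveTime reported for each silo (silo name -> timestamp).
@dataclass(frozen=True)
class MembershipSnapshot:
    version: int
    iam_alive: Dict[str, float] = field(default_factory=dict)


def is_successor(new: MembershipSnapshot, old: MembershipSnapshot) -> bool:
    """True if `new` is a logical successor of `old`: a higher version,
    or the same version with at least one strictly greater IAmAliveTime."""
    if new.version != old.version:
        return new.version > old.version
    return any(t > old.iam_alive.get(silo, float("-inf"))
               for silo, t in new.iam_alive.items())


def merge_latest(new: MembershipSnapshot, old: MembershipSnapshot) -> MembershipSnapshot:
    """Take membership from `new`, but never let a silo's IAmAliveTime
    regress below the latest value already known from `old`."""
    merged = {silo: max(t, old.iam_alive.get(silo, t))
              for silo, t in new.iam_alive.items()}
    return MembershipSnapshot(new.version, merged)


class LocalMembershipView:
    """Locally held view that only moves forward."""

    def __init__(self, initial: MembershipSnapshot) -> None:
        self.current = initial

    def try_update(self, candidate: MembershipSnapshot) -> bool:
        # Reject snapshots that are not logical successors.
        if not is_successor(candidate, self.current):
            return False
        self.current = merge_latest(candidate, self.current)
        return True


def is_stale(view: LocalMembershipView, silo: str, now: float,
             refresh_period: float, missed_allowed: int = 2) -> bool:
    """Assumed staleness rule: the silo has missed more than
    `missed_allowed` IAmAliveTime refresh periods."""
    t = view.current.iam_alive.get(silo)
    return t is not None and now - t > missed_allowed * refresh_period


if __name__ == "__main__":
    view = LocalMembershipView(MembershipSnapshot(5, {"silo-a": 100.0, "silo-b": 100.0}))
    # Same version, but silo-a wrote a newer IAmAliveTime: accepted.
    assert view.try_update(MembershipSnapshot(5, {"silo-a": 130.0, "silo-b": 100.0}))
    # Higher version captured earlier: accepted, but silo-a does not regress.
    assert view.try_update(MembershipSnapshot(6, {"silo-a": 110.0, "silo-b": 140.0, "silo-c": 140.0}))
    print(view.current.iam_alive)  # {'silo-a': 130.0, 'silo-b': 140.0, 'silo-c': 140.0}
    # Same version with nothing newer: rejected.
    assert not view.try_update(MembershipSnapshot(6, {"silo-a": 120.0, "silo-b": 140.0, "silo-c": 140.0}))
```

Taking the per-silo maximum is what prevents a snapshot that was captured earlier, but carries a higher version, from rolling back a fresher `IAmAliveTime` and making a healthy silo look stale.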