-
Notifications
You must be signed in to change notification settings - Fork 111
Prioritise leadership changes in cc #3955
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
jackkleeman
commented
Nov 6, 2025
- On cc startup, bulk load pp epoch state up front (was previously a todo)
- Make leadership selections based on current membership for all partitions before moving on to making membership reconfigurations. This allows us to fail over leadership without doing any metadata operations
| .try_filter_map( | ||
| async |(partition_id, partition_state)| match partition_state { | ||
| Some(partition_state) => { | ||
| Self::note_observed_membership_update( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unsure if this is needed. it would be set when inserting/loading initial config, even if that initial config hasnt changed from what was already in metadata. in this code path we are only loading whats in metadata and not doing any mutation. my thinking is it doesn't hurt to update the observed membership.
| // instructing a new leader when we already have the metadata requires no new metadata operations and can be done nearly instantly | ||
| // by comparison, ensure_valid_partition_configuration can take (metadata operation latency * 2 * affected partitions) | ||
| // which might be several seconds, and leader instruction would only happen at the end. | ||
| self.ensure_valid_leaders(cluster_state, legacy_cluster_state, partition_table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is no longer happening after ensure_valid_partition_configuration, which means that we have a less recent load of the epoch metadata to base our membership decision off of. the benefit of that is that we don't need any metadata reads to react to the leader going down.
previously the leader selection would be using a load from the same on_cluster_state_change iteration, whereas now we are using a load from the previous iteration (unless this is the first iteration). however in either case, we can have outdated metadata if someone other than us is mutating it, and i guess the leadership instruct will just be ignored in that case.
917ea1c to
42ead35
Compare
42ead35 to
7d930db
Compare