Make Cruise Control resilient against bad metadata due to recreation of same topic #1726

Open
efeg opened this issue Nov 2, 2021 · 1 comment

efeg commented Nov 2, 2021

Summary: In clusters where a Kafka topic is deleted and then recreated with the same name, it is possible for Cruise Control (CC) to get stuck with a stale version of Kafka metadata. Due to this staleness, CC might be unable to see a subset of such newly created topics in the cluster. As a result, (1) the kafka_cluster_state endpoint may return a response that misses a subset of partitions in the cluster, and (2) any CC operation that requires generating proposals (e.g. rebalance, remove_broker) might miss some partitions -- e.g. remove_broker might fail to drain all replicas from the removed broker because CC's metadata lacks information about them.

Short-term mitigation: Bouncing the CC instance forces it to refresh its cached metadata. If users encounter a case where the metadata is stale, that is the fastest short-term mitigation.

Details: We enabled trace-level logging on CC to inspect the content of received metadata in a case where we suspected metadata staleness. In this case, a topic that existed in the cluster was deleted, then recreated with the same name, and CC was unable to show that topic's partitions in the verbose response of kafka_cluster_state.
The metadata content showed that the topic information was indeed available in the response received from the broker – i.e.

{error_code=0,name=DeletedAndThenRecreatedTopic,is_internal=false,partitions=[{error_code=0,partition_index=0,leader_id=<someBrokerId>,leader_epoch=6,replica_nodes=[someBrokerIds],isr_nodes=[someBrokerIds],offline_replicas=[],_tagged_fields={}}],topic_authorized_operations=0,_tagged_fields={}}
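
For reference, the broker-side view can be checked independently of CC's cache. Below is a minimal sketch using Kafka's AdminClient; the bootstrap address is a placeholder for this example:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

public class DescribeTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            TopicDescription desc = admin
                .describeTopics(Collections.singleton("DeletedAndThenRecreatedTopic"))
                .all().get()
                .get("DeletedAndThenRecreatedTopic");
            // The broker reports the partition, its leader, replicas, and ISR,
            // even when CC's cached metadata no longer shows the partition.
            desc.partitions().forEach(p ->
                System.out.printf("partition=%d leader=%s replicas=%s isr=%s%n",
                    p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}
```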

However, CC logs show that the underlying metadata cache ignores this partition because its leader epoch is less than the locally cached leader epoch:

TRACE [Metadata] [LoadMonitorExecutor-1] [kafka-cruise-control] [] Determining if we should replace existing epoch 9 with new epoch 6
DEBUG [Metadata] [LoadMonitorExecutor-1] [kafka-cruise-control] [] Not replacing existing epoch 9 with new epoch 6 for partition DeletedAndThenRecreatedTopic-0
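
These messages come from the guard that compares an incoming partition's leader epoch against the cached one. The following is a minimal sketch of that comparison, with illustrative names rather than the actual org.apache.kafka.clients.Metadata internals:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.TopicPartition;

// Hypothetical simplification of the leader-epoch guard applied by
// client-side metadata caches (names are illustrative, not Kafka's).
class EpochGuardSketch {
    private final Map<TopicPartition, Integer> lastSeenLeaderEpochs = new HashMap<>();

    boolean shouldReplace(TopicPartition tp, int newEpoch) {
        Integer cached = lastSeenLeaderEpochs.get(tp);
        if (cached == null || newEpoch >= cached) {
            lastSeenLeaderEpochs.put(tp, newEpoch);
            return true;  // accept the new partition metadata
        }
        // newEpoch (6) < cached (9): the update is treated as stale and
        // dropped, even if the topic was deleted and recreated in between.
        return false;
    }
}
```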

The motivation behind this leader-epoch check is to avoid updating the local metadata cache with stale metadata for partitions. However, in this case, because the previously existing topic partition carried a larger epoch, the client cache fails to update until the epoch of the new topic partition DeletedAndThenRecreatedTopic-0 eventually grows above the locally cached epoch – i.e. Kafka resets the leader epoch of a partition upon its deletion, but since the locally cached leader epoch in CC's metadata client is unaware of that deletion, the client continues to ignore the update for the new partition.

Note that this issue can only happen for partitions of topics that have been deleted and then recreated with the same name in succession.
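
A possible direction for making CC resilient -- our assumption, in the spirit of the topic-ID work in KIP-516, not a committed design -- is to detect recreation by tracking topic IDs alongside topic names and discarding cached leader epochs when the ID changes. An illustrative sketch (all names hypothetical):

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.common.Uuid;

// Hypothetical sketch: detect topic recreation via topic IDs (KIP-516).
// All names here are illustrative, not an actual CC or Kafka API.
class TopicRecreationDetector {
    private final Map<String, Uuid> cachedTopicIds = new HashMap<>();

    boolean isRecreated(String topic, Uuid newTopicId) {
        Uuid previous = cachedTopicIds.put(topic, newTopicId);
        // A different non-null ID under the same name means the topic was
        // deleted and recreated; cached leader epochs for its partitions
        // should be discarded rather than used to reject the update.
        return previous != null && !previous.equals(newTopicId);
    }
}
```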

Relevant: #1708

--
Note that the same issue exists in regular Kafka consumers (we reproduced the same issue as reported in https://issues.apache.org/jira/browse/KAFKA-12257).

@efeg added the correctness and robustness labels Nov 2, 2021
@efeg self-assigned this Nov 2, 2021
lenin-joseph commented

Hi @efeg, is there any fix for this or any ETA? Thank you!
