Make Cruise Control resilient against bad metadata due to recreation of same topic #1726
Labels: correctness (a condition affecting the proper functionality), robustness (makes the project tolerate or handle perturbations)
Summary: In clusters where a Kafka topic is deleted and then recreated with the same name, it is possible for Cruise Control (CC) to be stuck with a stale version of Kafka metadata. Due to this staleness, CC might be unable to see a subset of such newly created topics in the cluster. As a result, (1) the `kafka_cluster_state` endpoint may return a response that misses a subset of partitions in the cluster, and (2) any CC operation that requires generating proposals (e.g. `rebalance`, `remove_broker`) might miss some partitions -- e.g. `remove_broker` might fail to drain all replicas from the removed broker because CC's metadata has no information about them.

Short-term mitigation: Bouncing the CC instance forces it to refresh its cached metadata. If users encounter a case where the metadata is stale, that is the fastest short-term mitigation.
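As a quick way to confirm the symptom, one could query the verbose cluster state and check whether the affected topic shows up at all. Below is a minimal sketch, assuming a local CC instance on port 9090, the `verbose`/`json` query parameters, and simple substring matching on the response body; host, port, and parameters are deployment-specific assumptions, not part of this report:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Rough sketch: ask Cruise Control for its verbose cluster state and check whether a
// recently recreated topic is visible at all. Host, port, and substring matching are
// assumptions for illustration only.
public class ClusterStateCheck {
    public static void main(String[] args) throws Exception {
        String topic = args.length > 0 ? args[0] : "DeletedAndThenRecreatedTopic";
        URI uri = URI.create(
            "http://localhost:9090/kafkacruisecontrol/kafka_cluster_state?verbose=true&json=true");

        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
            client.send(HttpRequest.newBuilder(uri).GET().build(),
                        HttpResponse.BodyHandlers.ofString());

        // If the topic exists in Kafka but is missing from the verbose response,
        // CC's cached metadata is likely stale (the situation described in this issue).
        boolean seen = response.body().contains(topic);
        System.out.println(topic + (seen
            ? " is visible to Cruise Control"
            : " is MISSING from Cruise Control's cluster state"));
    }
}
```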
Details: We enabled trace-level logs on CC to inspect the content of received metadata in a case where we suspected metadata staleness. In this case, a topic that existed in the cluster was deleted and then recreated with the same name. CC was unable to show the partitions of that topic in the verbose response of `kafka_cluster_state`.

The content of the metadata showed that the topic information was indeed available in the response received from the broker – i.e.

However, CC logs show that the underlying metadata cache ignores this partition because its leader epoch is less than the locally cached leader epoch:
The motivation behind this leader-epoch check is to avoid updating the local metadata cache with stale metadata for partitions. However, in this case, because the previously existing topic partition had reached a larger epoch, the client cache fails to be updated unless the epoch of the new topic partition DeletedAndThenRecreatedTopic-0 eventually grows above the locally cached epoch – i.e. Kafka resets the leader epoch of a partition after its deletion, but since the locally cached leader epoch in CC's metadata client is unaware of that deletion, it continues to ignore updates for the new partition.
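To make the failure mode concrete, here is a minimal sketch of an epoch-gated cache update rule of the kind described above. This is not the actual Kafka client code; the class and method names (`LeaderEpochGatedCache`, `maybeUpdate`) are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified illustration of an epoch-gated metadata cache. The real logic lives in the
// Kafka client's metadata handling; names here are hypothetical.
class LeaderEpochGatedCache {
    // topic-partition -> last leader epoch seen for that partition
    private final Map<String, Integer> lastSeenLeaderEpoch = new HashMap<>();

    /**
     * Apply a metadata update for a partition. Updates carrying a lower leader epoch than
     * the cached one are ignored, which normally protects against stale metadata.
     */
    boolean maybeUpdate(String topicPartition, int newLeaderEpoch) {
        Integer cachedEpoch = lastSeenLeaderEpoch.get(topicPartition);
        if (cachedEpoch != null && newLeaderEpoch < cachedEpoch) {
            // This is the branch the issue describes: after delete + recreate, the new
            // partition starts again from a small epoch, so its updates are rejected
            // until the epoch grows past the stale cached value.
            return false;
        }
        lastSeenLeaderEpoch.put(topicPartition, newLeaderEpoch);
        return true;
    }
}
```

Under such a rule, if the cached epoch for DeletedAndThenRecreatedTopic-0 is, say, 5 from the pre-deletion topic, every update for the recreated partition (whose epoch restarts near 0) is dropped until its epoch climbs past 5.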
Note that this issue can only happen for partitions of topics that have been deleted and recreated with the same name in succession.
Relevant: #1708
--
Note that the same issue exists in regular Kafka consumers (we reproduced the issue reported in https://issues.apache.org/jira/browse/KAFKA-12257).
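For reference, a rough reproduction sketch against plain Kafka clients is below. Broker address, topic name, partition counts, and waits are assumptions, not the exact steps we used; the stale-epoch behaviour is observed by enabling TRACE logging for org.apache.kafka.clients.Metadata, as in the JIRA above:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

// Rough reproduction sketch (assumed broker address and topic name). Run with TRACE
// logging enabled for org.apache.kafka.clients.Metadata to see the client ignoring
// metadata for the recreated partition because of the stale cached leader epoch.
public class RecreatedTopicRepro {
    public static void main(String[] args) throws Exception {
        String bootstrap = "localhost:9092";
        String topic = "DeletedAndThenRecreatedTopic";

        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", bootstrap);

        Properties consumerProps = new Properties();
        consumerProps.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        consumerProps.put(ConsumerConfig.GROUP_ID_CONFIG, "recreated-topic-repro");
        consumerProps.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        consumerProps.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (Admin admin = Admin.create(adminProps);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {

            // 1. Create the topic and let the consumer cache its metadata (leader epochs included).
            admin.createTopics(Collections.singleton(new NewTopic(topic, 1, (short) 1))).all().get();
            consumer.assign(Collections.singleton(new TopicPartition(topic, 0)));
            consumer.poll(Duration.ofSeconds(5));

            // 2. Bump the leader epoch a few times out of band (e.g. leader elections or broker
            //    restarts) so the cached epoch ends up above the epoch the recreated partition
            //    will start from. This step is manual in practice.

            // 3. Delete and recreate the topic with the same name.
            admin.deleteTopics(Collections.singleton(topic)).all().get();
            Thread.sleep(5_000); // crude wait for the deletion to complete
            admin.createTopics(Collections.singleton(new NewTopic(topic, 1, (short) 1))).all().get();

            // 4. Keep polling: the consumer's cached leader epoch for partition 0 is unaware of
            //    the deletion, so metadata updates for the recreated partition can be ignored
            //    until its epoch grows past the stale cached value (watch the TRACE logs).
            for (int i = 0; i < 12; i++) {
                consumer.poll(Duration.ofSeconds(5));
            }
        }
    }
}
```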