Description
Tracer Version(s)
2.21.6
Python Version(s)
3.13.2
Pip Version(s)
pip 25.1.1
Bug Report
Summary
The ddtrace Kafka instrumentation in ddtrace.contrib.internal.kafka.patch._get_cluster_id() calls list_topics() without a timeout parameter, which can cause the main thread to block indefinitely when the Kafka cluster becomes unresponsive.
Environment
- ddtrace version: 2.21.6
- Python version: 3.13.2
- Kafka client: confluent-kafka-python
- Installation method: pip
Expected Behavior
Kafka instrumentation should not block the main event loop indefinitely. Operations should have reasonable timeouts to prevent application hangs.
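A minimal sketch of what a bounded lookup could look like, for illustration only. The helper name get_cluster_id_bounded and the 1-second default are assumptions, not ddtrace code. It relies on the timeout argument of confluent-kafka's list_topics(), whose default of -1 means wait indefinitely, and on the cluster_id attribute of the returned metadata.

```python
from typing import Any, Optional


def get_cluster_id_bounded(instance: Any, topic: str, timeout: float = 1.0) -> Optional[str]:
    """Return the cluster ID, or None if metadata cannot be fetched in time.

    Hypothetical helper: `instance` is expected to be a confluent-kafka
    Producer or Consumer (anything exposing list_topics(topic, timeout)).
    """
    try:
        # Pass an explicit timeout (seconds) so the call cannot wait forever.
        metadata = instance.list_topics(topic=topic, timeout=timeout)
    except Exception:
        # The cluster ID is only tracing metadata, so give up quietly
        # instead of failing or stalling the caller's produce().
        return None
    return getattr(metadata, "cluster_id", None)
```

Returning None on failure is a design choice in this sketch: the span would simply lack the cluster ID tag while the broker is unreachable, and produce() itself would proceed as it does without instrumentation.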
Actual Behavior
When the Kafka cluster becomes unresponsive, the _get_cluster_id() function blocks indefinitely on the instance.list_topics(topic=topic) call at line 332 in /ddtrace/contrib/internal/kafka/patch.py. This occurs during every produce() operation when ddtrace tries to collect the cluster ID for tracing metadata.
Stack Trace
# Application producer code
producer.produce(topic=topic, value=value, headers=headers, key=key)
# ddtrace/contrib/internal/kafka/patch.py:174 (traced_produce)
cluster_id = _get_cluster_id(instance, topic)
# ddtrace/contrib/internal/kafka/patch.py:332 (_get_cluster_id)
cluster_metadata = instance.list_topics(topic=topic) # <- BLOCKS HERE
Reproduction Steps
- Set up a Kafka producer with ddtrace instrumentation enabled
- Make the Kafka cluster unresponsive (stop broker, introduce network issues, or misconfigure connection)
- Attempt to produce a message using the instrumented producer
- Application hangs indefinitely on the list_topics() call
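To observe the hang from the steps above without freezing the test process, the produce() call can be run in a daemon thread and checked against a deadline. This is a sketch: the helper name returns_within is hypothetical, and the producer in the usage comment is assumed to be an instrumented confluent-kafka Producer whose broker has been made unresponsive.

```python
import threading
from typing import Callable


def returns_within(call: Callable[[], object], seconds: float) -> bool:
    """Run call() in a daemon thread and report whether it finished in time.

    A call that raises also counts as finished; only a call that is still
    running after `seconds` is reported as hung.
    """
    worker = threading.Thread(target=call, daemon=True)
    worker.start()
    worker.join(timeout=seconds)
    return not worker.is_alive()


# Usage against a real, instrumented producer (not executed here):
#   hung = not returns_within(
#       lambda: producer.produce(topic="my-topic", value=b"payload"), 5.0
#   )
```

With the broker stopped, a result of False from returns_within would be consistent with the indefinite block described in this report.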