Description
We're seeing the consumer hang before it reaches the end of all of the topic's partitions. Which partitions it hangs on varies: after writing to the topic, some partitions consume fine while different ones hang.
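For context, here is a minimal sketch of the kind of consume loop we run (the topic name, group id, and bootstrap servers below are placeholders):

```python
from kafka import KafkaConsumer

# Hypothetical names; substitute your own topic and brokers.
consumer = KafkaConsumer(
    'our-compacted-topic',
    bootstrap_servers=['broker:9092'],
    group_id='our-group',
    auto_offset_reset='earliest',
    enable_auto_commit=False,
)

for message in consumer:
    # On affected partitions the iterator simply stops yielding records
    # before the high-water mark is reached; no exception is raised.
    print(message.partition, message.offset)
```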
We saw the following in the debug logs, which may be relevant:
```
16:18:27 | DEBUG Adding fetch request for partition TopicPartition(topic='...', partition=38) at offset 706
16:18:27 | DEBUG Sending FetchRequest to node 132
16:18:27 | DEBUG Sending request FetchRequest_v4(replica_id=-1, max_wait_time=500, min_bytes=1, max_bytes=52428800, isolation_level=0, topics=[(topic='...', partitions=[(partition=38, offset=706, max_bytes=1048576)])])
16:18:27 | DEBUG <BrokerConnection node_id=132 host=...:9092 <connected> [IPv4 ('...', 9092)]> Request 370: FetchRequest_v4(replica_id=-1, max_wait_time=500, min_bytes=1, max_bytes=52428800, isolation_level=0, topics=[(topic='...', partitions=[(partition=38, offset=706, max_bytes=1048576)])])
16:18:27 | DEBUG Received correlation id: 370
16:18:27 | DEBUG Processing response FetchResponse_v4
16:18:27 | DEBUG <BrokerConnection node_id=132 host=...:9092 <connected> [IPv4 ('...', 9092)]> Response 370 (150.505065918 ms): FetchResponse_v4(throttle_time_ms=0, topics=[(topics=u'...', partitions=[(partition=38, error_code=0, highwater_offset=1171, last_stable_offset=-1, aborted_transactions=NULL, message_set='\x00\x00\x00\x00\x00\x00\x02\xc1\x00\x00\x11Z\
16:18:27 | DEBUG Adding fetched record for partition TopicPartition(topic=u'...', partition=38) with offset 706 to buffered record list
16:18:27 | DEBUG Skipping message offset: 705 (expecting 706)
```
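For what it's worth, here is roughly how we confirm which partitions are stuck, using the standard `KafkaConsumer` methods (`poll`, `assignment`, `position`, `end_offsets`); names are the same placeholders as above:

```python
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'our-compacted-topic',
    bootstrap_servers=['broker:9092'],
    group_id='our-group',
    auto_offset_reset='earliest',
)

# Drain until poll() returns nothing more, then report partitions whose
# position still trails the broker's end offset. Stuck partitions sit
# below the high-water mark (706 vs 1171 in the log above) and never
# advance on subsequent polls.
while consumer.poll(timeout_ms=5000):
    pass

assignment = consumer.assignment()
end_offsets = consumer.end_offsets(list(assignment))
for tp in sorted(assignment, key=lambda p: p.partition):
    position = consumer.position(tp)
    if position < end_offsets[tp]:
        print('stuck: %s at offset %d, high-water mark %d'
              % (tp, position, end_offsets[tp]))
```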
Our kafka-python version is 1.4.4 and our broker is version 1.0.1. The topic is configured with:
```
partitions: 124
replication-factor: 6
segment.bytes=1048576
min.cleanable.dirty.ratio=.01
delete.retention.ms=5000
retention.ms=86400000
cleanup.policy=compact
```
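For reproduction purposes, an equivalent topic can be created with kafka-python's admin client (a sketch, assuming the `KafkaAdminClient` shipped in 1.4.4; broker address and topic name are placeholders):

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Create a topic matching the configuration above.
admin = KafkaAdminClient(bootstrap_servers=['broker:9092'])
admin.create_topics([NewTopic(
    name='our-compacted-topic',
    num_partitions=124,
    replication_factor=6,
    topic_configs={
        'segment.bytes': '1048576',
        'min.cleanable.dirty.ratio': '0.01',
        'delete.retention.ms': '5000',
        'retention.ms': '86400000',
        'cleanup.policy': 'compact',
    },
)])
```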
We've also experimented with various consumer config options, particularly `max_partition_fetch_bytes` (raising it as high as `1024*1024*1024`), but the consumer still gets stuck at the same offset on the same partitions.
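Concretely, one of the variants we tried looked like this (same placeholder names as above):

```python
from kafka import KafkaConsumer

# Raising max_partition_fetch_bytes well above its 1 MB default did not
# change the behavior; the same partitions stall at the same offsets.
consumer = KafkaConsumer(
    'our-compacted-topic',
    bootstrap_servers=['broker:9092'],
    group_id='our-group',
    auto_offset_reset='earliest',
    max_partition_fetch_bytes=1024 * 1024 * 1024,
)
```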
We think this may be related to these previous issues:
- Can't consume from partition if current offset has been removed due to compaction #1390
- Kafka consumer stalling when consuming compacted topic #1571
We saw something similar reported in the ruby-kafka project as well.
Let us know if there is anything we can do to help.