Consumer getting stuck with compacted topic #1701

Closed
@jgardner3

Description

We're seeing the consumer hang before reaching the end of all of the topic partitions. Which partitions it hangs on varies: after writing to the partitions, some will work and different ones will hang.
We saw the following in the debug logs, which might be relevant:

16:18:27 | DEBUG Adding fetch request for partition TopicPartition(topic='...', partition=38) at offset 706
16:18:27 | DEBUG Sending FetchRequest to node 132
16:18:27 | DEBUG Sending request FetchRequest_v4(replica_id=-1, max_wait_time=500, min_bytes=1, max_bytes=52428800, isolation_level=0, topics=[(topic='...', partitions=[(partition=38, offset=706, max_bytes=1048576)])])
16:18:27 | DEBUG <BrokerConnection node_id=132 host=...:9092 <connected> [IPv4 ('...', 9092)]> Request 370: FetchRequest_v4(replica_id=-1, max_wait_time=500, min_bytes=1, max_bytes=52428800, isolation_level=0, topics=[(topic='...', partitions=[(partition=38, offset=706, max_bytes=1048576)])])
16:18:27 | DEBUG Received correlation id: 370
16:18:27 | DEBUG Processing response FetchResponse_v4
16:18:27 | DEBUG <BrokerConnection node_id=132 host=...:9092 <connected> [IPv4 ('...', 9092)]> Response 370 (150.505065918 ms): FetchResponse_v4(throttle_time_ms=0, topics=[(topics=u'...', partitions=[(partition=38, error_code=0, highwater_offset=1171, last_stable_offset=-1, aborted_transactions=NULL, message_set='\x00\x00\x00\x00\x00\x00\x02\xc1\x00\x00\x11Z\
16:18:27 | DEBUG Adding fetched record for partition TopicPartition(topic=u'...', partition=38) with offset 706 to buffered record list
16:18:27 | DEBUG Skipping message offset: 705 (expecting 706)
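
For context, here is a minimal sketch of how we observe the stall for a single partition (the topic name and broker address below are placeholders; partition 38 and the offsets are taken from the log above):

from kafka import KafkaConsumer, TopicPartition

consumer = KafkaConsumer(bootstrap_servers=['broker:9092'],
                         group_id=None,
                         enable_auto_commit=False)
tp = TopicPartition('compacted-topic', 38)
consumer.assign([tp])
consumer.seek_to_beginning(tp)

end = consumer.end_offsets([tp])[tp]
while consumer.position(tp) < end:
    records = consumer.poll(timeout_ms=5000)
    if not records:
        # poll() keeps returning nothing while position() is still below
        # the high-water mark (1171 in the log above) -- this is the hang.
        print('stuck at offset', consumer.position(tp), 'of', end)
        break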

Our kafka-python version is 1.4.4 and our broker is version 1.0.1. The topic is configured with:

partitions: 124
replication-factor: 6
segment.bytes=1048576
min.cleanable.dirty.ratio=.01
delete.retention.ms=5000
retention.ms=86400000
cleanup.policy=compact
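
For completeness, a topic with this configuration can be created roughly as follows with kafka-python's admin client (the topic name and broker address are placeholders, and the real topic was not necessarily created this way):

from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers=['broker:9092'])
admin.create_topics([
    NewTopic(
        name='compacted-topic',
        num_partitions=124,
        replication_factor=6,
        topic_configs={
            'segment.bytes': '1048576',
            'min.cleanable.dirty.ratio': '.01',
            'delete.retention.ms': '5000',
            'retention.ms': '86400000',
            'cleanup.policy': 'compact',
        },
    )
])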

We've also experimented with various consumer config options, in particular raising max_partition_fetch_bytes to 1024*1024*1024, but the consumer still gets stuck at the same offset for the same partitions.
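
The consumer-side experiments look roughly like the following sketch (the topic name, group id, and broker address are placeholders); iteration stops advancing on the affected partitions either way:

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    'compacted-topic',
    bootstrap_servers=['broker:9092'],
    group_id='test-group',
    auto_offset_reset='earliest',
    max_partition_fetch_bytes=1024 * 1024 * 1024,
)
for record in consumer:
    # Progress stalls on the affected partitions, e.g. partition 38 never
    # moves past offset 706 even though the high-water mark is 1171.
    print(record.partition, record.offset)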

We think this may be related to these previous issues:

We saw something similar in the Ruby-Kafka project as well:

Let me know if there is anything we can do to help.
