Telegraf stops consuming from partition on GetOffset error, does not try again (affects entire consumer group) #3553

Closed
@HristoMohamed

Description

Telegraf version: telegraf-1.4.5-1.x86_64, running on kernel 3.10.0-327.36.3.el7.x86_64. Kafka is 1.0.0, built for Scala 2.12.

Everything runs fine, until:

Dec 07 18:15:53 telecons02 telegraf[2820]: 2017-12-07T17:15:53Z E! Error in plugin [inputs.kafka_consumer]: Consumer Error: kafka: error while consuming telegrafcommon/26: kafka server: The requested offset is outside the range of offsets maintained by the server for the given topic/partition.

After this, the affected partition is no longer consumed and messages pile up.
Restarting Telegraf fixes it.
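
To make the expected behavior concrete, here is a minimal sketch of the recovery I would hope for: when a fetch fails because the committed offset is outside the range the broker still retains, reset to a valid offset and retry instead of abandoning the partition. This is a toy model only, not Telegraf's or the Kafka client library's actual code. The `partition` type, `errOffsetOutOfRange`, `consumeWithReset`, and the `"oldest"`/`"newest"` policy strings are all hypothetical names invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errOffsetOutOfRange is a hypothetical stand-in for the broker's
// "requested offset is outside the range" error.
var errOffsetOutOfRange = errors.New("offset out of range")

// partition is a toy model of a partition log: the broker retains
// only the offsets from oldest up to newest.
type partition struct {
	oldest, newest int64
}

// fetch succeeds only if the requested offset is still retained.
func (p partition) fetch(offset int64) error {
	if offset < p.oldest || offset > p.newest {
		return errOffsetOutOfRange
	}
	return nil
}

// consumeWithReset tries the committed offset first. On an out-of-range
// error it resets to the oldest or newest retained offset (per policy)
// and retries once, returning the offset consumption resumed from.
func consumeWithReset(p partition, committed int64, policy string) (int64, error) {
	err := p.fetch(committed)
	if errors.Is(err, errOffsetOutOfRange) {
		if policy == "newest" {
			committed = p.newest
		} else {
			committed = p.oldest
		}
		err = p.fetch(committed)
	}
	return committed, err
}

func main() {
	p := partition{oldest: 100, newest: 250}
	// Committed offset 42 has already been deleted by retention.
	start, err := consumeWithReset(p, 42, "oldest")
	fmt.Println(start, err) // prints "100 <nil>"
}
```

Resetting to the oldest offset re-reads whatever is still retained, while resetting to the newest skips the backlog; either seems preferable to a partition that stays stuck until a restart.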

Interestingly, I have several Telegraf instances consuming Kafka messages, and all of them hit this issue on a few partitions (the partitions are random; I cannot narrow it down to a single one). When I restart one Telegraf instance, the entire consumer group goes back to normal and messages flow again, even on partitions served by the other instances that were stuck.

Please help.

Metadata

    Labels

    area/kafka, bug (unexpected problem or unintended behavior)
