Skip to content

Incorrect handling of offset commits after a rebalance #260

Closed
@moreda

Description

@moreda

Hi.

After a rebalance –with topic/partition reassignment– some tasks could try to commit offsets in partitions not assigned to them. This happens when some acknowledgements arrive to a task and the partition is not assigned to it anymore.

In such cases, the task won't be able to commit the offset of that topic/partition, and you'll see messages like:

WARN WorkerSinkTask{id=kafka-connect-splunk-1} Ignoring invalid task provided offset syslog-3/OffsetAndMetadata{offset=117798, leaderEpoch=null, metadata=''} -- partition not assigned, assignment=[syslog-1] (org.apache.kafka.connect.runtime.WorkerSinkTask:422)

It doesn't seem critical, because the task that receives the new partition assignment will try to send again to Splunk HEC and commit eventually. The problem is that the task with the previous assignment won't free memory and it's going to try again and again to commit an offset in an unassigned partition.

A simple way to avoid this situation is to use the info provided by SinkTask.open() and SinkTask.close() to purge and filter events when a partition reassignment occurs. This purge has to be made in buffered events, failed events, offsets, and topic/partition records. This should be considered safe because those events will be picked up by the task that opens the partitions closed in the original task.

Besides this main issue, I've found another issue related to the fact that there's no guarantee that a batch contains events tied to a single topic/partition. In KafkaRecordTracker.removeAckedEventBatch() just the topic/partition of the first event is considered (final Event event = events.get(0);). The consequence of this is that offsets are not committed as soon as they can, but depending on the "randomness" of the first event topic/partition.

To solve all this I'm proposing the PR #259 for review that (I think) solves issue. I extended the tests, I tried it in a small environment and it seems to work as expected. 😄
Cheers,

Rob

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions