Only fetch for partitions with initialized offsets #582

Merged (2 commits) on Dec 9, 2019
31 changes: 26 additions & 5 deletions src/consumer/consumerGroup.js
@@ -368,13 +368,34 @@ module.exports = class ConsumerGroup {
       )

       const leaders = keys(partitionsPerLeader)
+      const committedOffsets = this.offsetManager.committedOffsets()
Collaborator

Is there a reason why we're using committedOffsets here, instead of just the resolved offsets? An invalid offset could come from attempting to resume from a committed offset, but also from a consumer.seek. Your great comment touches on how both are cleared, so I'm not sure whether there would be any actual difference in behaviour, but as future changes are made, this subtle difference might become harder to spot while also becoming more consequential.

Collaborator Author

Initially I was using resolvedOffsets, but that didn't work because OffsetManager.resolveOffsets actually just sets the initialized consumer offsets in committedOffsets, not in resolvedOffsets (did someone mention that our naming is confusing...? 😅). When the consumer first boots, it doesn't have any resolved offsets yet, so the only source of offsets is the initialized offsets in committedOffsets.

Regarding the seek behavior, I would expect it to work the same way, no? Seek would commit (potentially invalid) offsets and then clear both committedOffsets and resolvedOffsets using OffsetManager.clearOffsets. In the fetch loop we'd get the consumer offsets from the brokers and from there on it's the same.

Maybe I'm missing something?
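
A minimal sketch of the bookkeeping described above may help. This is a hypothetical simplification, not the actual kafkajs OffsetManager: the class, its internal maps, and the coordinator-offset shape are assumptions for illustration, while the method names (resolveOffsets, committedOffsets, clearOffsets) follow the discussion.

```js
// Hypothetical simplification of the behaviour described in this thread,
// not the real kafkajs implementation.
class SketchOffsetManager {
  constructor(coordinatorOffsets) {
    // Offsets fetched from the group coordinator, keyed topic -> partition.
    this.coordinatorOffsets = coordinatorOffsets
    this.committed = {} // "committedOffsets" in the discussion
    this.resolved = {} // "resolvedOffsets" in the discussion
  }

  // As described above: this initializes the consumer offsets in `committed`,
  // while `resolved` stays empty until fetching advances it.
  resolveOffsets() {
    for (const [topic, partitions] of Object.entries(this.coordinatorOffsets)) {
      this.committed[topic] = { ...partitions }
    }
  }

  committedOffsets() {
    return this.committed
  }

  // Recovery from OffsetOutOfRange (and, per the thread, seek) clears both maps,
  // so a concurrent fetch loop can briefly observe uninitialized offsets.
  clearOffsets({ topic, partition }) {
    if (this.committed[topic]) delete this.committed[topic][partition]
    if (this.resolved[topic]) delete this.resolved[topic][partition]
  }
}

// Right after boot there are no resolved offsets yet, so committedOffsets()
// is the only source of initialized offsets.
const manager = new SketchOffsetManager({ 'topic-a': { 0: '42', 1: '7' } })
manager.resolveOffsets()
console.log(manager.committedOffsets()) // { 'topic-a': { '0': '42', '1': '7' } }
console.log(manager.resolved) // {}
```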

Collaborator

> Regarding the seek behavior, I would expect it to work the same way, no? Seek would commit (potentially invalid) offsets

Seeking shouldn't commit, only move the "playhead" (see #395), so relying on that behaviour is probably not what we want.

> Initially I was using resolvedOffsets, but that didn't work because OffsetManager.resolveOffsets actually just sets the initialized consumer offsets in committedOffsets, not in resolvedOffsets (did someone mention that our naming is confusing...? 😅).

I guess having to use committedOffsets is a symptom of an underlying issue there, then. Conceptually, it's the resolvedOffsets (which I understand to be the "next to consume" offset, or "playhead", for reading the log) that should always exist, and the committed offset that is optional, since using Kafka for committing offsets is (or should be) entirely optional (see #395).

Since that seems like a different issue, maybe it's an idea to create a separate issue for it and reference it in a code comment. Expecting our future selves (or others) to spot, outside the context of these changes, that we conceptually want the resolved offsets there rather than the committed ones might be a lot to ask 😅.
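
To illustrate the distinction being argued for, here is a hypothetical sketch (the class, method names, and data shapes are assumptions for illustration, not kafkajs code) in which the resolved offset is the always-present "playhead" and committing it is a separate, optional step:

```js
// Hypothetical model: the playhead (resolved offset) always exists;
// committing it to Kafka is an optional, separate action.
class PlayheadSketch {
  constructor() {
    this.resolved = {} // topic -> partition -> next offset to consume
    this.committed = {} // optional bookmark stored in Kafka
  }

  // seek() only moves the playhead (see #395); it does not commit.
  seek({ topic, partition, offset }) {
    this.resolved[topic] = { ...(this.resolved[topic] || {}), [partition]: offset }
  }

  // Committing the current playhead position is opt-in.
  commit({ topic, partition }) {
    const offset = this.resolved[topic] && this.resolved[topic][partition]
    if (offset != null) {
      this.committed[topic] = { ...(this.committed[topic] || {}), [partition]: offset }
    }
  }
}

const offsets = new PlayheadSketch()
offsets.seek({ topic: 'topic-a', partition: 0, offset: '100' })
console.log(offsets.committed) // {} (seeking alone commits nothing)
offsets.commit({ topic: 'topic-a', partition: 0 })
console.log(offsets.committed) // { 'topic-a': { '0': '100' } }
```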

Collaborator Author (@Nevon, Dec 6, 2019)

> Since that seems like a different issue, maybe it's an idea to create a separate issue for it and reference it in a code comment.

That sounds like a good idea. I would prefer to do that kind of holistic refactoring in a PR that doesn't actually change any behavior, rather than squeezing it into a bugfix. Could you create that issue?

Collaborator

Created the issue, trying to preserve the context of this conversation properly: #585. To help with the audit suggested there, I'd suggest linking that issue in a comment above where committedOffsets() is called.
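
For example, the annotation could look roughly like this. The helper function, its arguments, and the wording of the comment are made up for illustration; they are not the actual change being proposed.

```js
// Hypothetical illustration of the suggested annotation.
function initializedPartitions(offsetManager, topic, partitions) {
  // Conceptually we want the resolved ("playhead") offsets here, not the
  // committed ones; see https://github.com/tulios/kafkajs/issues/585.
  const committedOffsets = offsetManager.committedOffsets()
  return partitions.filter(partition => committedOffsets[topic][partition] != null)
}

// Tiny stand-in for the real OffsetManager, just to show the filter in action.
const fakeManager = { committedOffsets: () => ({ 'topic-a': { 0: '42' } }) }
console.log(initializedPartitions(fakeManager, 'topic-a', [0, 1])) // [ 0 ]
```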


       for (const leader of leaders) {
-        const partitions = partitionsPerLeader[leader].map(partition => ({
-          partition,
-          fetchOffset: this.offsetManager.nextOffset(topicPartition.topic, partition).toString(),
-          maxBytes: maxBytesPerPartition,
-        }))
+        const partitions = partitionsPerLeader[leader]
+          .filter(partition => {
+            /**
+             * When recovering from OffsetOutOfRange, each partition can recover
+             * concurrently, which invalidates resolved and committed offsets as part
+             * of the recovery mechanism (see OffsetManager.clearOffsets). In concurrent
+             * scenarios this can initiate a new fetch with invalid offsets.
+             *
+             * This was further highlighted by https://github.com/tulios/kafkajs/pull/570,
+             * which increased concurrency, making this more likely to happen.
+             *
+             * This is solved by only making requests for partitions with initialized offsets.
+             *
+             * See the following pull request which explains the context of the problem:
+             * @issue https://github.com/tulios/kafkajs/pull/578
+             */
+            return committedOffsets[topicPartition.topic][partition] != null
+          })
+          .map(partition => ({
+            partition,
+            fetchOffset: this.offsetManager
+              .nextOffset(topicPartition.topic, partition)
+              .toString(),
+            maxBytes: maxBytesPerPartition,
+          }))

         requestsPerLeader[leader] = requestsPerLeader[leader] || []
         requestsPerLeader[leader].push({ topic: topicPartition.topic, partitions })