Skip to content

[SPARK-13569][STREAMING][KAFKA] pattern based topic subscription #14026

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

koeninger
Copy link
Contributor

What changes were proposed in this pull request?

Allow for kafka topic subscriptions based on a regex pattern.

How was this patch tested?

Unit tests, manual tests

@koeninger
Copy link
Contributor Author

@tdas @zsxwing This should be the last ConsumerStrategy implementation to have basic parity with what's offered by the kafka consumer, anything else should probably be handled by user subclasses.

If the KAFKA-3370 workaround stuff isn't clear... the basic issue is that you have to poll in order to get partition assignments before setting a position... but if you poll with auto offset none, it will throw an exception because you don't have a position yet :)

@SparkQA
Copy link

SparkQA commented Jul 2, 2016

Test build #61649 has finished for PR 14026 at commit 796045f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

// silence exception
}
toSeek.asScala.foreach { case (topicPartition, offset) =>
consumer.seek(topicPartition, offset)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 chars for indent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Foreach is a scope, case is a nested scope.

@koeninger
Copy link
Contributor Author

ping @tdas @zsxwing these poll fixes need to go in for a release / release candidate, even if SubscribePattern doesn't make it for some reason.

currentOffsets
}
if (!toSeek.isEmpty) {
// work around KAFKA-3370 when reset is none
Copy link
Contributor

@tdas tdas Jul 6, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you comment to explain the problem in short and the work around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will add comment in the code once I'm back at my workstation.

@SparkQA
Copy link

SparkQA commented Jul 7, 2016

Test build #61889 has finished for PR 14026 at commit 87b5490.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 7, 2016

Test build #61892 has finished for PR 14026 at commit f287722.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Jul 9, 2016

I am going to merge this PR in the interest of RC2. But I would like to have more tests for testing the conditions that led to the poll+seek to be added. These subtle behaviors should not go untested. Would be very good if you open another PR to added more tests.

asfgit pushed a commit that referenced this pull request Jul 9, 2016
## What changes were proposed in this pull request?
Allow for kafka topic subscriptions based on a regex pattern.

## How was this patch tested?
Unit tests, manual tests

Author: cody koeninger <cody@koeninger.org>

Closes #14026 from koeninger/SPARK-13569.

(cherry picked from commit fd6e8f0)
Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
@asfgit asfgit closed this in fd6e8f0 Jul 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants