Skip to content

[SPARK-1022] Kafka unit test that actually sends and receives data #557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

tdas
Copy link
Contributor

@tdas tdas commented Apr 25, 2014

This was originally done by @jerryshao in the incubator repository. I cherry picked him commits and made a new PR here. However, the tests are flaky so they are ignored for now.

Conflicts:
	external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java
	external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala
	project/SparkBuild.scala
@tdas tdas changed the title [SPARK-1022 [SPARK-1022] Kafka unit test that actually sends and receives data Apr 25, 2014
@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@tdas
Copy link
Contributor Author

tdas commented Apr 25, 2014

@jerryshao It would be great if you can take a look at this sometime. The tests seems to fail 1 in 3 times as the receiver does not receive any data. Also there are a number of warning and errors in the log when the test are run, some of which are particularly disturbing (method not found in ZkClient!)

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14491/

@jerryshao
Copy link
Contributor

Hi TD, thanks for bringing it out, I will checkout your branch to see the details.

@jerryshao
Copy link
Contributor

Hi TD, the exception (method not found in ZkClient) is introduced by zkclient-0.1.jar. Actually Kafka 0.8 relies on zkclient-0.3.jar which will be automatically imported by Kafka. So by removing the dependency line in build profile: "com.github.sgroschupf" % "zkclient" % "0.1" excludeAll(excludeNetty), can remove this exception.

As tests seems to fail unpredictably, it also confuses me a lot, I'm not sure it's the problem of producer do not produce any data, or the problem of receiver do not receive any data. I will trace it to see the details if I can.

pwendell added a commit to pwendell/spark that referenced this pull request May 12, 2014
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build.

Author: Patrick Wendell <pwendell@gmail.com>
Author: Prashant Sharma <scrapcodes@gmail.com>

== Merge branch commits ==

commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4
Author: Prashant Sharma <scrapcodes@gmail.com>
Date:   Sun Feb 9 17:39:07 2014 +0530

    scala style fixes

commit f91709887a8e0b608c5c2b282db19b8a44d53a43
Author: Patrick Wendell <pwendell@gmail.com>
Date:   Fri Jan 24 11:22:53 2014 -0800

    Adding scalastyle snapshot
pwendell pushed a commit to pwendell/spark that referenced this pull request May 12, 2014
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2

Continuation of PR apache#557

With this all scala style errors are fixed across the code base !!

The reason for creating a separate PR was to not interrupt an already reviewed and ready to merge PR. Hope this gets reviewed soon and merged too.

Author: Prashant Sharma <prashant.s@imaginea.com>

Closes apache#567 and squashes the following commits:

3b1ec30 [Prashant Sharma] scala style fixes
@tdas
Copy link
Contributor Author

tdas commented Jul 10, 2014

@jerryshao I know this is old stuff but did you get a chance to take a look further into this? If not, it would be great if you can while working on other kafka + streaming stuff. :)

@jerryshao
Copy link
Contributor

Yes, I did revisit this patch again long ago, but as you mentioned it is so flaky. I guess the main reason is that our Kafka API call is in high level which involves Zookeeper's behavior (unknown latency or others), so our unit test cannot precisely reproduce the result each time. I think if we shift to some low level API, we can have more power to control the behavior, this problem can be easily fixed.

@tdas
Copy link
Contributor Author

tdas commented Jul 11, 2014

Its weird that zookeeper behavior is so unpredictable that this test would
be so flaky. If we make the low-level API we should still keep the
high-level API for backward compatibility for a while, so it would be good
if a fix for this can be figured out.

TD

On Thu, Jul 10, 2014 at 6:07 PM, Saisai Shao notifications@github.com
wrote:

Yes, I did revisit this patch again long ago, but as you mentioned it is
so flaky. I guess the main reason is that our Kafka API call is in high
level which involves Zookeeper's behavior (unknown latency or others), so
our unit test cannot precisely reproduce the result each time. I think if
we shift to some low level API, we can have more power to control the
behavior, this problem can be easily fixed.


Reply to this email directly or view it on GitHub
#557 (comment).

@jerryshao
Copy link
Contributor

I'm not super familiar with Kafka's internal, so this unit test is mostly taken Kafka's unit test as references, there might be some miss use without diving into the details deeply. I'll try to fix this, really depends on my knowledge of Kafka :)

@jerryshao
Copy link
Contributor

Hi TD, I've fixed this flaky issue in my local test, how should I submit my patch, should I push to your branch or create a new PR?

@tdas
Copy link
Contributor Author

tdas commented Jul 30, 2014

Dang, I missed this update. Go ahead and make a new PR. I can close this PR.
Incidentally, what was the issue?

@jerryshao
Copy link
Contributor

I think the main reason is that some times when we feed our test data to Kafka, the KafkaReceiver is still starting, after the receiver started, it will ignore the previous sent data and read from the end of the queue (because of the behavior Kafka 0.8's auto.offset.reset=largest), so previous sent test data will never be consumed, and the test will be failed. So the simple way is to sleep for several seconds after ssc.start() and then feed the data.

@jerryshao
Copy link
Contributor

I will create another PR, thanks a lot. I think it would be better to merge (#1508) before, so we can also test the Java API correctly. Thanks a lot.

@tdas
Copy link
Contributor Author

tdas commented Jul 31, 2014

Yeah, will merge #1508 as soon as the tests pass.

@tdas
Copy link
Contributor Author

tdas commented Aug 1, 2014

@jerryshao Since #1508 has been merged, can you update this and test it to make sure this works?

@asfgit asfgit closed this in 87738bf Aug 2, 2014
@jerryshao
Copy link
Contributor

Yeah, I will submit to new PR about this ASAP :).

asfgit pushed a commit that referenced this pull request Aug 5, 2014
This PR is a updated version of (#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues.

@tdas, would you mind reviewing this PR? Thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits:

b6a505f [jerryshao] code refactor according to comments
5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
4559310 [jerryshao] Minor changes for Kafka unit test
860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
796d4ca [jerryshao] Add real Kafka streaming test
asfgit pushed a commit that referenced this pull request Aug 5, 2014
This PR is a updated version of (#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues.

@tdas, would you mind reviewing this PR? Thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits:

b6a505f [jerryshao] code refactor according to comments
5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
4559310 [jerryshao] Minor changes for Kafka unit test
860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
796d4ca [jerryshao] Add real Kafka streaming test
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
This PR is a updated version of (apache#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues.

@tdas, would you mind reviewing this PR? Thanks a lot.

Author: jerryshao <saisai.shao@intel.com>

Closes apache#1751 from jerryshao/kafka-unit-test and squashes the following commits:

b6a505f [jerryshao] code refactor according to comments
5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it
5525f10 [jerryshao] Fix flaky issue of Kafka real unit test
4559310 [jerryshao] Minor changes for Kafka unit test
860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness
796d4ca [jerryshao] Add real Kafka streaming test
erikerlandson pushed a commit to erikerlandson/spark that referenced this pull request Nov 29, 2017
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Add no_log when setting gcp account since the prblem in
post job has been solved.
arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020
…json table (apache#557)

* MapR [SPARK-588] Spark thriftserver fails when work with hive-maprdb json table
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants