-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-1022] Kafka unit test that actually sends and receives data #557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Conflicts: external/kafka/src/test/java/org/apache/spark/streaming/kafka/JavaKafkaStreamSuite.java external/kafka/src/test/scala/org/apache/spark/streaming/kafka/KafkaStreamSuite.scala project/SparkBuild.scala
Conflicts: project/SparkBuild.scala
Merged build triggered. |
Merged build started. |
@jerryshao It would be great if you can take a look at this sometime. The tests seems to fail 1 in 3 times as the receiver does not receive any data. Also there are a number of warning and errors in the log when the test are run, some of which are particularly disturbing (method not found in ZkClient!) |
Merged build finished. All automated tests passed. |
All automated tests passed. |
Hi TD, thanks for bringing it out, I will checkout your branch to see the details. |
Hi TD, the exception (method not found in ZkClient) is introduced by zkclient-0.1.jar. Actually Kafka 0.8 relies on zkclient-0.3.jar which will be automatically imported by Kafka. So by removing the dependency line in build profile: As tests seems to fail unpredictably, it also confuses me a lot, I'm not sure it's the problem of producer do not produce any data, or the problem of receiver do not receive any data. I will trace it to see the details if I can. |
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Author: Patrick Wendell <pwendell@gmail.com> Author: Prashant Sharma <scrapcodes@gmail.com> == Merge branch commits == commit 1a8bd1c059b842cb95cc246aaea74a79fec684f4 Author: Prashant Sharma <scrapcodes@gmail.com> Date: Sun Feb 9 17:39:07 2014 +0530 scala style fixes commit f91709887a8e0b608c5c2b282db19b8a44d53a43 Author: Patrick Wendell <pwendell@gmail.com> Date: Fri Jan 24 11:22:53 2014 -0800 Adding scalastyle snapshot
SPARK-1058, Fix Style Errors and Add Scala Style to Spark Build. Pt 2 Continuation of PR apache#557 With this all scala style errors are fixed across the code base !! The reason for creating a separate PR was to not interrupt an already reviewed and ready to merge PR. Hope this gets reviewed soon and merged too. Author: Prashant Sharma <prashant.s@imaginea.com> Closes apache#567 and squashes the following commits: 3b1ec30 [Prashant Sharma] scala style fixes
@jerryshao I know this is old stuff but did you get a chance to take a look further into this? If not, it would be great if you can while working on other kafka + streaming stuff. :) |
Yes, I did revisit this patch again long ago, but as you mentioned it is so flaky. I guess the main reason is that our Kafka API call is in high level which involves Zookeeper's behavior (unknown latency or others), so our unit test cannot precisely reproduce the result each time. I think if we shift to some low level API, we can have more power to control the behavior, this problem can be easily fixed. |
Its weird that zookeeper behavior is so unpredictable that this test would TD On Thu, Jul 10, 2014 at 6:07 PM, Saisai Shao notifications@github.com
|
I'm not super familiar with Kafka's internal, so this unit test is mostly taken Kafka's unit test as references, there might be some miss use without diving into the details deeply. I'll try to fix this, really depends on my knowledge of Kafka :) |
Hi TD, I've fixed this flaky issue in my local test, how should I submit my patch, should I push to your branch or create a new PR? |
Dang, I missed this update. Go ahead and make a new PR. I can close this PR. |
I think the main reason is that some times when we feed our test data to Kafka, the KafkaReceiver is still starting, after the receiver started, it will ignore the previous sent data and read from the end of the queue (because of the behavior Kafka 0.8's auto.offset.reset=largest), so previous sent test data will never be consumed, and the test will be failed. So the simple way is to sleep for several seconds after |
I will create another PR, thanks a lot. I think it would be better to merge (#1508) before, so we can also test the Java API correctly. Thanks a lot. |
Yeah, will merge #1508 as soon as the tests pass. |
@jerryshao Since #1508 has been merged, can you update this and test it to make sure this works? |
Yeah, I will submit to new PR about this ASAP :). |
This PR is a updated version of (#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues. @tdas, would you mind reviewing this PR? Thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits: b6a505f [jerryshao] code refactor according to comments 5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it 5525f10 [jerryshao] Fix flaky issue of Kafka real unit test 4559310 [jerryshao] Minor changes for Kafka unit test 860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness 796d4ca [jerryshao] Add real Kafka streaming test
This PR is a updated version of (#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues. @tdas, would you mind reviewing this PR? Thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes #1751 from jerryshao/kafka-unit-test and squashes the following commits: b6a505f [jerryshao] code refactor according to comments 5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it 5525f10 [jerryshao] Fix flaky issue of Kafka real unit test 4559310 [jerryshao] Minor changes for Kafka unit test 860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness 796d4ca [jerryshao] Add real Kafka streaming test
This PR is a updated version of (apache#557) to actually test sending and receiving data through Kafka, and fix previous flaky issues. @tdas, would you mind reviewing this PR? Thanks a lot. Author: jerryshao <saisai.shao@intel.com> Closes apache#1751 from jerryshao/kafka-unit-test and squashes the following commits: b6a505f [jerryshao] code refactor according to comments 5222330 [jerryshao] Change JavaKafkaStreamSuite to better test it 5525f10 [jerryshao] Fix flaky issue of Kafka real unit test 4559310 [jerryshao] Minor changes for Kafka unit test 860f649 [jerryshao] Minor style changes, and tests ignored due to flakiness 796d4ca [jerryshao] Add real Kafka streaming test
Add no_log when setting gcp account since the prblem in post job has been solved.
…json table (apache#557) * MapR [SPARK-588] Spark thriftserver fails when work with hive-maprdb json table
This was originally done by @jerryshao in the incubator repository. I cherry picked him commits and made a new PR here. However, the tests are flaky so they are ignored for now.