-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-23408][SS][BRANCH-2.3] Synchronize successive AddData actions in Streaming*JoinSuite #23757
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-23408][SS][BRANCH-2.3] Synchronize successive AddData actions in Streaming*JoinSuite #23757
Conversation
|
retest this, please |
1 similar comment
|
retest this, please |
|
Test build #102133 has finished for PR 23757 at commit
|
|
Test build #102131 has finished for PR 23757 at commit
|
|
Test build #102132 has finished for PR 23757 at commit
|
|
NOTE: e08bf2e (my own commit) should be squashed into be41fa7 - I'll sort out later. I would like to confirm flakiness is resolved first. |
|
Retest this, please |
2 similar comments
|
Retest this, please |
|
Retest this, please |
|
Test build #102144 has finished for PR 23757 at commit
|
|
Test build #102143 has finished for PR 23757 at commit
|
|
Test build #102145 has finished for PR 23757 at commit
|
|
I wish d5afd92 would be the last one needed for resolving flakiness tests. Only one failed for e08bf2e and the test failure is related to d5afd92. |
|
Retest this, please |
1 similar comment
|
Retest this, please |
|
Test build #102151 has finished for PR 23757 at commit
|
|
Test build #102152 has finished for PR 23757 at commit
|
|
Test build #102153 has finished for PR 23757 at commit
|
d5afd92 to
d9fcb66
Compare
|
retest this, please |
2 similar comments
|
retest this, please |
|
retest this, please |
|
cc. @maropu @srowen |
|
More clearly, this PR intends to address https://issues.apache.org/jira/browse/SPARK-24211 but while addressing it triggers https://issues.apache.org/jira/browse/SPARK-24239 (may want to change affect version?) so also the fix of SPARK-24239 is included to the PR. My understanding is we could resolve them with setting version to 2.3.4. |
|
Test build #102161 has finished for PR 23757 at commit
|
|
Test build #102159 has finished for PR 23757 at commit
|
|
BTW is this a 'port' of a similar fix from later branches? it would be good to cross-reference those. |
|
retest this, please |
1 similar comment
|
retest this, please |
|
Test build #102186 has finished for PR 23757 at commit
|
|
Test failed from Retest this, please |
|
Retest this, please |
|
Test build #102199 has finished for PR 23757 at commit
|
|
Just to make sure I understand the status: This is a straight back-port of https://github.com/apache/spark/pull/20650/files?w=1 , and indeed it looks like the same change we see here with https://github.com/apache/spark/pull/23757/files?w=1 , OK. It can be merged first without other changes, pending tests? Maybe someone can help me understand this. The diff at I know the difference is supposed to be just whitespace in the diff, but the latter seems to show a much more significant reordering of code. I can't figure out what to make of it. Am I missing something obvious? |
Yes (and you can also resolve SPARK-24211). And then SPARK-23491 and SPARK-23416 can be cherry-picked from origin PRs since they're clean cherry-pick and we already ran tests in this PR before revising.(and you can also resolve SPARK-24239)
It is just same as what #20650 was. You could check the difference between https://github.com/apache/spark/pull/20650/files?w=1 vs https://github.com/apache/spark/pull/20650/files The code change add indentation(s) to huge code block: BTW, after fixing this I'm seeing very high change of failures from FileBasedDataSourceSuite - as I commented earlier it might need ORC version upgrade to fix some of related issues, so while we seemed to upgrade the ORC dependency from bugfix version (2.4.0 -> 2.4.1) I'm not sure we could do in 2.3 version line. |
|
IMHO once we know we're not adding code to branch-2.3, passed builds could be a complement of pending test. If we need to run test just before merging, I'd propose triggering 5 builds concurrently due to high chance of failure of |
|
OK after staring at it longer, I see that the difference between the diffs is actually whitespace. It looks much bigger because it's mismatching lines that occur several time identically in different parts of the code, so it renders here as a big delete and add. OK disregard that, just wanted to make sure I'm not crazy. This is looking good pending tests. Once this passes and is merged, go ahead with any other changes as you see fit. |
|
I'm not a committer so I need committer's help on cherry-picking remaining two commits (SPARK-23491 and SPARK-23416). Not sure we would want to have PRs for clean cherry-pick: if we would like to have them explicitly please let me know - I'll submit PRs. |
|
retest this, please |
1 similar comment
|
retest this, please |
|
Let me run three builds concurrently: as build history in this PR, there's other flaky test which have high chance of failure as well. |
|
Test build #4551 has finished for PR 23757 at commit
|
|
Test build #102211 has finished for PR 23757 at commit
|
|
Test build #102212 has finished for PR 23757 at commit
|
|
Is the single line below different from the original one? cc: @dongjoon-hyun could you help? |
|
Yes right. Maybe possibly other line but it will be just an import. |
|
ok, thanks! Looks ok to me. |
srowen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Just so I'm 100% clear, this is intended to be merged first right?
|
Yes right. |
…in Streaming*JoinSuite ## What changes were proposed in this pull request? **The best way to review this PR is to ignore whitespace/indent changes. Use this link - https://github.com/apache/spark/pull/20650/files?w=1** The stream-stream join tests add data to multiple sources and expect it all to show up in the next batch. But there's a race condition; the new batch might trigger when only one of the AddData actions has been reached. Prior attempt to solve this issue by jose-torres in #20646 attempted to simultaneously synchronize on all memory sources together when consecutive AddData was found in the actions. However, this carries the risk of deadlock as well as unintended modification of stress tests (see the above PR for a detailed explanation). Instead, this PR attempts the following. - A new action called `StreamProgressBlockedActions` that allows multiple actions to be executed while the streaming query is blocked from making progress. This allows data to be added to multiple sources that are made visible simultaneously in the next batch. - An alias of `StreamProgressBlockedActions` called `MultiAddData` is explicitly used in the `Streaming*JoinSuites` to add data to two memory sources simultaneously. This should avoid unintentional modification of the stress tests (or any other test for that matter) while making sure that the flaky tests are deterministic. NOTE: This patch is modified a bit from origin PR (#20650) to cover DSv2 incompatibility between Spark 2.3 and 2.4: StreamingDataSourceV2Relation is a class for 2.3, whereas it is a case class for 2.4 ## How was this patch tested? Modified test cases in `Streaming*JoinSuites` where there are consecutive `AddData` actions. Closes #23757 from HeartSaVioR/fix-streaming-join-test-flakiness-branch-2.3. Lead-authored-by: Jungtaek Lim (HeartSaVioR) <kabhwan@gmail.com> Co-authored-by: Tathagata Das <tathagata.das1565@gmail.com> Signed-off-by: Sean Owen <sean.owen@databricks.com>
|
Merged to 2.3. |
|
Thanks for merging! As a follow-up, we may want to consider doing these things to resolve flaky tests specifically in branch-2.3:
|
|
I'll cherry-pick SPARK-23491/23416 in brach-2.3. |
|
@srowen @HeartSaVioR ok, done. Let's keep watching jenkins tests in branch-2.3. |
|
Thanks @maropu! |
What changes were proposed in this pull request?
The best way to review this PR is to ignore whitespace/indent changes. Use this link - https://github.com/apache/spark/pull/20650/files?w=1
The stream-stream join tests add data to multiple sources and expect it all to show up in the next batch. But there's a race condition; the new batch might trigger when only one of the AddData actions has been reached.
Prior attempt to solve this issue by jose-torres in #20646 attempted to simultaneously synchronize on all memory sources together when consecutive AddData was found in the actions. However, this carries the risk of deadlock as well as unintended modification of stress tests (see the above PR for a detailed explanation). Instead, this PR attempts the following.
StreamProgressBlockedActionsthat allows multiple actions to be executed while the streaming query is blocked from making progress. This allows data to be added to multiple sources that are made visible simultaneously in the next batch.StreamProgressBlockedActionscalledMultiAddDatais explicitly used in theStreaming*JoinSuitesto add data to two memory sources simultaneously.This should avoid unintentional modification of the stress tests (or any other test for that matter) while making sure that the flaky tests are deterministic.
NOTE: This patch is modified a bit from origin PR (#20650) to cover DSv2 incompatibility between Spark 2.3 and 2.4: StreamingDataSourceV2Relation is a class for 2.3, whereas it is a case class for 2.4
How was this patch tested?
Modified test cases in
Streaming*JoinSuiteswhere there are consecutiveAddDataactions.