[SPARK-5074][Core][Tests] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite #5903

zsxwing · 2015-05-05T04:57:46Z

Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/

This happens because many tests share the same `JobListener`, and `scheduler` isn't stopped after each test, so it is actually still running. While the test `run shuffle with map stage failure` is running, a previous test may trigger the `ResubmitFailedStages` logic, report `jobFailed`, and overwrite the global `failure` variable.

This PR uses `after` to call `scheduler.stop()` after each test.
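To illustrate the interference described above, here is a minimal, self-contained sketch. It is a toy model, not the actual `DAGScheduler` or the suite's code: `SharedState`, `ToyScheduler`, `resubmitLater`, `tick`, and `LeakDemo` are hypothetical names invented for this example. It only assumes that a scheduler left running can still fire a delayed event that writes to state shared with the next test.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the suite's shared state: one `failure` slot
// written by whichever scheduler reports a job failure.
object SharedState {
  @volatile var failure: Option[String] = None
}

// Toy scheduler (not Spark's DAGScheduler): queued events are processed
// only while the scheduler is still running.
final class ToyScheduler(name: String) {
  private val pending = mutable.Queue.empty[String]
  private var running = true

  // Queue an event that will fire later, like a delayed resubmission.
  def resubmitLater(reason: String): Unit = pending.enqueue(reason)

  // Fire all queued events; each one reports a failure into the shared slot.
  def tick(): Unit =
    if (running) {
      while (pending.nonEmpty) {
        SharedState.failure = Some(s"$name: ${pending.dequeue()}")
      }
    }

  // Stop processing and drop anything still queued.
  def stop(): Unit = {
    running = false
    pending.clear()
  }
}

object LeakDemo {
  // Runs two "tests" back to back and returns what the second one
  // observes in the shared `failure` slot.
  def run(stopAfterEach: Boolean): Option[String] = {
    SharedState.failure = None
    val first = new ToyScheduler("test-1")
    first.resubmitLater("stage resubmission failed")
    if (stopAfterEach) first.stop() // what an `after` block would do

    // The second test starts from a clean slate...
    SharedState.failure = None
    val second = new ToyScheduler("test-2")
    first.tick() // ...but a leaked scheduler may still fire its event
    val observed = SharedState.failure
    second.stop()
    observed
  }

  def main(args: Array[String]): Unit = {
    println(run(stopAfterEach = false)) // Some(test-1: stage resubmission failed)
    println(run(stopAfterEach = true))  // None
  }
}
```

In this model, stopping the first scheduler before the second test begins is what keeps the stale event from reaching the shared slot, which mirrors the intent of calling `scheduler.stop()` in `after`.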


zsxwing · 2015-05-05T04:59:11Z

core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala

Need to cancel the job here; otherwise `scheduler.stop()` will trigger `jobFailed` and make this test fail.

SparkQA · 2015-05-05T05:05:37Z

Test build #31832 has finished for PR 5903 at commit 1e6f13e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2015-05-05T05:08:28Z

retest this please.

SparkQA · 2015-05-05T05:14:15Z

Test build #31833 has finished for PR 5903 at commit 1e6f13e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

zsxwing · 2015-05-05T06:02:40Z

retest this please.

SparkQA · 2015-05-05T06:09:15Z

Test build #31839 has finished for PR 5903 at commit 1e6f13e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2015-05-05T10:50:50Z

Jenkins retest this please.

SparkQA · 2015-05-05T12:34:42Z

Test build #31873 has finished for PR 5903 at commit 1e6f13e.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2015-05-05T14:02:18Z

LGTM: the test starts a scheduler in `before`, so it should stop it in `after`. And yes, cancelling the outstanding job seems OK. Tests pass.

… stage failure' in DAGSchedulerSuite

Test failure: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=centos/2240/testReport/junit/org.apache.spark.scheduler/DAGSchedulerSuite/run_shuffle_with_map_stage_failure/

This is because many tests share the same `JobListener`. Because after each test, `scheduler` isn't stopped. So actually it's still running. When running the test `run shuffle with map stage failure`, some previous test may trigger [ResubmitFailedStages](https://github.com/apache/spark/blob/ebc25a4ddfe07a67668217cec59893bc3b8cf730/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1120) logic, and report `jobFailed` and override the global `failure` variable.

This PR uses `after` to call `scheduler.stop()` for each test.

Author: zsxwing <zsxwing@gmail.com>

Closes #5903 from zsxwing/SPARK-5074 and squashes the following commits:

1e6f13e [zsxwing] Fix the flakey test 'run shuffle with map stage failure' in DAGSchedulerSuite

(cherry picked from commit 5ffc73e)
Signed-off-by: Sean Owen <sowen@cloudera.com>


zsxwing reviewed May 5, 2015

asfgit closed this in 5ffc73e May 5, 2015

zsxwing deleted the SPARK-5074 branch May 5, 2015 14:06
