use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler #1528

lianhuiwang · 2014-07-22T13:02:28Z

https://issues.apache.org/jira/browse/SPARK-2618

… on DAGScheduler

SparkQA · 2014-07-22T13:08:11Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16963/consoleFull

SparkQA · 2014-07-22T13:08:52Z

QA results for PR 1528:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16963/consoleFull

SparkQA · 2014-07-22T13:28:15Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16964/consoleFull

CodingCat · 2014-07-22T14:13:49Z

So is this actually a new type of scheduling, separate from FIFO/FAIR?

CodingCat · 2014-07-22T14:19:31Z

Also, is this preemptive or non-preemptive?

According to my understanding of the code, it's non-preemptive, so a high-priority TaskSet can easily be delayed when there are many long-running, low-priority TaskSets.

SparkQA · 2014-07-22T15:38:14Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16969/consoleFull

lianhuiwang · 2014-07-22T15:38:42Z

It adds a user-defined priority to FIFO scheduling. If the user does not configure a priority, it works as before. It is non-preemptive: when there are free executors and the pool is FIFO, tasks from a higher-priority TaskSet are submitted before tasks from a lower-priority TaskSet.
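A minimal sketch of the ordering described above, assuming that a larger user priority is scheduled first and that ties fall back to Spark's usual FIFO order (job ID, then stage ID). `PendingTaskSet` and `priorityFifoLessThan` are hypothetical names, not Spark's internal `TaskSet` or `FIFOSchedulingAlgorithm`, and the direction of the priority comparison is an assumption. Because the comparator only decides who gets the next free slot, it is non-preemptive: running tasks are never interrupted.

```scala
// Hypothetical, simplified stand-in for a pending TaskSet (not Spark's class).
case class PendingTaskSet(name: String, userPriority: Int, jobId: Int, stageId: Int)

// Returns true if `a` should be offered free executor slots before `b`.
// Assumption: a larger userPriority wins; ties fall back to FIFO (jobId, then stageId).
def priorityFifoLessThan(a: PendingTaskSet, b: PendingTaskSet): Boolean = {
  if (a.userPriority != b.userPriority) {
    a.userPriority > b.userPriority
  } else if (a.jobId != b.jobId) {
    a.jobId < b.jobId
  } else {
    a.stageId < b.stageId
  }
}

val pending = List(
  PendingTaskSet("etl", userPriority = 0, jobId = 1, stageId = 3),
  PendingTaskSet("report", userPriority = 0, jobId = 2, stageId = 5),
  PendingTaskSet("urgent", userPriority = 10, jobId = 3, stageId = 7)
)

// Only the order in which free slots are handed out changes;
// tasks already running for "etl" or "report" keep their executors.
val offerOrder = pending.sortWith(priorityFifoLessThan).map(_.name)
println(offerOrder) // List(urgent, etl, report)
```

Without any configured priority (all zeros), the sort degenerates to plain job/stage FIFO order, matching the "works as before" behavior.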

markhamstra · 2014-07-22T18:26:16Z

core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala

      taskScheduler.submitTasks(
-        new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
+        new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, priority.toInt,
+          properties))


properties is already being passed to the TaskSet ctor, so I'd prefer that extraction of priority happen there or elsewhere instead of doing properties.getProperty here and adding another parameter to the TaskSet ctor.

CodingCat: Agreed, since the DAGScheduler already knows too much about task-level details.
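A minimal sketch of the suggestion above: keep the constructor unchanged and derive the priority inside the TaskSet from the `properties` the DAGScheduler already passes in. `SketchTaskSet` is a hypothetical stand-in for Spark's internal `TaskSet`; the key `spark.scheduler.priority` comes from this PR, while treating a missing key or a null `properties` as priority 0 is an assumption.

```scala
import java.util.Properties

// Hypothetical stand-in for Spark's internal TaskSet (not the real class).
// No extra priority parameter is added to the constructor.
class SketchTaskSet(val stageId: Int, val jobId: Int, val properties: Properties) {
  // Read the priority from the job's properties; guard against a null
  // Properties object and a missing key, falling back to 0.
  val userPriority =
    if (properties != null && properties.getProperty("spark.scheduler.priority") != null) {
      properties.getProperty("spark.scheduler.priority").toInt
    } else {
      0
    }
}

val props = new Properties()
props.setProperty("spark.scheduler.priority", "5")

val withPriority = new SketchTaskSet(stageId = 1, jobId = 0, properties = props)
val withoutProps = new SketchTaskSet(stageId = 2, jobId = 0, properties = null)
println(withPriority.userPriority) // 5
println(withoutProps.userPriority) // 0
```

The `if`/`else` is written without an enclosing block, without type annotations, and with a literal default of 0, in line with the later style review on this PR.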

SparkQA · 2014-07-23T03:48:27Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17013/consoleFull

lianhuiwang · 2014-07-23T03:49:28Z

@markhamstra @CodingCat Thank you for the comments. I have updated the patch; please review again.

markhamstra · 2014-07-23T03:54:56Z

core/src/main/scala/org/apache/spark/scheduler/SchedulingAlgorithm.scala

@@ -17,6 +17,8 @@

 package org.apache.spark.scheduler

+import scala.math.Ordering.Implicits._


Pulling in these implicits can have unintended consequences; that's why in my previous comment I kept the scope of the import as small as possible.
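One way to keep `scala.math.Ordering.Implicits._` from leaking into the rest of the file is to import it inside the single method that needs it. A minimal sketch, assuming the comparator works on `(priority, jobId, stageId)` tuples; `ScopedOrderingComparator` is a hypothetical name, and storing the priority negated encodes the assumption that larger priorities go first.

```scala
// Hypothetical comparator; not Spark's FIFOSchedulingAlgorithm.
object ScopedOrderingComparator {
  def lessThan(a: (Int, Int, Int), b: (Int, Int, Int)): Boolean = {
    // The implicit operators (<, <=, >, >= for any type with an Ordering)
    // are visible only inside this method body.
    import scala.math.Ordering.Implicits._
    a < b
  }
}

// Tuples are (negated user priority, jobId, stageId), so a larger priority sorts first.
val urgent = (-10, 3, 7)
val etl = (0, 1, 3)
println(ScopedOrderingComparator.lessThan(urgent, etl)) // true
println(ScopedOrderingComparator.lessThan(etl, urgent)) // false
```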

SparkQA · 2014-07-23T04:58:20Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17014/consoleFull

lianhuiwang · 2014-07-23T04:59:16Z

@markhamstra Thank you. I have updated the patch. Do you have any more comments?

SparkQA · 2014-07-23T05:28:06Z

QA results for PR 1528:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17013/consoleFull

markhamstra · 2014-07-23T05:57:21Z

core/src/main/scala/org/apache/spark/scheduler/TaskSet.scala

+    }else{
+      DEFAULT_PRIORITY
+    }
+  }


Is the style checker ok with val priority = if (...) {... instead of val priority = { if (...) {...? If it is, I'd rather do without the extra {}. You can also drop the : Int from val DEFAULT_PRIORITY and val priority -- the types are obvious without the annotations. Also, I'm not sure that DEFAULT_PRIORITY really gains you anything -- I'd be fine with just if (...) {...} else 0. And make sure you follow the style guide for spacing with parens and braces.

SparkQA · 2014-07-23T06:37:34Z

QA results for PR 1528:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17014/consoleFull

lianhuiwang · 2014-07-23T08:34:57Z

@markhamstra Thank you. How does the latest code look?

SparkQA · 2014-07-23T08:38:24Z

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17026/consoleFull

markhamstra · 2014-07-23T08:49:38Z

This looks like a clean implementation, but you still need to open a JIRA issue to explain why you want this; then edit the description of this PR to reference that JIRA. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

markhamstra · 2014-07-23T08:51:52Z

Sorry, it looks like you already have SPARK-2618, so change the title of this PR to include that.

lianhuiwang · 2014-07-23T09:06:51Z

@markhamstra @pwendell I have updated SPARK-2618; please take a look. Thanks.

SparkQA · 2014-07-23T10:14:07Z

QA results for PR 1528:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information, see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17026/consoleFull

CodingCat · 2014-07-23T11:31:50Z

Hmm... I'm wondering whether we can achieve the same goal with the FAIR scheduler. My own answer is yes. @markhamstra, your thoughts?

lianhuiwang · 2014-07-23T11:38:12Z

I don't think priority is useful for the FAIR scheduler. In YARN's schedulers, priority works with FIFO but not with FAIR, and I think a Spark application's scheduling modes work the same way as YARN's. See YARN's FAIR share policy:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java

CodingCat · 2014-07-23T11:50:38Z

I mean, if we just want to prioritize some jobs, why not assign them to a pool with a higher weight?
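The weighted-pool idea can be illustrated with a simplified model of fair-share ordering between pools: a pool below its minimum share is served first; otherwise the pool with the lowest ratio of running tasks to weight gets the next free slot, so a heavier pool receives proportionally more slots. This is a hedged sketch modeled loosely on Spark's `FairSchedulingAlgorithm`; `PoolState` and `fairLessThan` are hypothetical names, and Spark's real comparator may differ in details.

```scala
// Hypothetical, simplified view of a scheduler pool.
case class PoolState(name: String, weight: Int, minShare: Int, runningTasks: Int)

// Returns true if pool `a` should be offered a free slot before pool `b`.
def fairLessThan(a: PoolState, b: PoolState): Boolean = {
  val aNeedy = a.runningTasks < a.minShare
  val bNeedy = b.runningTasks < b.minShare
  if (aNeedy != bNeedy) {
    // A pool below its minimum share always goes first.
    aNeedy
  } else {
    val (ratioA, ratioB) =
      if (aNeedy) {
        // Both below minShare: compare progress toward each minimum.
        (a.runningTasks.toDouble / math.max(a.minShare, 1),
          b.runningTasks.toDouble / math.max(b.minShare, 1))
      } else {
        // Both satisfied: compare running tasks per unit of weight.
        (a.runningTasks.toDouble / a.weight, b.runningTasks.toDouble / b.weight)
      }
    // Break exact ties by pool name.
    if (ratioA != ratioB) ratioA < ratioB else a.name.compareTo(b.name) < 0
  }
}

val highPriority = PoolState("high", weight = 4, minShare = 0, runningTasks = 8)
val defaultPool = PoolState("default", weight = 1, minShare = 0, runningTasks = 3)
println(fairLessThan(highPriority, defaultPool)) // true: 8 / 4 = 2.0 < 3 / 1 = 3.0
```

Even while running more tasks, the weight-4 pool is still offered the next slot, which is the prioritizing effect suggested above.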

lianhuiwang · 2014-07-23T12:02:56Z

@CodingCat Maybe I misunderstand you. With FAIR, Schedulable.weight can replace priority. Do you mean that with FAIR we could expose a weight setting to the user, for example spark.scheduler.weight? If so, I think we can achieve it.

markhamstra · 2014-07-23T18:40:59Z

Yeah, I'm wondering whether the actual problem is that creation and use of scheduler pools with different weights is unclear or too difficult; and that if we could resolve those issues, then the need for this PR would disappear.

pwendell · 2014-07-25T04:10:25Z

We shouldn't expose these types of hooks into the scheduler internals. The TaskSet, for instance, is an implementation detail we don't want to be part of a public API, and the priority is an internal concept.

The public API of Spark for scheduling policies is the Fair Scheduler. Many different types of policies can be achieved within Fair Scheduling, including having a high priority pool to which tasks are submitted.

lianhuiwang · 2014-07-25T09:04:38Z

The current implementation of scheduling is very ugly, so I cannot find a place to add this config to implement job priority. Can anyone help me?

pwendell · 2014-09-02T01:37:40Z

Hey @lianhuiwang I'd prefer to close this issue and take the discussion about scheduling to the user list if you are not sure how to configure the scheduler to do what you want. Exposing internals like this to the user is not a great idea since these API's will likely change in the future.

lianhuiwang and others added 8 commits May 23, 2014 22:02

bugfix worker DriverStateChanged state should match DriverState.FAILED

f2b5970

address aarondav comments

480ce94

Merge remote-tracking branch 'upstream/master'

8bbfe76

Merge remote-tracking branch 'upstream/master'

eacf933

Merge remote-tracking branch 'upstream/master'

44a3f50

Merge remote-tracking branch 'upstream/master'

20f81fa

Merge remote-tracking branch 'spark/master'

66371a1

use config spark.scheduler.priority for specifying TaskSet's priority…

1e1e30e

… on DAGScheduler

fix file line length exceeds 100

69da641

Fix bug

21a9bcd

markhamstra reviewed Jul 22, 2014

lianhuiwang added 3 commits July 23, 2014 11:23

address markhamstra comments

79b30ee

merge from origin

371825f

DAGScheduler donot pass priority to TaskSet

94bc6e9

markhamstra reviewed Jul 23, 2014

address markhamstra comments

d2b0878

markhamstra reviewed Jul 23, 2014

lianhuiwang added 3 commits July 23, 2014 16:28

address markhamstra comments with droping extra code

c44df00

add space for code style

535f3ea

add space for code style

d1eae88

asfgit closed this in 1f98add Sep 2, 2014