Skip to content

use config spark.scheduler.priority for specifying TaskSet's priority on DAGScheduler #1528

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

lianhuiwang
Copy link
Contributor

@SparkQA
Copy link

SparkQA commented Jul 22, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16963/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 22, 2014

QA results for PR 1528:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16963/consoleFull

@SparkQA
Copy link

SparkQA commented Jul 22, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16964/consoleFull

@CodingCat
Copy link
Contributor

so it's actually another type of scheduling instead of FIFO/FAIR?

@CodingCat
Copy link
Contributor

also, this is preemptive or non-preemptive?

according to my understanding on the code, it's non-preemptive, then a high priority TaskSet is easily to be delayed when there are a lot of last-long but low priority TaskSets

@SparkQA
Copy link

SparkQA commented Jul 22, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16969/consoleFull

@lianhuiwang
Copy link
Contributor Author

It add user defined priority to FIFO. If user do not configure priority, it work as before. It is non-preemptive.when there has free executors and pool is FIFO we can let high priority taskset's tasks firstly be submitted than lower priority taskset.

taskScheduler.submitTasks(
new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, properties))
new TaskSet(tasks.toArray, stage.id, stage.newAttemptId(), stage.jobId, priority.toInt,
properties))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

properties is already being passed to the TaskSet ctor, so I'd prefer that extraction of priority happen there or elsewhere instead of doing properties.getProperty here and adding another parameter to the TaskSet ctor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree, as DAGScheduler has known too much about task-level things......

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17013/consoleFull

@lianhuiwang
Copy link
Contributor Author

@markhamstra @CodingCat thank you for comments, i updated patch, please review again.

@@ -17,6 +17,8 @@

package org.apache.spark.scheduler

import scala.math.Ordering.Implicits._
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling in these implicits can have unintended consequences; that's why in my previous comment I kept the scope of the import as small as possible.

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17014/consoleFull

@lianhuiwang
Copy link
Contributor Author

@markhamstra thank you. i update patch. have more comments?

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA results for PR 1528:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17013/consoleFull

}else{
DEFAULT_PRIORITY
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the style checker ok with val priority = if (...) {... instead of val priority = { if (...) {...? If it is, I'd rather do without the extra {}. You can also drop the : Int from val DEFAULT_PRIORITY and val priority -- the types are obvious without the annotations. Also, I'm not sure that DEFAULT_PRIORITY really gains you anything -- I'd be fine with just if (...) {...} else 0. And make sure you follow the style guide for spacing with parens and braces.

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA results for PR 1528:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17014/consoleFull

@lianhuiwang
Copy link
Contributor Author

@markhamstra thank you.how about latest code?

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA tests have started for PR 1528. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17026/consoleFull

@markhamstra
Copy link
Contributor

This looks like a clean implementation, but you still need to open a JIRA issue to explain why you want this; then edit the description of this PR to reference that JIRA. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCode

@markhamstra
Copy link
Contributor

Sorry, looks like you already have SPARK-2618, so change change the title of this PR to include that.

@lianhuiwang
Copy link
Contributor Author

@markhamstra @pwendell i have updated SPARK-2618, please take a look. thanks

@SparkQA
Copy link

SparkQA commented Jul 23, 2014

QA results for PR 1528:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17026/consoleFull

@CodingCat
Copy link
Contributor

en....I'm thinking that if we can achieve the same goal with FAIR scheduler.....my own answer is yes......@markhamstra your thoughts?

@lianhuiwang
Copy link
Contributor Author

i donot think priority is useful for FAIR scheduler. on YARN scheduler priority is work with FIFO and not with FAIR. so i think spark application's scheduler mode is same with YARN.we can see YARN's FAIR:
https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/policies/FairSharePolicy.java

@CodingCat
Copy link
Contributor

I mean if we just want to prioritize some jobs, why not assigning them to a pool with higher weight?

@lianhuiwang
Copy link
Contributor Author

@CodingCat maybe i misunderstand you. with FAIR Schedulable.weight can replace priority. you mean with FAIR we can provide weight config to user? example:spark.scheduler.weight. if it is right , i think we can achieve it.

@markhamstra
Copy link
Contributor

Yeah, I'm wondering whether the actual problem is that creation and use of scheduler pools with different weights is unclear or too difficult; and that if we could resolve those issues, then the need for this PR would disappear.

@pwendell
Copy link
Contributor

We shouldn't should expose these types of hooks into the scheduler internals. The TaskSet, for instance, is an implementation detail we don't want to be part of a public API and the priority is an internal concept.

The public API of Spark for scheduling policies is the Fair Scheduler. Many different types of policies can be achieved within Fair Scheduling, including having a high priority pool to which tasks are submitted.

@lianhuiwang
Copy link
Contributor Author

the current implementation of scheduling is very ugly. so i cannot find space to add this config to complete job's priority.anyone can help me?

@pwendell
Copy link
Contributor

pwendell commented Sep 2, 2014

Hey @lianhuiwang I'd prefer to close this issue and take the discussion about scheduling to the user list if you are not sure how to configure the scheduler to do what you want. Exposing internals like this to the user is not a great idea since these API's will likely change in the future.

@asfgit asfgit closed this in 1f98add Sep 2, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants