[SPARK-10618] [Mesos] Refactoring scheduling condition and adding test for same #8771

Closed

wants to merge 1,144 commits

Conversation

SleepyThread
Contributor

This commit refactors a scheduling condition into a method and adds a test for the same.
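
For context, a minimal sketch of the shape of this refactoring (the method name and the condition fields match the snippets quoted later in this thread; the exact surrounding class is an assumption, not code from the PR):

```scala
// Hypothetical sketch: the inline scheduling condition from the
// resourceOffers path is pulled into a named, testable predicate.
// Field names such as totalCoresAcquired and failuresBySlaveId are
// taken from the diff quoted later in this conversation.
private def isOfferValidForScheduling(
    meetsConstraints: Boolean,
    slaveId: String,
    mem: Double,
    cpus: Int,
    sc: SparkContext): Boolean = {
  meetsConstraints &&
    totalCoresAcquired < maxCores &&
    mem >= calculateTotalMemory(sc) &&
    cpus >= 1 &&
    failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
    !slaveIdsWithExecutors.contains(slaveId)
}
```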

@SleepyThread
Contributor Author

@andrewor14 @dragos Can you please take a look at it?

```scala
val taskScheduler = mock[TaskSchedulerImpl]
when(taskScheduler.sc).thenReturn(sc)

```

kill space

@tnachen
Contributor

tnachen commented Sep 16, 2015

I think this is a good idea; can you do this for fine-grained mode too?

@dragos
Contributor

dragos commented Sep 16, 2015

Not a bad idea, though I don't think this should go in before #8639, which it conflicts with (and which adds real functionality, and is reviewed).

@SleepyThread
Contributor Author

I agree with @dragos. After #8639 is merged into master, I will update this pull request.

@tnachen I am a fan of small changes, so would it be OK if I open another pull request for fine-grained mode?

@SleepyThread
Contributor Author

Refactored and added a test for fine-grained mode. Waiting for #8639 to be merged into master.

@tnachen
Contributor

tnachen commented Sep 23, 2015

Actually, I'm thinking it would be better if we could log the exact condition that didn't pass and made us skip the offer. It becomes a bit hard for users to guess why no jobs are running.
How about making each condition a rule, and whenever one of them isn't matched, logging a corresponding message saying exactly why the offer was rejected?
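
A minimal sketch of what such rule-based validation could look like (all names below are hypothetical, not code from this PR):

```scala
// Hypothetical sketch: each scheduling condition becomes a named rule, and
// every rule that fails logs why the offer was skipped (at debug level, as
// agreed later in this thread).
case class OfferRule(reason: String, passes: Boolean)

def validateOffer(
    slaveId: String,
    meetsConstraints: Boolean,
    mem: Double,
    requiredMem: Double,
    cpus: Int): Boolean = {
  val rules = Seq(
    OfferRule("offer does not meet attribute constraints", meetsConstraints),
    OfferRule(s"insufficient memory ($mem < $requiredMem)", mem >= requiredMem),
    OfferRule(s"insufficient CPUs ($cpus < 1)", cpus >= 1))
  val failed = rules.filterNot(_.passes)
  // logDebug comes from Spark's Logging trait.
  failed.foreach(r => logDebug(s"Declining offer from slave $slaveId: ${r.reason}"))
  failed.isEmpty
}
```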

@SleepyThread
Contributor Author

@tnachen It's really a great idea, but we should log these things at debug level rather than info; otherwise it will clutter the logs.

I will add a new patch with rule-based validation and logging once #8639 is merged.

@tnachen
Contributor

tnachen commented Sep 24, 2015

Yes, please log at debug level :) But at least the information will be available.

@AmplabJenkins

Can one of the admins verify this patch?

@andrewor14
Contributor

@SleepyThread #8639 is merged into master now. Can you update this?

```diff
-      cpus >= 1 &&
-      failuresBySlaveId.getOrElse(slaveId, 0) < MAX_SLAVE_FAILURES &&
-      !slaveIdsWithExecutors.contains(slaveId)) {
+      if (isOfferValidForScheduling(meetsConstraints, slaveId, mem, cpus, sc)) {
```

I think the wording is a bit odd: it's not that the offer is not valid, it usually just doesn't meet the job requirements.
How about naming this isOfferSatisfiesRequirements?

@tnachen
Contributor

tnachen commented Nov 12, 2015

Just took another pass over this review; besides the naming suggestions and style fixes, overall it LGTM.
Also, please update the title of this PR since it's updating both coarse-grained and fine-grained mode.

HeartSaVioR and others added 8 commits November 24, 2015 09:20
…parent class loader

Without the patch, two additional tests of ExecutorClassLoaderSuite fail.

- "resource from parent"
- "resources from parent"

Detailed explanation is here, https://issues.apache.org/jira/browse/SPARK-11818?focusedCommentId=15011202&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15011202
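
A minimal sketch of the likely shape of the fix (reconstructed from the commit description; the `parentLoader` field is an assumption):

```scala
// Hypothetical sketch: ExecutorClassLoader fetches classes over the network,
// so it must explicitly delegate resource lookups to the parent loader.
override def getResource(name: String): java.net.URL =
  parentLoader.getResource(name)

override def getResources(name: String): java.util.Enumeration[java.net.URL] =
  parentLoader.getResources(name)
```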

Author: Jungtaek Lim <kabhwan@gmail.com>

Closes #9812 from HeartSaVioR/SPARK-11818.
We should pass resolved encoders to the logical `CoGroup` and bind them in the physical `CoGroup`.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #9928 from cloud-fan/cogroup.
Remove duplicate ml examples (only for ml).  mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #9933 from yanboliang/SPARK-11685.
…aries ignore weight col

Document for 1.6 that the summaries mostly ignore the weight column.
To be corrected for 1.7.

CC: mengxr thunterdb

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #9927 from jkbradley/linregsummary-doc.
Add read/write support to LDA, similar to ALS.

save/load for ml.LocalLDAModel is done.
For DistributedLDAModel, I'm not sure if we can invoke save on the mllib.DistributedLDAModel directly. I'll send an update after some testing.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #9894 from hhbyyh/ldaMLsave.
Author: Wenchen Fan <wenchen@databricks.com>

Closes #9909 from cloud-fan/get-struct.
… bus's thread

This is continuation of SPARK-11761

Andrew suggested adding this protection. See tail of #9741

Author: tedyu <yuzhihong@gmail.com>

Closes #9852 from tedyu/master.
Currently, pivot's signatures look like:

```scala
@scala.annotation.varargs
def pivot(pivotColumn: Column, values: Column*): GroupedData

@scala.annotation.varargs
def pivot(pivotColumn: String, values: Any*): GroupedData
```

I think we can remove the one that takes `Column` types, since callers should always be passing in literals. It'd also be clearer if the values were not varargs, but rather a Seq or java.util.List.

I also made similar changes for Python.
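
For illustration, a call with the remaining overload might look like this (the DataFrame and column names are made up):

```scala
// Hypothetical usage after the change: pivot values passed as a Seq of
// literals rather than Column varargs.
df.groupBy("year").pivot("course", Seq("dotNET", "Java")).sum("earnings")
```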

Author: Reynold Xin <rxin@databricks.com>

Closes #9929 from rxin/SPARK-11946.
jbonofre and others added 10 commits December 12, 2015 08:51
…rait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes #10203 from jbonofre/SPARK-11193.
https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes #10193 from yinxusen/SPARK-12199.
…ct disconnection message

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10261 from zsxwing/SPARK-12267.
… in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

This should fix the potential exceptions when exiting a local cluster:
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```

Author: Shixiong Zhu <shixiong@databricks.com>

Closes #10269 from zsxwing/executor-state.
Currently, we can generate different plans for a query with a single distinct aggregation (depending on spark.sql.specializeSingleDistinctAggPlanning): one works better on low-cardinality columns, the other (the default) works better on high-cardinality columns.

This PR changes it to generate a single plan (three aggregations and two exchanges) that works better in both cases, so we can safely remove the flag `spark.sql.specializeSingleDistinctAggPlanning` (introduced in 1.6).

For a query like `SELECT COUNT(DISTINCT a) FROM table`, the plan will be:
```
AGG-4 (count distinct)
  Shuffle to a single reducer
    Partial-AGG-3 (count distinct, no grouping)
      Partial-AGG-2 (grouping on a)
        Shuffle by a
          Partial-AGG-1 (grouping on a)
```

This PR also includes a large refactoring of aggregation (removing 500+ lines of code).

cc yhuai nongli marmbrus

Author: Davies Liu <davies@databricks.com>

Closes #10228 from davies/single_distinct.
When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit, it only recursively invokes BasicOperators.apply on this child. That gives the other strategies no chance to process the plan, which probably leads to the "No plan" issue, so we use planLater to go through all strategies.
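
A minimal sketch of the change as described (reconstructed from this description, not copied from the diff):

```scala
// Before: only BasicOperators itself plans the hinted child.
case BroadcastHint(child) => apply(child)

// After (hypothetical reconstruction): planLater defers planning of the
// child so that every strategy gets a chance to match it.
case BroadcastHint(child) => planLater(child) :: Nil
```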

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai.yu@intel.com>

Closes #10265 from yucai/broadcast_hint.
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and #10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes #10282 from BenFradet/SPARK-12199.
… pyspark

JIRA: https://issues.apache.org/jira/browse/SPARK-12016

We should not directly use Word2VecModel in pyspark. We need to wrap it in a Word2VecModelWrapper when loading it in pyspark.

Author: Liang-Chi Hsieh <viirya@appier.com>

Closes #10100 from viirya/fix-load-py-wordvecmodel.
cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #10300 from shivaram/comment-lintr-disable.
I think it was a mistake, and we had not caught it until #10260, which began to check whether the `fromRowExpression` is resolved.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10263 from cloud-fan/encoder.
@andrewor14
Contributor

@SleepyThread have you had the chance to address the comments? Are you still working on this?

gatorsmile and others added 4 commits December 14, 2015 18:33
… in Dataset APIs

marmbrus This PR is to address your comment. Thanks for your review!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10214 from gatorsmile/followup12188.
Support UnsafeRow for the Coalesce/Except/Intersect.

Could you review if my code changes are ok? davies Thank you!

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10285 from gatorsmile/unsafeSupportCIE.
Fix a minor typo (unbalanced bracket) in ResetSystemProperties.

Author: Holden Karau <holden@us.ibm.com>

Closes #10303 from holdenk/SPARK-12332-trivial-typo-in-ResetSystemProperties-comment.
cc tdas zsxwing, please review. Thanks a lot.

Author: jerryshao <sshao@hortonworks.com>

Closes #10305 from jerryshao/fix-typo-state-impl.
SleepyThread changed the title from "[SPARK-10618] [Mesos] Refactoring coarsed-grained scheduling conditio…" to "[SPARK-10618] [Mesos] Refactoring scheduling condition and adding test for same" on Dec 15, 2015
@SleepyThread
Contributor Author

@andrewor14 I have addressed all the comments. Please review the commit.

@andrewor14
Contributor

Can you rebase to master? Then @dragos or @tnachen can do another pass.

@SleepyThread
Contributor Author

@andrewor14 @dragos @tnachen I screwed up the merge. I will open a new pull request that is in sync with master.

@SleepyThread
Contributor Author

Please refer new pull request #10326. Closing this pull request.

SleepyThread deleted the upstream-local branch February 12, 2016 13:58