
SKIPME merged Apache branch-1.6 #130


Merged 15 commits on Dec 15, 2015

Conversation

markhamstra

No description provided.

bllchmbrs and others added 15 commits December 11, 2015 12:56
Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes apache#10179 from anabranch/master.

(cherry picked from commit aa305dc)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…rasure Issue

As noted in PR apache#9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.

As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.

This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  This PR blocks apache#9441, so once this is merged, the other can be rebased.
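The retag idea above can be sketched without Spark. This is a minimal, hypothetical stand-in (plain `Seq` instead of `RDD`, and a `retag` helper that is not the real `RDD.retag`): once element types have been erased to `Object`/`Any`, a `ClassTag` supplies the runtime class needed to cast elements back to the intended type.

```scala
import scala.reflect.ClassTag

// An "erased" collection standing in for an RDD[Object] handed over from Java/Py4J.
def fromJava(xs: Seq[Any]): Seq[Any] = xs

// Recover the element type: downcast each element under a known ClassTag,
// analogous to what an explicit retag does for an RDD's elements.
def retag[T](xs: Seq[Any])(implicit ct: ClassTag[T]): Seq[T] =
  xs.map(x => ct.runtimeClass.cast(x).asInstanceOf[T])
```

In the real fix the cast happens once at the RDD level rather than per element, but the principle is the same: the caller, not the erased signature, carries the type information.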

cc holdenk

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes apache#9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b82203)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example; input is welcome.
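To illustrate the semantics being documented, here is a hypothetical mini-indexer (the real `StringIndexer` lives in `org.apache.spark.ml.feature`; `stringIndex` and its arguments are invented for this sketch). Labels seen at fit time are assigned indices by descending frequency; `handleInvalid = "skip"` drops rows with unseen labels, while the default errors out.

```scala
// Hypothetical sketch of StringIndexer's handleInvalid behavior.
def stringIndex(train: Seq[String], data: Seq[String], handleInvalid: String): Seq[Double] = {
  // Labels ordered by descending frequency, ties broken alphabetically.
  val labelToIndex = train
    .groupBy(identity).toSeq
    .sortBy { case (label, occ) => (-occ.size, label) }
    .map(_._1).zipWithIndex.toMap
  handleInvalid match {
    case "skip" => data.flatMap(labelToIndex.get).map(_.toDouble)          // drop unseen labels
    case _      => data.map(l =>                                           // "error" (the default)
                     labelToIndex.getOrElse(l, sys.error(s"Unseen label: $l")).toDouble)
  }
}
```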

Author: BenFradet <benjamin.fradet@gmail.com>

Closes apache#10257 from BenFradet/SPARK-12217.

(cherry picked from commit aea676c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…o dataframe_example.py

Since `Dataset` has a new meaning in Spark 1.6, we should rename the example to avoid confusion.
apache#9873 finished the work for the Scala example; here we focus on the Python one.
Move dataset_example.py to `examples/ml` and rename it to `dataframe_example.py`.
Also fix minor issues missed in apache#9873.
cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#9957 from yanboliang/SPARK-11978.

(cherry picked from commit a0ff6d1)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Modifies the String overload to call the Column overload and ensures this is called in a test.
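The fix follows a common overload-delegation pattern, sketched here with a hypothetical mini-API (`Frame` and `Column` are invented stand-ins, not the real Spark classes): the `String` overload forwards to the `Column` overload, so both share one code path and a test exercising the `String` form covers both.

```scala
final case class Column(name: String)

// Hypothetical mini-DataFrame illustrating overload delegation.
class Frame(private val cols: Map[String, Seq[Any]]) {
  def drop(col: Column): Frame = new Frame(cols - col.name)
  def drop(colName: String): Frame = drop(Column(colName)) // delegate to the Column overload
  def columns: Set[String] = cols.keySet
}
```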

Author: Ankur Dave <ankurdave@gmail.com>

Closes apache#10271 from ankurdave/SPARK-12298.

(cherry picked from commit 1e799d6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
…est cases

The existing sample functions are missing the parameter `seed`, even though the corresponding function interface in `generics` declares one. Thus, although callers can pass a `seed`, its value is silently ignored.

This can cause SparkR unit tests to fail. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
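Why the missing parameter matters can be sketched in a few lines (hypothetical `sampleWithSeed`, not the SparkR API): when the seed is actually threaded through to the RNG, sampling becomes reproducible, which is what deterministic unit tests rely on.

```scala
import scala.util.Random

// Bernoulli sampling that honors the caller-supplied seed.
def sampleWithSeed[T](xs: Seq[T], fraction: Double, seed: Long): Seq[T] = {
  val rng = new Random(seed) // seeding here is the whole point of the fix
  xs.filter(_ => rng.nextDouble() < fraction)
}
```

With the bug, the seed argument was accepted but never reached the RNG, so two calls with the same seed could return different samples and break test assertions.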

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#10160 from gatorsmile/sampleR.

(cherry picked from commit 1e3526c)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…rait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes apache#10203 from jbonofre/SPARK-11193.

(cherry picked from commit 03138b6)
Signed-off-by: Sean Owen <sowen@cloudera.com>
https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes apache#10193 from yinxusen/SPARK-12199.

(cherry picked from commit 98b212d)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…ct disconnection message

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#10261 from zsxwing/SPARK-12267.

(cherry picked from commit 8af2f8c)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
… in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

This should fix potential exceptions like the following when exiting a local cluster:
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
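Point 2 above can be sketched as a simple state guard (hypothetical names; the real logic lives in the worker's shutdown path): an executor still `RUNNING` when the hook fires is marked `FAILED`, so the master never observes an illegal `RUNNING -> RUNNING` transition like the one in the first stack trace.

```scala
// Executor lifecycle states, mirroring Spark's ExecutorState enumeration.
object ExecutorState extends Enumeration {
  val LAUNCHING, RUNNING, KILLED, FAILED, EXITED = Value
}

// Sketch of the shutdown-hook guard: force RUNNING to FAILED,
// leave terminal states untouched.
def stateAtShutdown(current: ExecutorState.Value): ExecutorState.Value =
  if (current == ExecutorState.RUNNING) ExecutorState.FAILED else current
```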

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#10269 from zsxwing/executor-state.

(cherry picked from commit 2aecda2)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit, it recursively invokes only BasicOperators.apply on the child. The other strategies therefore get no chance to process this plan, which can lead to a "No plan" error, so we use planLater to run the child through all strategies.
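A toy planner illustrates the difference (all names here are illustrative, not the real Catalyst API): with planLater-style dispatch, the hint's child is offered to every strategy, so a `Scan` child still finds a physical plan instead of falling through to "No plan".

```scala
object PlannerSketch {
  sealed trait Plan
  case class BroadcastHint(child: Plan) extends Plan
  case class Scan(table: String) extends Plan

  type Strategy = Plan => Option[String]

  // A strategy that only knows how to plan scans.
  val scanStrategy: Strategy = { case Scan(t) => Some(s"ScanExec($t)"); case _ => None }
  // BasicOperators-like strategy: strips the hint and plans the child via planLater.
  val basicOperators: Strategy = { case BroadcastHint(c) => Some(planLater(c)); case _ => None }

  val strategies = Seq(basicOperators, scanStrategy)

  // planLater-style dispatch: try ALL strategies on the plan, not just the current one.
  def planLater(p: Plan): String =
    strategies.flatMap(s => s(p)).headOption.getOrElse("No plan")
}
```

Had `basicOperators` recursed only into itself, `Scan` would match nothing and the query would fail with "No plan".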

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai.yu@intel.com>

Closes apache#10265 from yucai/broadcast_hint.

(cherry picked from commit ed87f6d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and apache#10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes apache#10282 from BenFradet/SPARK-12199.

(cherry picked from commit e25f1fe)
Signed-off-by: Sean Owen <sowen@cloudera.com>
cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes apache#10300 from shivaram/comment-lintr-disable.

(cherry picked from commit fb3778d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
markhamstra added a commit that referenced this pull request Dec 15, 2015
SKIPME merged Apache branch-1.6
@markhamstra markhamstra merged commit 9165dc7 into alteryx:csd-1.6 Dec 15, 2015