
SKIPME merged Apache branch-1.6 #130


Merged 15 commits on Dec 15, 2015

Conversation

markhamstra

No description provided.

bllchmbrs and others added 15 commits December 11, 2015 12:56
Adding in Pipeline Import and Export Documentation.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <wchambers@ischool.berkeley.edu>

Closes apache#10179 from anabranch/master.

(cherry picked from commit aa305dc)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…rasure Issue

As noted in PR apache#9441, implementing `tallSkinnyQR` uncovered a bug with our PySpark `RowMatrix` constructor.  As discussed on the dev list [here](http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html), there appears to be an issue with type erasure with RDDs coming from Java, and by extension from PySpark.  Although we are attempting to construct a `RowMatrix` from an `RDD[Vector]` in [PythonMLlibAPI](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115), the `Vector` type is erased, resulting in an `RDD[Object]`.  Thus, when calling Scala's `tallSkinnyQR` from PySpark, we get a Java `ClassCastException` in which an `Object` cannot be cast to a Spark `Vector`.

As noted in the aforementioned dev list thread, this issue was also encountered with `DecisionTrees`, and the fix involved an explicit `retag` of the RDD with a `Vector` type.  `IndexedRowMatrix` and `CoordinateMatrix` do not appear to have this issue, likely because their related helper functions in `PythonMLlibAPI` create the RDDs explicitly from DataFrames with pattern matching, thus preserving the types.

This PR currently contains that retagging fix applied to the `createRowMatrix` helper function in `PythonMLlibAPI`.  This PR blocks apache#9441, so once this is merged, the other can be rebased.
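The retag idea above can be sketched without Spark. This is a minimal, hypothetical stand-in (plain `Seq` instead of `RDD`, and a `retag` helper that is not the real `RDD.retag`): once element types have been erased to `Object`/`Any`, a `ClassTag` supplies the runtime class needed to cast elements back to the intended type.

```scala
import scala.reflect.ClassTag

// An "erased" collection standing in for an RDD[Object] handed over from Java/Py4J.
def fromJava(xs: Seq[Any]): Seq[Any] = xs

// Recover the element type: downcast each element under a known ClassTag,
// analogous to what an explicit retag does for an RDD's elements.
def retag[T](xs: Seq[Any])(implicit ct: ClassTag[T]): Seq[T] =
  xs.map(x => ct.runtimeClass.cast(x).asInstanceOf[T])
```

In the real fix the cast happens once at the RDD level rather than per element, but the principle is the same: the caller, not the erased signature, carries the type information.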

cc holdenk

Author: Mike Dusenberry <mwdusenb@us.ibm.com>

Closes apache#9458 from dusenberrymw/SPARK-11497_PySpark_RowMatrix_Constructor_Has_Type_Erasure_Issue.

(cherry picked from commit 1b82203)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Added a paragraph regarding StringIndexer#setHandleInvalid to the ml-features documentation.

I wonder if I should also add a snippet to the code example; input is welcome.
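To illustrate the semantics being documented, here is a hypothetical mini-indexer (the real `StringIndexer` lives in `org.apache.spark.ml.feature`; `stringIndex` and its arguments are invented for this sketch). Labels seen at fit time are assigned indices by descending frequency; `handleInvalid = "skip"` drops rows with unseen labels, while the default errors out.

```scala
// Hypothetical sketch of StringIndexer's handleInvalid behavior.
def stringIndex(train: Seq[String], data: Seq[String], handleInvalid: String): Seq[Double] = {
  // Labels ordered by descending frequency, ties broken alphabetically.
  val labelToIndex = train
    .groupBy(identity).toSeq
    .sortBy { case (label, occ) => (-occ.size, label) }
    .map(_._1).zipWithIndex.toMap
  handleInvalid match {
    case "skip" => data.flatMap(labelToIndex.get).map(_.toDouble)          // drop unseen labels
    case _      => data.map(l =>                                           // "error" (the default)
                     labelToIndex.getOrElse(l, sys.error(s"Unseen label: $l")).toDouble)
  }
}
```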

Author: BenFradet <benjamin.fradet@gmail.com>

Closes apache#10257 from BenFradet/SPARK-12217.

(cherry picked from commit aea676c)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…o dataframe_example.py

Since `Dataset` has a new meaning in Spark 1.6, we should rename the example to avoid confusion.
apache#9873 finished the work for the Scala example; here we focus on the Python one.
Move dataset_example.py to `examples/ml` and rename it to `dataframe_example.py`.
Also fix minor issues missed in apache#9873.
cc mengxr

Author: Yanbo Liang <ybliang8@gmail.com>

Closes apache#9957 from yanboliang/SPARK-11978.

(cherry picked from commit a0ff6d1)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Modifies the String overload to call the Column overload and ensures this is called in a test.
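The fix follows a common overload-delegation pattern, sketched here with a hypothetical mini-API (`Frame` and `Column` are invented stand-ins, not the real Spark classes): the `String` overload forwards to the `Column` overload, so both share one code path and a test exercising the `String` form covers both.

```scala
final case class Column(name: String)

// Hypothetical mini-DataFrame illustrating overload delegation.
class Frame(private val cols: Map[String, Seq[Any]]) {
  def drop(col: Column): Frame = new Frame(cols - col.name)
  def drop(colName: String): Frame = drop(Column(colName)) // delegate to the Column overload
  def columns: Set[String] = cols.keySet
}
```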

Author: Ankur Dave <ankurdave@gmail.com>

Closes apache#10271 from ankurdave/SPARK-12298.

(cherry picked from commit 1e799d6)
Signed-off-by: Yin Huai <yhuai@databricks.com>
…est cases

The existing sample functions are missing the parameter `seed`, even though the corresponding function interface in `generics` declares one. Thus, although callers can pass a `seed`, its value is silently ignored.

This can cause SparkR unit tests to fail. For example, I hit it in another PR:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
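Why the missing parameter matters can be sketched in a few lines (hypothetical `sampleWithSeed`, not the SparkR API): when the seed is actually threaded through to the RNG, sampling becomes reproducible, which is what deterministic unit tests rely on.

```scala
import scala.util.Random

// Bernoulli sampling that honors the caller-supplied seed.
def sampleWithSeed[T](xs: Seq[T], fraction: Double, seed: Long): Seq[T] = {
  val rng = new Random(seed) // seeding here is the whole point of the fix
  xs.filter(_ => rng.nextDouble() < fraction)
}
```

With the bug, the seed argument was accepted but never reached the RNG, so two calls with the same seed could return different samples and break test assertions.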

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#10160 from gatorsmile/sampleR.

(cherry picked from commit 1e3526c)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
…rait in order to avoid ClassCastException due to KryoSerializer in KinesisReceiver

Author: Jean-Baptiste Onofré <jbonofre@apache.org>

Closes apache#10203 from jbonofre/SPARK-11193.

(cherry picked from commit 03138b6)
Signed-off-by: Sean Owen <sowen@cloudera.com>
https://issues.apache.org/jira/browse/SPARK-12199

Follow-up PR of SPARK-11551. Fix some errors in ml-features.md

mengxr

Author: Xusen Yin <yinxusen@gmail.com>

Closes apache#10193 from yinxusen/SPARK-12199.

(cherry picked from commit 98b212d)
Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
…ct disconnection message

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#10261 from zsxwing/SPARK-12267.

(cherry picked from commit 8af2f8c)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
… in the shutdown hook

1. Make sure workers and masters exit so that no worker or master will still be running when triggering the shutdown hook.
2. Set ExecutorState to FAILED if it's still RUNNING when executing the shutdown hook.

This should fix potential exceptions like the following when exiting a local cluster:
```
java.lang.AssertionError: assertion failed: executor 4 state transfer from RUNNING to RUNNING is illegal
	at scala.Predef$.assert(Predef.scala:179)
	at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:260)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

java.lang.IllegalStateException: Shutdown hooks cannot be modified during shutdown.
	at org.apache.spark.util.SparkShutdownHookManager.add(ShutdownHookManager.scala:246)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:191)
	at org.apache.spark.util.ShutdownHookManager$.addShutdownHook(ShutdownHookManager.scala:180)
	at org.apache.spark.deploy.worker.ExecutorRunner.start(ExecutorRunner.scala:73)
	at org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:474)
	at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:116)
	at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
	at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
	at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
```
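Point 2 above can be sketched as a simple state guard (hypothetical names; the real logic lives in the worker's shutdown path): an executor still `RUNNING` when the hook fires is marked `FAILED`, so the master never observes an illegal `RUNNING -> RUNNING` transition like the one in the first stack trace.

```scala
// Executor lifecycle states, mirroring Spark's ExecutorState enumeration.
object ExecutorState extends Enumeration {
  val LAUNCHING, RUNNING, KILLED, FAILED, EXITED = Value
}

// Sketch of the shutdown-hook guard: force RUNNING to FAILED,
// leave terminal states untouched.
def stateAtShutdown(current: ExecutorState.Value): ExecutorState.Value =
  if (current == ExecutorState.RUNNING) ExecutorState.FAILED else current
```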

Author: Shixiong Zhu <shixiong@databricks.com>

Closes apache#10269 from zsxwing/executor-state.

(cherry picked from commit 2aecda2)
Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
When SparkStrategies.BasicOperators's `case BroadcastHint(child) => apply(child)` is hit, it recursively invokes only BasicOperators.apply on the child. The other strategies therefore get no chance to process this plan, which can lead to a "No plan" error, so we use planLater to run the child through all strategies.
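A toy planner illustrates the difference (all names here are illustrative, not the real Catalyst API): with planLater-style dispatch, the hint's child is offered to every strategy, so a `Scan` child still finds a physical plan instead of falling through to "No plan".

```scala
object PlannerSketch {
  sealed trait Plan
  case class BroadcastHint(child: Plan) extends Plan
  case class Scan(table: String) extends Plan

  type Strategy = Plan => Option[String]

  // A strategy that only knows how to plan scans.
  val scanStrategy: Strategy = { case Scan(t) => Some(s"ScanExec($t)"); case _ => None }
  // BasicOperators-like strategy: strips the hint and plans the child via planLater.
  val basicOperators: Strategy = { case BroadcastHint(c) => Some(planLater(c)); case _ => None }

  val strategies = Seq(basicOperators, scanStrategy)

  // planLater-style dispatch: try ALL strategies on the plan, not just the current one.
  def planLater(p: Plan): String =
    strategies.flatMap(s => s(p)).headOption.getOrElse("No plan")
}
```

Had `basicOperators` recursed only into itself, `Scan` would match nothing and the query would fail with "No plan".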

https://issues.apache.org/jira/browse/SPARK-12275

Author: yucai <yucai.yu@intel.com>

Closes apache#10265 from yucai/broadcast_hint.

(cherry picked from commit ed87f6d)
Signed-off-by: Yin Huai <yhuai@databricks.com>
Follow-up of [SPARK-12199](https://issues.apache.org/jira/browse/SPARK-12199) and apache#10193 where a broken link has been left as is.

Author: BenFradet <benjamin.fradet@gmail.com>

Closes apache#10282 from BenFradet/SPARK-12199.

(cherry picked from commit e25f1fe)
Signed-off-by: Sean Owen <sowen@cloudera.com>
cc yhuai felixcheung shaneknapp

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes apache#10300 from shivaram/comment-lintr-disable.

(cherry picked from commit fb3778d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
markhamstra added a commit that referenced this pull request Dec 15, 2015
SKIPME merged Apache branch-1.6
@markhamstra markhamstra merged commit 9165dc7 into alteryx:csd-1.6 Dec 15, 2015