-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-2851] [mllib] DecisionTree Python consistency update #1798
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…thout default parameters
* Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs). Added factory classes for Algo and Impurity, but made private[mllib].
QA tests have started for PR 1798. This patch merges cleanly. |
@@ -27,4 +27,10 @@ import org.apache.spark.annotation.Experimental | |||
object Algo extends Enumeration { | |||
type Algo = Value | |||
val Classification, Regression = Value | |||
|
|||
private[mllib] def stringToAlgo(name: String): Algo = name match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: stringToAlgo
-> fromString
? In the code, it might be more natural to read
Algo.fromString("classification")
Algo.stringToAlgo("classification")
QA results for PR 1798: |
* Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types) DecisionTree internal updates: * Renamed Algo and Impurity factory methods to fromString() DecisionTree doc updates: * Added notes recommending use of trainClassifier, trainRegressor * Say supported values for impurity * Shortened doc for Java-friendly train* functions.
QA tests have started for PR 1798. This patch merges cleanly. |
QA results for PR 1798: |
* Java-friendly API for [[org.apache.spark.mllib.tree.DecisionTree$#trainClassifier]] | ||
*/ | ||
def trainClassifier( | ||
input: RDD[LabeledPoint], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RDD
-> JavaRDD
. Sorry I missed this in the first pass.
LGTM except minor inline comments. |
Hopefully all done now, after Jenkins. |
QA tests have started for PR 1798. This patch DID NOT merge cleanly! |
@jkbradley Could you try to merge the latest master? It seems that there are conflicts. |
…ency Conflicts: python/pyspark/mllib/tree.py (not a real conflict, merged)
QA tests have started for PR 1798. This patch merges cleanly. |
QA results for PR 1798: |
QA results for PR 1798: |
Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs). Added factory classes for Algo and Impurity, but made private[mllib]. CC: mengxr dorx Please let me know if there are other changes which would help with API consistency---thanks! Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com> Closes #1798 from jkbradley/dt-python-consistency and squashes the following commits: 6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD. ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types) 00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency fe6dbfa [Joseph K. Bradley] removed unnecessary imports e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs). c699850 [Joseph K. Bradley] a few doc comments eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters (cherry picked from commit 47ccd5e) Signed-off-by: Xiangrui Meng <meng@databricks.com>
LGTM. Merged into both master and branch-1.1. |
Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs). Added factory classes for Algo and Impurity, but made private[mllib]. CC: mengxr dorx Please let me know if there are other changes which would help with API consistency---thanks! Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com> Closes apache#1798 from jkbradley/dt-python-consistency and squashes the following commits: 6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD. ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types) 00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency fe6dbfa [Joseph K. Bradley] removed unnecessary imports e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs). c699850 [Joseph K. Bradley] a few doc comments eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters
Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
Added factory classes for Algo and Impurity, but made private[mllib].
CC: @mengxr @dorx Please let me know if there are other changes which would help with API consistency---thanks!