Skip to content

[SPARK-2851] [mllib] DecisionTree Python consistency update #1798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

jkbradley
Copy link
Member

Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).

Added factory classes for Algo and Impurity, but made private[mllib].

CC: @mengxr @dorx Please let me know if there are other changes which would help with API consistency---thanks!

* Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).

Added factory classes for Algo and Impurity, but made private[mllib].
@jkbradley jkbradley changed the title DecisionTree Python consistency update [mllib] DecisionTree Python consistency update Aug 6, 2014
@jkbradley jkbradley changed the title [mllib] DecisionTree Python consistency update [SPARK-2828] [mllib] DecisionTree Python consistency update Aug 6, 2014
@jkbradley jkbradley changed the title [SPARK-2828] [mllib] DecisionTree Python consistency update [SPARK-2851] [mllib] DecisionTree Python consistency update Aug 6, 2014
@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1798. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17980/consoleFull

@@ -27,4 +27,10 @@ import org.apache.spark.annotation.Experimental
object Algo extends Enumeration {
type Algo = Value
val Classification, Regression = Value

private[mllib] def stringToAlgo(name: String): Algo = name match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: stringToAlgo -> fromString? In the code, it might be more natural to read

Algo.fromString("classification")
Algo.stringToAlgo("classification")

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1798:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17980/consoleFull

* Removed train() function in Python API (tree.py)
** Removed corresponding function in Scala/Java API (the ones taking basic types)

DecisionTree internal updates:
* Renamed Algo and Impurity factory methods to fromString()

DecisionTree doc updates:
* Added notes recommending use of trainClassifier, trainRegressor
* Say supported values for impurity
* Shortened doc for Java-friendly train* functions.
@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1798. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18037/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1798:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18037/consoleFull

* Java-friendly API for [[org.apache.spark.mllib.tree.DecisionTree$#trainClassifier]]
*/
def trainClassifier(
input: RDD[LabeledPoint],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RDD -> JavaRDD. Sorry I missed this in the first pass.

@mengxr
Copy link
Contributor

mengxr commented Aug 6, 2014

LGTM except minor inline comments.

@jkbradley
Copy link
Member Author

Hopefully all done now, after Jenkins.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1798. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18044/consoleFull

@mengxr
Copy link
Contributor

mengxr commented Aug 6, 2014

@jkbradley Could you try to merge the latest master? It seems that there are conflicts.

…ency

Conflicts:
	python/pyspark/mllib/tree.py
(not a real conflict, merged)
@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1798. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18054/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1798:
- This patch PASSES unit tests.

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18044/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1798:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18054/consoleFull

asfgit pushed a commit that referenced this pull request Aug 7, 2014
Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).

Added factory classes for Algo and Impurity, but made private[mllib].

CC: mengxr dorx  Please let me know if there are other changes which would help with API consistency---thanks!

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes #1798 from jkbradley/dt-python-consistency and squashes the following commits:

6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD.
ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types)
00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
fe6dbfa [Joseph K. Bradley] removed unnecessary imports
e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
c699850 [Joseph K. Bradley] a few doc comments
eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters

(cherry picked from commit 47ccd5e)
Signed-off-by: Xiangrui Meng <meng@databricks.com>
@asfgit asfgit closed this in 47ccd5e Aug 7, 2014
@mengxr
Copy link
Contributor

mengxr commented Aug 7, 2014

LGTM. Merged into both master and branch-1.1.

xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).

Added factory classes for Algo and Impurity, but made private[mllib].

CC: mengxr dorx  Please let me know if there are other changes which would help with API consistency---thanks!

Author: Joseph K. Bradley <joseph.kurata.bradley@gmail.com>

Closes apache#1798 from jkbradley/dt-python-consistency and squashes the following commits:

6f7edf8 [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
a0d7dbe [Joseph K. Bradley] DecisionTree: In Java-friendly train* methods, changed to use JavaRDD instead of RDD.
ee1d236 [Joseph K. Bradley] DecisionTree API updates: * Removed train() function in Python API (tree.py) ** Removed corresponding function in Scala/Java API (the ones taking basic types)
00f820e [Joseph K. Bradley] Merge remote-tracking branch 'upstream/master' into dt-python-consistency
fe6dbfa [Joseph K. Bradley] removed unnecessary imports
e358661 [Joseph K. Bradley] DecisionTree API change: * Added 6 static train methods to match Python API, but without default arguments (but with Python default args noted in docs).
c699850 [Joseph K. Bradley] a few doc comments
eaf84c0 [Joseph K. Bradley] Added DecisionTree static train() methods API to match Python, but without default parameters
@jkbradley jkbradley deleted the dt-python-consistency branch December 4, 2014 20:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants