-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-13784][ML] Persistence for RandomForestClassifier, RandomForestRegressor #12118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ated current numTrees val
607ed4e
to
b13b11f
Compare
Test build #54727 has finished for PR 12118 at commit
|
@GayathriMurali @hhbyyh Could you please help review this PR? Thanks! |
Test build #54730 has finished for PR 12118 at commit
|
That seems like a spurious or unrelated failure. I'll retest |
Test build #54728 has finished for PR 12118 at commit
|
Test build #2727 has finished for PR 12118 at commit
|
sql.createDataFrame(treesMetadataJson).toDF("treeID", "metadata") | ||
.write.parquet(treesMetadataPath) | ||
val dataPath = new Path(path, "data").toString | ||
val nodeDataRDD = sql.sparkContext.parallelize(instance.trees.zipWithIndex).flatMap { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it alright to use flatMap to combine RDDs? Can we use sparkContext.union instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a single RDD. The flatMap maps every element of the original RDD to multiple elements in a new RDD. It should be fine.
@jkbradley Thanks for this. This looks great and clarifies a lot of things I was trying to do. I had one minor comment, except that it looks fine to me. |
@GayathriMurali Thanks for taking a look! @hhbyyh It would be great if you could take a look too since you're familiar with some of this code. |
Sorry for the delay. I'll start now. |
} | ||
|
||
/** | ||
* :: Experimental :: | ||
* [[http://en.wikipedia.org/wiki/Random_forest Random Forest]] model for regression. | ||
* It supports both continuous and categorical features. | ||
* @param _trees Decision trees in the ensemble. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indent correctly?
Similar for a few lines next.
just some minor comments. LGTM. |
Test build #54828 has finished for PR 12118 at commit
|
Test build #54836 has finished for PR 12118 at commit
|
@GayathriMurali Thanks for the initial commits, and @hhbyyh thanks for the review! I'll go ahead and merge this with master since you gave a LGTM pending minor changes. |
What changes were proposed in this pull request?
Main change: Added save/load for RandomForestClassifier, RandomForestRegressor (implementation details below)
Modified numTrees method (deprecation)
Minor items
Implementation details
How was this patch tested?
Adds standard unit tests for RF save/load