[SPARK-29142][PYTHON][ML] Pyspark clustering models support column setters/getters/predict #25859
Conversation
Test build #111028 has finished for PR 25859 at commit
```python
Predict label for the given features.
"""
return self._call_java("predict", value)
```
we should add def predictProbability as well
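The suggested `predictProbability` would follow the same `_call_java` delegation pattern as `predict` above. A runnable sketch of that pattern (the `JavaModelStub` below is a toy stand-in for `pyspark.ml.wrapper.JavaModel`, faking the Py4J bridge with plain Python callables so no JVM is needed; the handler logic is illustrative, not Spark's real math):

```python
class JavaModelStub:
    """Toy stand-in for pyspark.ml.wrapper.JavaModel: routes _call_java
    through a dict of plain Python callables instead of a Py4J bridge."""
    def __init__(self, java_handlers):
        self._java_handlers = java_handlers

    def _call_java(self, name, *args):
        return self._java_handlers[name](*args)


class GaussianMixtureModelSketch(JavaModelStub):
    def predict(self, value):
        """Predict label for the given features."""
        return self._call_java("predict", value)

    def predictProbability(self, value):
        """Predict probability for each cluster for the given features."""
        return self._call_java("predictProbability", value)


# Fake "Java side": two clusters, split at 5.0 (illustrative handlers only).
handlers = {
    "predict": lambda v: 0 if v < 5.0 else 1,
    "predictProbability": lambda v: [0.9, 0.1] if v < 5.0 else [0.1, 0.9],
}
model = GaussianMixtureModelSketch(handlers)
```

In the real code both methods live on the model class and delegate to the Scala `GaussianMixtureModel`, which already exposes `predict` and `predictProbability`.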
```python
>>> model.getDistanceMeasure()
'euclidean'
>>> model.setPredictionCol("newPrediction")
KMeans...
```
KMeansModel...
It is KMeans_3487dfaa7c0e
This may be a little bug: KMeans & KMeansModel should have different uids (like LogisticRegression & LogisticRegressionModel). This issue seems to happen in other places as well, but we can leave it here.
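For context on the uid format discussed above: Spark ML builds ids like `KMeans_3487dfaa7c0e` from a class-name prefix plus the tail of a random UUID (Scala's `Identifiable.randomUID`). A rough Python equivalent, purely illustrative:

```python
import uuid

def random_uid(prefix):
    # Follows the convention of Spark ML's Identifiable.randomUID:
    # "<ClassName>_<last 12 hex chars of a random UUID>",
    # e.g. "KMeans_3487dfaa7c0e".
    return prefix + "_" + uuid.uuid4().hex[-12:]
```

The uid-sharing the reviewer observed means the estimator and the model produced by `fit` carry the same id, whereas the expectation here is a fresh `random_uid("KMeansModel")`-style id for the model.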
Test build #111300 has finished for PR 25859 at commit
srowen left a comment:
All this does is refactor the getters, right? The description says it adds setters, but I'm not seeing that at first look.
Also, it's a tricky question what to do about @since annotations when subclasses add them at different times. I might suggest using the highest version of any of the methods that are removed in favor of the new one.
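The @since question above concerns PySpark's `@since(version)` decorator, which stamps a method's docstring with the version it first appeared in. A simplified sketch of that mechanism and the suggested policy (the version number and class name here are illustrative, not from the PR):

```python
def since(version):
    """Simplified take on pyspark's @since decorator: append a
    versionadded note to the wrapped method's docstring."""
    def decorator(f):
        f.__doc__ = (f.__doc__ or "") + "\n.. versionadded:: %s\n" % version
        return f
    return decorator


class SharedParamsSketch:
    # Per the suggestion: when a getter moves from several subclasses into
    # a shared mixin, keep the highest @since version among the methods it
    # replaces ("2.0.0" is a made-up example value).
    @since("2.0.0")
    def getK(self):
        """Gets the value of k or its default value."""
        return getattr(self, "_k", 2)
```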
@srowen after the PR: since it will be exactly the same as the current Scala code below, I agree with you that we should retain

OK, so this will need to be followed up with another PR. That's fine, just remind me to review it. (We can link the JIRAs too to make it clear.) I'll leave it open for more comments for a bit.

One more thing: we may need to rename

PS @zhengruifeng I think you can merge this, and the other related PR, once you are both comfortable with it.
Test build #111438 has finished for PR 25859 at commit

Merged to master, thanks all!

Thanks! @srowen @zhengruifeng
What changes were proposed in this pull request?
Add the following Params classes in PySpark clustering:
- GaussianMixtureParams
- KMeansParams
- BisectingKMeansParams
- LDAParams
- PowerIterationClusteringParams

Why are the changes needed?
To be consistent with the Scala side.
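The Scala layout being mirrored has the estimator and its model mix in one shared Params trait. A loose Python analogue of the new mixin layout (plain attributes for brevity; real PySpark params are `pyspark.ml.param.Param` objects with a proper param map):

```python
class KMeansParamsSketch:
    """Stand-in for a shared KMeansParams mixin: getters defined once,
    inherited by both the estimator and the model."""
    _k = 2
    _distance_measure = "euclidean"

    def getK(self):
        """Gets the value of k or its default value."""
        return self._k

    def getDistanceMeasure(self):
        """Gets the value of distanceMeasure or its default value."""
        return self._distance_measure


class KMeansSketch(KMeansParamsSketch):
    def setK(self, value):
        self._k = value
        return self  # chainable, as PySpark setters are


class KMeansModelSketch(KMeansParamsSketch):
    pass  # the model inherits the same getters as the estimator
```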
Does this PR introduce any user-facing change?
Yes. Add the following changes:
How was this patch tested?
Add doctests
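A standalone sketch of the setter round trip such doctests exercise (a plain Python stand-in, no SparkSession; the setter-returns-self convention matches PySpark's, which is why the doctest output above shows the model's repr):

```python
class PredictionColSketch:
    """Illustrative model stub with the new predictionCol setter/getter."""
    def __init__(self):
        self._prediction_col = "prediction"

    def setPredictionCol(self, value):
        """Sets the value of predictionCol; returns self so calls chain."""
        self._prediction_col = value
        return self

    def getPredictionCol(self):
        """Gets the value of predictionCol or its default value."""
        return self._prediction_col


model = PredictionColSketch()
returned = model.setPredictionCol("newPrediction")
```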