-
Notifications
You must be signed in to change notification settings - Fork 28.6k
[SPARK-6518][MLlib][Example][DOC] Add example code and user guide for bisecting k-means #9952
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Jenkins, retest this please |
Test build #46657 has finished for PR 9952 at commit
|
@jkbradley could you review this PR? |
When you modify this and [https://github.com//pull/9968] to use the |
All right. I'll fix it. |
Test build #46790 has finished for PR 9952 at commit
|
@jkbradley could you review it? |
I moved the documentation to this PR from #9968. Because the PR depends on this PR. I modified the docs for |
Test build #47095 has finished for PR 9952 at commit
|
@jkbradley ping |
Thanks! I'll review today |
I'll go ahead and review this, but if you have time to add a Java example, that would be awesome--thanks! |
|
||
Bisecting k-means is a kind of [hierarchical clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering). | ||
Hierarchical clustering is one of the most commonly used method of cluster analysis which seeks to build a hierarchy of clusters. | ||
Strategies for hierarchical clustering generally fall into two types: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this line (which is getting into details), it would be good to give a high-level description of the use of bisecting k-means. E.g., "Bisecting K-means can often be much faster than regular K-means, but it will generally produce a different clustering."
That should be it. |
Also, can you please add the "[DOC]" tag to the PR title? Thanks! |
@jkbradley sure, done. |
Thanks, I'll check back for updates for the few remaining comments. |
ping @jkbradley |
@yu-iskw There are several comments of mine (above) which have not yet been addressed. Could you please update this PR based on those comments? I'll then take another look. |
Oh, I'm terribly sorry about that. Pushing the update was failed....
|
Test build #47639 has finished for PR 9952 at commit
|
|
||
import org.apache.spark.SparkConf; | ||
import org.apache.spark.api.java.JavaRDD; | ||
import org.apache.spark.api.java.JavaSparkContext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would exclude JavaSparkContext and SparkConf.
No problem. I just added some comments and am generating the docs now. |
The generated docs look good, so just those few comments. |
@jkbradley thanks for the review. I modified the wrong indentations and excluded the import statements for |
Test build #47780 has finished for PR 9952 at commit
|
LGTM, merging with master and branch-1.6 |
… bisecting k-means This PR includes only an example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518. (cherry picked from commit 7b6dc29) Signed-off-by: Joseph K. Bradley <joseph@databricks.com>
Thank you for merging it! |
This PR includes only an example code in order to finish it quickly.
I'll send another PR for the docs soon.