layout	title	displayTitle
global	Clustering - spark.ml	Clustering - spark.ml

This section introduces the Pipelines API for clustering in MLlib.

Table of Contents

This will become a table of contents (this text will be scraped). {:toc}

K-means

K-means is one of the most commonly used clustering algorithms; it partitions the data points into a predefined number of clusters. The MLlib implementation includes a parallelized variant of the k-means++ method called k-means||.
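To make the basic iteration concrete, the following is a minimal plain-Java sketch of the classic assign-then-update loop that k-means is built on. It is not the MLlib implementation: the class `KMeansSketch` and its methods `fit`, `predict`, and `squaredDistance` are hypothetical names used only for illustration, it runs on a single machine, and it assumes the first k points serve as initial centers (a deterministic stand-in for k-means++ style seeding).

```java
import java.util.Arrays;

/** Plain-Java sketch of the basic k-means iteration; not the MLlib implementation. */
final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the center closest to the given point. */
    static int predict(double[][] centers, double[] point) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs at most maxIter rounds of assign-then-update.
     * Seeding assumption: the first k points are the initial centers.
     */
    static double[][] fit(double[][] points, int k, int maxIter) {
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = points[c].clone();
        }
        for (int iter = 0; iter < maxIter; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: attach every point to its nearest center.
            for (double[] p : points) {
                int c = predict(centers, p);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each non-empty center to the mean of its points.
            boolean changed = false;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave the center of an empty cluster where it is
                }
                for (int i = 0; i < dim; i++) {
                    double mean = sums[c][i] / counts[c];
                    if (mean != centers[c][i]) {
                        changed = true;
                    }
                    centers[c][i] = mean;
                }
            }
            if (!changed) {
                break; // no center moved, so the assignment is stable
            }
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {
            {0.0, 0.0}, {9.0, 9.0}, {0.1, 0.1}, {0.2, 0.2}, {9.1, 9.1}, {9.2, 9.2}
        };
        double[][] centers = fit(points, 2, 20);
        System.out.println(Arrays.deepToString(centers));
        // Cluster index assigned to a new point near the second group.
        System.out.println(predict(centers, new double[] {8.5, 8.5}));
    }
}
```

With this toy data the two centers settle near (0.1, 0.1) and (9.1, 9.1), and `predict` returns the index of the nearest center, which is the kind of integer value the prediction column holds. The distributed seeding and parallel execution in MLlib are outside the scope of this sketch.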

KMeans is implemented as an Estimator and generates a KMeansModel as the base model.

Input Columns

Param name	Type(s)	Default	Description
featuresCol	Vector	"features"	Feature vector

Output Columns

Param name	Type(s)	Default	Description
predictionCol	Int	"prediction"	Index of the predicted cluster center

Example

Refer to the [Scala API docs](api/scala/index.html#org.apache.spark.ml.clustering.KMeans) for more details.

{% include_example scala/org/apache/spark/examples/ml/KMeansExample.scala %}

Refer to the [Java API docs](api/java/org/apache/spark/ml/clustering/KMeans.html) for more details.

{% include_example java/org/apache/spark/examples/ml/JavaKMeansExample.java %}

Latent Dirichlet allocation (LDA)

LDA is implemented as an Estimator that supports both EMLDAOptimizer and OnlineLDAOptimizer, and generates an LDAModel as the base model. Expert users may cast an LDAModel generated by EMLDAOptimizer to a DistributedLDAModel if needed.

Refer to the Scala API docs for more details.

{% include_example scala/org/apache/spark/examples/ml/LDAExample.scala %}

Refer to the Java API docs for more details.

{% include_example java/org/apache/spark/examples/ml/JavaLDAExample.java %}
