Algorithm

Shiny application illustrating the k-means clustering method

Author: Renaud DUFOUR
Date: May 2015

What is k-means clustering?

K-means is a distance-based method for cluster analysis in data mining.
It partitions a set of data points into groups such that the points within each group are as similar as possible.
Each group, called a cluster, is represented by its center (centroid).

Given K, the number of clusters, k-means clustering works as follows:

Select K points as initial centroids
Repeat
- Form K clusters by assigning each point to its closest centroid
- Recompute the centroid of each cluster
Until the convergence criterion is satisfied
Different distance or similarity measures can be used to define "closest" (L1 norm, L2 norm, cosine similarity, ...).
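
The loop above can be sketched in a few lines of base R. This is only a minimal illustration, not the code used by the app: `simple_kmeans()` is a hypothetical helper name, the distance is assumed to be Euclidean (L2), the initial centroids are K randomly chosen data points, and convergence is taken to mean "the assignments no longer change".

```r
# Minimal k-means sketch with Euclidean (L2) distance.
# simple_kmeans() is a hypothetical helper, not part of the app.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Select K points as initial centroids (assumption: K random rows of x)
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Squared distance from every point (rows) to every centroid (columns)
    d <- sapply(seq_len(k), function(j) {
      rowSums(sweep(x, 2, centroids[j, ])^2)
    })
    # Assign each point to its closest centroid
    new_cluster <- max.col(-d, ties.method = "first")
    # Convergence criterion (assumption): assignments no longer change
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Recompute each centroid as the mean of its assigned points;
    # an empty cluster keeps its previous centroid
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      if (nrow(members) > 0) centroids[j, ] <- colMeans(members)
    }
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

set.seed(42)
fit <- simple_kmeans(iris[, c("Petal.Length", "Petal.Width")], k = 3)
table(fit$cluster, iris$Species)
```

In practice the built-in `kmeans()` function from the stats package does the same job and supports several random restarts through its `nstart` argument. Note that replacing the L2 norm with another measure generally also changes how the cluster center should be recomputed.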

The app illustrates k-means clustering on 2 datasets:
- the iris dataset built into R
- a dataset dat1 involving embedded clusters
It lets the user change the following parameters:
- dataset to be used
- variables on which the clustering is to be performed (note: 2D clustering only)
- number of clusters
- type of kernel: linear or radial (RBF)
When using a non-linear kernel, the data points are first mapped into the kernel feature space before clustering is performed.
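
To make the kernel option more concrete, here is a rough sketch of kernel k-means in base R. It is not the app's implementation: `rbf_kernel()` and `kernel_kmeans()` are hypothetical helpers, the toy data only imitates embedded clusters (it is not dat1), and `sigma` is set to an arbitrary value rather than chosen by the heuristic the app uses. The sketch never computes the feature-space coordinates explicitly; it only uses the kernel matrix to evaluate the squared distance between a point and a cluster mean.

```r
# RBF (Gaussian) kernel matrix: K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2))
rbf_kernel <- function(x, sigma) {
  d2 <- as.matrix(dist(x))^2
  exp(-d2 / (2 * sigma^2))
}

# Kernel k-means on a precomputed kernel matrix K (hypothetical helper).
kernel_kmeans <- function(K, k, max_iter = 100) {
  n <- nrow(K)
  # Assumption: start from a random, roughly balanced partition
  cluster <- sample(rep_len(seq_len(k), n))
  for (iter in seq_len(max_iter)) {
    Z <- outer(cluster, seq_len(k), "==") * 1   # n x k membership indicators
    size <- pmax(colSums(Z), 1)                 # guard against empty clusters
    # Squared feature-space distance from point i to the mean of cluster c:
    # K[i, i] - 2 * mean_j K[i, j] + mean_{j, l} K[j, l], with j, l in c
    cross <- sweep(K %*% Z, 2, size, "/")
    within <- diag(t(Z) %*% K %*% Z) / size^2
    d <- diag(K) - 2 * cross + matrix(within, n, k, byrow = TRUE)
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  cluster
}

set.seed(1)
# Hypothetical toy data with embedded clusters: a blob inside a ring
angle <- runif(100, 0, 2 * pi)
ring <- cbind(3 * cos(angle), 3 * sin(angle)) + matrix(rnorm(200, sd = 0.1), ncol = 2)
blob <- matrix(rnorm(200, sd = 0.3), ncol = 2)
toy <- rbind(ring, blob)

sigma <- 1  # arbitrary value chosen for illustration
cl <- kernel_kmeans(rbf_kernel(toy, sigma), k = 2)
table(cl, rep(c("ring", "blob"), each = 100))
```

With a linear kernel, `K <- tcrossprod(as.matrix(x))`, the same function reduces to ordinary k-means. The outcome with the RBF kernel depends strongly on `sigma` and on the random initial partition, so several restarts are usually worthwhile.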

More information on the k-means algorithm is available on Wikipedia. I also recommend the Cluster Analysis in Data Mining class on Coursera, which inspired this app.
Potential improvements include:
- using interactive graphics (rCharts, googleVis)
- computing clustering validation measures such as purity or normalized mutual information. Note that such external measures require knowing the true classes of the data points, which is the case for the 2 implemented datasets but not in general. Alternatively, one could consider internal measures such as BetaCV.
- implementing other kernels and allowing the user to tune kernel parameters (currently, the parameter of the RBF kernel is determined internally using a heuristic)
- implementing alternative clustering techniques like k-medians or k-medoids
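
As an illustration of the external validation measures mentioned above, the following sketch computes purity and normalized mutual information (NMI) for a clustering of the iris data. `purity()` and `nmi()` are hypothetical helper names, and the NMI is normalized by the geometric mean of the two entropies, which is only one of several conventions in use.

```r
# Purity: fraction of points that fall in the majority true class of their cluster
purity <- function(cluster, truth) {
  tab <- table(cluster, truth)
  sum(apply(tab, 1, max)) / sum(tab)
}

# Normalized mutual information, normalized by sqrt(H(cluster) * H(truth)).
# Returns NaN if either partition has a single group (zero entropy).
nmi <- function(cluster, truth) {
  p <- table(cluster, truth) / length(cluster)   # joint distribution
  pc <- rowSums(p)                               # cluster marginal
  pt <- colSums(p)                               # true-class marginal
  entropy <- function(q) -sum(q[q > 0] * log(q[q > 0]))
  nz <- p > 0
  mi <- sum(p[nz] * log(p[nz] / outer(pc, pt)[nz]))
  mi / sqrt(entropy(pc) * entropy(pt))
}

set.seed(42)
fit <- kmeans(iris[, c("Petal.Length", "Petal.Width")], centers = 3, nstart = 10)
purity(fit$cluster, iris$Species)
nmi(fit$cluster, iris$Species)
```

Both measures equal 1 when the clusters match the true classes exactly. Purity alone can be misleading because it tends to increase with the number of clusters.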
Feel free to contact me with any questions or suggestions!
