duf59/shiny-kmeans

Shiny application illustrating the k-means clustering method

Author: Renaud DUFOUR
Date: May 2015

  • Access the application here
  • A short slidify presentation of the project here

What is k-means clustering?

  • K-means is a distance-based method for cluster analysis in data mining
  • It partitions a set of data points into groups whose members are as similar to each other as possible
  • Each group, called a cluster, is represented by its center (centroid)

Algorithm

Given K, the number of clusters, k-means clustering works as follows:

  • Select K points as initial centroids
  • Repeat
    • Form K clusters by assigning each point to its closest centroid
    • Re-compute the centroids of each cluster
  • Until convergence criterion is satisfied
  • Different distance or similarity measures can be used to find the closest centroid (L1 norm, L2 norm, cosine similarity, ...)
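
The steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used by the app: the function names, the random choice of initial centroids, the squared L2 distance and the convergence criterion (assignments no longer change) are all assumptions made for the example.

```python
import random


def squared_l2(p, q):
    """Squared Euclidean (L2) distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Basic k-means sketch; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Select K points as initial centroids (assumed: random sample of the data).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Form K clusters by assigning each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_l2(p, centroids[j]))
            for p in points
        ]
        # Assumed convergence criterion: the assignments stopped changing.
        if new_labels == labels:
            break
        labels = new_labels
        # Re-compute each centroid as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

Swapping `squared_l2` for another measure changes the assignment step only; note that the mean is the natural center for the L2 norm, so other measures may call for a different update step.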

The Shiny App

  • Illustrates k-means clustering on two datasets:
    • the built-in R iris dataset
    • a dataset dat1 involving embedded clusters
  • Lets the user change the following parameters:
    • dataset to be used
    • variables on which the clustering is to be performed (note: 2D clustering only)
    • number of clusters
    • type of kernel: linear or radial (RBF)
  • When using a non-linear kernel, the data points are first projected into the kernel space before clustering is performed.
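
One common way to cluster in the kernel space without computing the projection explicitly is to express the distance between a point and a cluster mean using kernel values only. The sketch below illustrates that idea; it is not taken from the app, and the function names and the `gamma` parameter of the RBF kernel are assumptions made for the example.

```python
import math


def linear_kernel(p, q):
    """Linear kernel: the ordinary dot product."""
    return sum(a * b for a, b in zip(p, q))


def rbf_kernel(p, q, gamma=1.0):
    """RBF kernel exp(-gamma * ||p - q||^2); gamma is an assumed parameter."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(p, q)))


def gram_matrix(points, kernel):
    """Matrix K with K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


def kernel_distance_sq(i, members, K):
    """Squared kernel-space distance from point i to the mean of `members`.

    Computed from kernel values only:
    K[i][i] - (2/n) * sum_j K[i][j] + (1/n^2) * sum_{j,l} K[j][l]
    where j and l range over the n indices in `members`.
    """
    n = len(members)
    cross = sum(K[i][j] for j in members) / n
    within = sum(K[j][l] for j in members for l in members) / (n * n)
    return K[i][i] - 2.0 * cross + within


def assign(K, clusters):
    """One assignment step: index of the closest non-empty cluster per point."""
    labels = []
    for i in range(len(K)):
        dists = [kernel_distance_sq(i, m, K) if m else float("inf")
                 for m in clusters]
        labels.append(dists.index(min(dists)))
    return labels
```

With `linear_kernel`, `kernel_distance_sq` reduces to the usual squared Euclidean distance to the centroid, so the linear case behaves like standard k-means.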

Further improvements

  • More information on the k-means algorithm is available on Wikipedia. I also recommend the Cluster Analysis In Data Mining class on Coursera, which inspired this app.
  • Potential improvements include:
    • using interactive graphics (rCharts, googleVis)
    • computing clustering validation measures such as purity or normalized mutual information. Note that such external measures require knowing the true classes of the data points, which is the case for the 2 implemented datasets but not in general. Alternatively, one could consider internal measures such as BetaCV.
    • implementing other kernels and allowing the user to tune kernel parameters (currently, the parameter of the RBF kernel is determined internally using a heuristic approach)
    • Implementing alternative clustering techniques like k-medians or k-medoids
  • Feel free to contact me with any questions or suggestions!
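
As an illustration of the purity measure mentioned among the potential improvements, here is a minimal sketch. It assumes the true classes are available as a list aligned with the cluster labels; the function name and the toy data are hypothetical.

```python
from collections import Counter


def purity(true_classes, cluster_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    counts_by_cluster = {}
    for cls, lab in zip(true_classes, cluster_labels):
        counts_by_cluster.setdefault(lab, Counter())[cls] += 1
    # For each cluster, count the points of its most frequent true class.
    majority_total = sum(max(c.values()) for c in counts_by_cluster.values())
    return majority_total / len(true_classes)


true_classes = ["a", "a", "a", "b", "b", "b"]
cluster_labels = [0, 0, 1, 1, 1, 1]
score = purity(true_classes, cluster_labels)  # 5 of 6 points match their cluster's majority class
```

A purity of 1 means every cluster contains points of a single true class.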
