k-prototypes for numerical and categorical clustering

jojolebarjos/kprototypes

K-prototypes

K-prototypes, introduced by Huang (1997), is an extension of the k-means algorithm that handles mixed numerical and categorical data.

Also, for completeness, note that a well-known Python implementation is available here.
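
To make the idea concrete, here is a minimal NumPy sketch of one k-prototypes iteration as described by Huang (1997): each point is assigned to the prototype with the lowest mixed cost (squared Euclidean distance on numerical features plus `gamma` times the number of categorical mismatches), then prototypes are updated using the mean for numerical features and the mode for categorical ones. This is an illustration only and does not use this package's API; the function names, the toy data and the `gamma` value are assumptions, and categorical values are assumed to be encoded as non-negative integers.

```python
import numpy as np


def assign_clusters(num_values, cat_values, num_protos, cat_protos, gamma=1.0):
    """Assign each point to the prototype with the lowest mixed cost."""
    # Squared Euclidean distance on numerical features, shape (n_points, k)
    numerical_cost = ((num_values[:, None, :] - num_protos[None, :, :]) ** 2).sum(axis=2)
    # Number of mismatching categorical features, shape (n_points, k)
    categorical_cost = (cat_values[:, None, :] != cat_protos[None, :, :]).sum(axis=2)
    return np.argmin(numerical_cost + gamma * categorical_cost, axis=1)


def update_prototypes(num_values, cat_values, labels, k):
    """Recompute prototypes: mean for numerical, mode for categorical features.

    Assumes every cluster has at least one point (no empty-cluster handling).
    """
    num_protos = np.stack([num_values[labels == j].mean(axis=0) for j in range(k)])
    cat_protos = np.stack([
        # Most frequent code per column; ties resolve to the smallest code
        [np.bincount(column).argmax() for column in cat_values[labels == j].T]
        for j in range(k)
    ])
    return num_protos, cat_protos


# Toy data (hypothetical): two numerical and two categorical features
num_values = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
cat_values = np.array([[0, 1], [0, 1], [1, 0], [1, 2]])

# Initial prototypes, chosen by hand for the example
num_protos = np.array([[0.0, 0.0], [5.0, 5.0]])
cat_protos = np.array([[0, 1], [1, 0]])

labels = assign_clusters(num_values, cat_values, num_protos, cat_protos, gamma=1.0)
new_num_protos, new_cat_protos = update_prototypes(num_values, cat_values, labels, k=2)
print(labels)
print(new_num_protos)
print(new_cat_protos)
```

A full implementation would repeat these two steps until the assignments stop changing. The weight `gamma` balances the two kinds of features: a larger value makes categorical mismatches count more relative to numerical distance.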

Installation

Install from source to get the latest version:

pip install git+https://github.com/jojolebarjos/kprototypes.git

Some examples require UMAP for dimensionality reduction and matplotlib for rendering:

pip install matplotlib umap-learn

References

  1. Clustering large data sets with mixed numeric and categorical values, 1997, Zhexue Huang
  2. Extensions to the k-modes algorithm for clustering large data sets with categorical values, 1998, Zhexue Huang
  3. A new initialization method for categorical data clustering, 2009, Fuyuan Cao, Jiye Liang, Liang Bai
  4. A Novel Cluster Center Initialization Method for the k-Prototypes Algorithms using Centrality and Distance, 2015, Jinchao Ji, Wei Pang, Yanlin Zheng, Zhe Wang, Zhiqiang Ma and Libiao Zhang

Changelog

  • 0.1.3 - 2024-03-30
    • Migrate to GitHub
  • 0.1.2 - 2020-12-04
    • Add proper documentation
    • Small fixes
  • 0.1.1 - 2020-06-03
    • Add clean initialization procedures
  • 0.1.0 - 2020-05-04
    • Initial version