
DeMoriarty/fast_pytorch_kmeans


Fast Pytorch Kmeans

This is a PyTorch implementation of the k-means clustering algorithm.

Installation

pip install fast-pytorch-kmeans

Quick Start

from fast_pytorch_kmeans import KMeans
import torch

kmeans = KMeans(n_clusters=8, mode='euclidean', verbose=1)
x = torch.randn(100000, 64, device='cuda')
labels = kmeans.fit_predict(x)

Speed Comparison

Tested on Google Colab with an Intel(R) Xeon(R) CPU @ 2.00GHz and an Nvidia Tesla T4 GPU.

sklearn: sklearn.cluster.KMeans

  • n_init = 1
  • max_iter = 100
  • tol = -1 (to force 100 iterations)

faiss: faiss.Clustering

  • nredo = 1
  • niter = 100
  • max_points_per_centroid = 10**9 (to prevent subsampling of the dataset)

Note: the time spent transferring data from CPU to GPU is also included.
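As a rough illustration of what "transfer time included" means for a measurement, the sketch below starts the clock before a data-preparation step rather than after it. The helper name `timed` and its `prepare`, `run`, and `sync` parameters are hypothetical and not part of faiss or this package; it is not the benchmark script used for the numbers here.

```python
import time

def timed(run, prepare=None, sync=None):
    """Return (result, seconds) for run(prepare()), counting the prepare step too."""
    start = time.perf_counter()
    # Hypothetical prepare step, e.g. copying a CPU array to the GPU.
    data = prepare() if prepare is not None else None
    result = run(data)
    if sync is not None:
        # GPU work is queued asynchronously; a sync callable such as
        # torch.cuda.synchronize makes sure it has finished before stopping the clock.
        sync()
    return result, time.perf_counter() - start
```

Passing the transfer as `prepare` keeps it inside the timed region, which is the convention the note above describes.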

fast-pytorch: fast_pytorch_kmeans.KMeans

  • max_iter = 100
  • tol = -1 (to force 100 iterations)
  • minibatch = None
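To show why a negative tolerance forces every iteration to run, here is a minimal plain-Python sketch of Lloyd's k-means. It assumes the stopping rule compares the total squared centroid shift against `tol`; the function `lloyd_kmeans` and that exact rule are illustrative assumptions, not the code of sklearn or fast_pytorch_kmeans.

```python
import random

def lloyd_kmeans(points, n_clusters, max_iter=100, tol=1e-4, seed=0):
    """Plain-Python Lloyd's k-means; returns (centroids, labels, iterations_run)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_clusters)]
    labels = [0] * len(points)
    iterations_run = 0
    for _ in range(max_iter):
        iterations_run += 1
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(n_clusters),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its assigned points.
        shift = 0.0
        for c in range(n_clusters):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:
                continue  # leave the centroid of an empty cluster where it is
            new = [sum(col) / len(members) for col in zip(*members)]
            shift += sum((a - b) ** 2 for a, b in zip(new, centroids[c]))
            centroids[c] = new
        # Assumed stopping rule: stop once the total squared shift is <= tol.
        # A squared shift is never negative, so tol = -1 can never trigger it.
        if shift <= tol:
            break
    return centroids, labels, iterations_run
```

With a small positive `tol` the loop exits as soon as the centroids stop moving, while `tol=-1` always runs the full `max_iter` iterations, which keeps the amount of work the same across the libraries being timed.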

1. n_samples=100,000, n_features=256, time spent for 100 iterations

2. n_samples=100,000, n_clusters=256, time spent for 100 iterations

3. n_features=256, n_clusters=256, time spent for 100 iterations

4. n_features=32, n_clusters=1024, time spent for 100 iterations

5. n_features=1024, n_clusters=32, time spent for 100 iterations
