Speed up KMeans using FAISS #111

lukegre · 2024-09-20T10:13:03Z

I know that clustering probably isn't the biggest bottleneck in the process, but I've used FAISS for k-means in the past, and it is a lot faster than scikit-learn's KMeans.
This article provides a few more details: https://www.kdnuggets.com/2021/01/k-means-faster-lower-error-scikit-learn.html
This might make searching for the optimal number of clusters quite a bit quicker (about 20x, if the article is anything to go by).
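
For reference, here is a minimal sketch of what a FAISS-based k-means call could look like. It is not code from this project: the helper name `faiss_kmeans` and its defaults are made up for illustration, and it assumes the `faiss` Python package (for example `faiss-cpu`) is installed.

```python
import numpy as np


def faiss_kmeans(x, n_clusters, n_iter=20, seed=0):
    """Cluster the rows of x with faiss.Kmeans; return (centroids, labels).

    Hypothetical helper for illustration only, not part of this project.
    """
    import faiss  # imported here so the module still loads without FAISS

    # FAISS expects C-contiguous float32 arrays.
    x = np.ascontiguousarray(x, dtype=np.float32)
    kmeans = faiss.Kmeans(x.shape[1], n_clusters, niter=n_iter, seed=seed)
    kmeans.train(x)
    # Assign each sample to its nearest centroid (squared L2 distance).
    _, labels = kmeans.index.search(x, 1)
    return kmeans.centroids, labels.ravel()
```

The returned `labels` array plays the role of `labels_` in scikit-learn, so a search over the number of clusters could loop over `n_clusters` and call this helper once per candidate value.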

ArcticSnow (Maintainer) · 2024-09-23T08:52:44Z

Thank you @lukegre for the suggestion. This looks interesting indeed! As you said, the clustering step isn't the biggest bottleneck. If you feel like it, you are welcome to open a PR adding this method.

2 replies

lukegre Sep 27, 2024
Author

Did a quick test: FAISS is much faster than regular KMeans, but not faster than MiniBatch KMeans. So it's probably not worth the effort.
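
A comparison like this could be reproduced with a small timing harness. The sketch below is not the test that was actually run. It only times scikit-learn's `KMeans` and `MiniBatchKMeans` on synthetic data, and the function names, data shape, and cluster count are all assumptions. A FAISS entry could be added to the `candidates` dictionary in the same way.

```python
import time

import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans


def time_fit(make_model, x):
    """Return the wall-clock seconds needed to fit a freshly built model on x."""
    model = make_model()
    start = time.perf_counter()
    model.fit(x)
    return time.perf_counter() - start


def compare_kmeans(x, n_clusters=10, seed=0):
    """Time each candidate estimator once; return {name: seconds}."""
    candidates = {
        "KMeans": lambda: KMeans(
            n_clusters=n_clusters, n_init=1, random_state=seed
        ),
        "MiniBatchKMeans": lambda: MiniBatchKMeans(
            n_clusters=n_clusters, n_init=1, random_state=seed
        ),
    }
    return {name: time_fit(make, x) for name, make in candidates.items()}


if __name__ == "__main__":
    # Synthetic data; the size is arbitrary and chosen only for illustration.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50_000, 8)).astype(np.float32)
    for name, seconds in compare_kmeans(x).items():
        print(f"{name}: {seconds:.3f} s")
```

Timings from a single run are noisy, so repeating each fit a few times and taking the median would give a fairer picture.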

ArcticSnow Sep 28, 2024
Maintainer

Good to know. As you pointed out, clustering is not the bottleneck. Thanks for suggesting and testing this.
