[enhancement] Improve efficiency of community detection on GPU #2381
Supersedes #1857
Closes #1654, Closes #1840, Closes #1703
Hello!
Pull Request overview
Details
I noticed that running community_detection on GPU was barely faster than on CPU. I chased this down, and it's due to the large amount of Python-level looping rather than taking good advantage of torch and the GPU's strengths. The new implementation relies much more heavily on torch operations, and is notably faster when the embeddings are on GPU. However, it performs slightly worse than the master implementation on CPU. As a result, (a slight variation of) the original implementation is still used for CPU, with one exception:
sentence-transformers/sentence_transformers/util.py
Lines 389 to 395 in 3db309a
This loop has been replaced with a vectorized equivalent, which is slightly more performant.
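To illustrate the kind of change described here, below is a hedged sketch (not the actual PR diff) of replacing a per-embedding Python loop with batched torch operations. The function name `batched_threshold_neighbors` and its parameters are illustrative, not from the codebase; the real community_detection logic involves additional steps (sorting, minimum community size, deduplication).

```python
import torch

def batched_threshold_neighbors(embeddings: torch.Tensor,
                                threshold: float = 0.75,
                                batch_size: int = 1024) -> list:
    # Normalize once so a matrix product gives cosine similarities.
    embeddings = torch.nn.functional.normalize(embeddings, p=2, dim=1)
    neighbors = []
    for start in range(0, len(embeddings), batch_size):
        # One matmul per batch replaces many per-row Python iterations;
        # on GPU this keeps the device busy instead of round-tripping.
        sims = embeddings[start:start + batch_size] @ embeddings.T
        for row in sims >= threshold:
            neighbors.append(row.nonzero(as_tuple=True)[0])
    return neighbors
```

The batching bounds peak memory at `batch_size * N` similarity entries instead of materializing the full `N x N` matrix at once.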
Benchmarks
Note
The computation time is still quadratic! This is simply because all N embeddings must be compared with all N other embeddings. In short, this PR does not make clustering feasible for an arbitrary number of embeddings, but it does make it feasible for a much larger number.