Improved result matrix stacking for Hamming GPU implementation #617
@grst I recently tried the GPU implementation of the Hamming distance metric on 8 million cells of the Omniscope Covid dataset, which is the largest dataset currently available. I noticed performance problems when stacking the blocks of the final result matrix that are computed on the GPU. I therefore implemented a numba function that stacks the blocks efficiently. Stacking now takes only 6 seconds for 8 million cells on 1 CPU.
With the new stacking implementation, the GPU Hamming metric processed 8 million cells in ~210 seconds on an Nvidia A30 GPU on the cluster. For comparison, the CPU Hamming metric takes ~2000 seconds with 64 CPUs.
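
For readers who have not opened the diff, here is a rough sketch of the general idea behind stacking result blocks without repeated reallocation. It is not the code from this PR. It assumes each GPU block arrives as a CSR triple `(data, indices, indptr)` covering a contiguous chunk of rows, with `indptr[0] == 0`. The names `_copy_block` and `stack_csr_blocks` are hypothetical. The sketch preallocates the output arrays once and copies each block to its offset, and it falls back to plain Python if numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for this sketch: without numba, run the kernel as plain Python.
    def njit(func):
        return func


@njit
def _copy_block(out_data, out_indices, out_indptr,
                data, indices, indptr, row_offset, nnz_offset):
    # Shift the block's row pointers by the number of values written so far.
    n_rows = indptr.shape[0] - 1
    for i in range(n_rows):
        out_indptr[row_offset + i + 1] = indptr[i + 1] + nnz_offset
    # Copy values and column indices to their final position.
    for k in range(data.shape[0]):
        out_data[nnz_offset + k] = data[k]
        out_indices[nnz_offset + k] = indices[k]


def stack_csr_blocks(blocks):
    """Vertically stack a non-empty list of CSR triples (data, indices, indptr)."""
    total_rows = sum(len(indptr) - 1 for _, _, indptr in blocks)
    total_nnz = sum(len(data) for data, _, _ in blocks)

    # Allocate the result once instead of growing it block by block.
    out_data = np.empty(total_nnz, dtype=blocks[0][0].dtype)
    out_indices = np.empty(total_nnz, dtype=np.int64)
    out_indptr = np.zeros(total_rows + 1, dtype=np.int64)

    row_offset = 0
    nnz_offset = 0
    for data, indices, indptr in blocks:
        _copy_block(out_data, out_indices, out_indptr,
                    data, indices, indptr, row_offset, nnz_offset)
        row_offset += len(indptr) - 1
        nnz_offset += len(data)
    return out_data, out_indices, out_indptr


# Two small blocks: the first has 2 rows, the second has 1 row.
block_a = (np.array([1, 2, 3], dtype=np.uint8),
           np.array([0, 2, 1], dtype=np.int32),
           np.array([0, 2, 3], dtype=np.int32))
block_b = (np.array([4], dtype=np.uint8),
           np.array([3], dtype=np.int32),
           np.array([0, 1], dtype=np.int32))
data, indices, indptr = stack_csr_blocks([block_a, block_b])
```

The three returned arrays can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))` to build the final matrix, where `n_cols` is whatever column count the blocks share.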