
Improved result matrix stacking for Hamming GPU implementation #617


Open: wants to merge 3 commits into main

felixpetschko (Collaborator) commented Jun 1, 2025

@grst I recently tried the GPU implementation of the Hamming distance metric on 8 million cells of the Omniscope Covid dataset, the largest dataset currently available. I noticed performance problems when stacking the blocks of the final result matrix computed on the GPU, so I implemented a numba function that stacks the blocks efficiently. Stacking now takes only 6 seconds for 8 million cells on 1 CPU.
With the new stacking implementation, the GPU Hamming metric processed 8 million cells in ~210 seconds on an Nvidia A30 GPU on the cluster (compared with ~2000 seconds for the CPU Hamming metric on 64 CPUs).
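The general idea behind efficient block stacking can be illustrated with a small sketch. It assumes (this is not confirmed by the PR) that the GPU returns the distance matrix as vertically ordered row blocks in CSR form, each given as a `(data, indices, indptr)` triple with `indptr[0] == 0`. Instead of repeatedly concatenating matrices, the output arrays are preallocated once and each block's row pointers are shifted by the number of nonzeros already written. The helper name `stack_csr_row_blocks` is hypothetical, and this is a vectorized NumPy sketch rather than the numba function from the PR.

```python
import numpy as np


def stack_csr_row_blocks(blocks, n_cols):
    """Vertically stack CSR row blocks given as (data, indices, indptr) tuples.

    Hypothetical helper for illustration. Assumes a non-empty list of blocks,
    ordered top to bottom, each with n_cols columns and indptr[0] == 0.
    """
    nnz_per_block = [len(data) for data, _, _ in blocks]
    n_rows = sum(len(indptr) - 1 for _, _, indptr in blocks)
    total_nnz = sum(nnz_per_block)

    # Preallocate the output once instead of growing it block by block.
    data_out = np.empty(total_nnz, dtype=blocks[0][0].dtype)
    indices_out = np.empty(total_nnz, dtype=np.int64)
    indptr_out = np.empty(n_rows + 1, dtype=np.int64)
    indptr_out[0] = 0

    row_offset = 0
    nnz_offset = 0
    for (data, indices, indptr), nnz in zip(blocks, nnz_per_block):
        n_block_rows = len(indptr) - 1
        # Copy this block's nonzeros and column indices into place.
        data_out[nnz_offset:nnz_offset + nnz] = data
        indices_out[nnz_offset:nnz_offset + nnz] = indices
        # Shift the block's row pointers by the nonzeros already written.
        indptr_out[row_offset + 1:row_offset + n_block_rows + 1] = indptr[1:] + nnz_offset
        row_offset += n_block_rows
        nnz_offset += nnz

    return data_out, indices_out, indptr_out, (n_rows, n_cols)
```

If SciPy is available, the returned arrays can be wrapped without copying logic of its own via `scipy.sparse.csr_matrix((data, indices, indptr), shape=shape)`. The per-block loop only does slice assignments, so its cost is dominated by a single pass over the nonzeros.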

grst added the run-gpu-ci label (runs GPU CI) Jun 2, 2025
grst (Collaborator) commented Jun 2, 2025

Hi @felixpetschko,

that's great, thank you!

Would you mind updating the changelog?

LGTM otherwise.
