Improved result matrix stacking for Hamming GPU implementation #617

felixpetschko · 2025-06-01T14:08:29Z

@grst I recently ran the GPU implementation of the Hamming distance metric on 8 million cells from the Omniscope Covid dataset, the largest dataset currently available. I noticed a performance problem when stacking the blocks of the final result matrix that are computed by the GPU. I therefore implemented a numba function that stacks the blocks efficiently: stacking now takes only 6 seconds for 8 million cells on 1 CPU.
With the new stacking implementation, I was able to process 8 million cells in ~210 seconds with the GPU Hamming metric on an Nvidia A30 GPU on the cluster (compared with ~2000 seconds with the CPU Hamming metric on 64 CPUs).
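
To illustrate the idea, here is a minimal sketch of stacking row blocks of a sparse result matrix with numba. It is not the code from this PR: it assumes each block is available as raw CSR arrays (`data`, `indices`, `indptr`), and the names `stack_csr_blocks` and `_copy_block` are hypothetical. The sketch preallocates the output once and copies each block into place, shifting its row pointers by the number of nonzeros already written.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed.
    def njit(func):
        return func


@njit
def _copy_block(data_out, indices_out, indptr_out,
                data, indices, indptr, nnz_offset, row_offset):
    # Hypothetical helper: copy one CSR block into the preallocated output.
    n_rows = indptr.shape[0] - 1
    nnz = indptr[n_rows]
    for i in range(nnz):
        data_out[nnz_offset + i] = data[i]
        indices_out[nnz_offset + i] = indices[i]
    # Row pointers are shifted by the nonzeros written by earlier blocks.
    for r in range(n_rows):
        indptr_out[row_offset + r + 1] = indptr[r + 1] + nnz_offset


def stack_csr_blocks(blocks):
    """Vertically stack CSR blocks given as (data, indices, indptr) tuples.

    Assumes all blocks share the same number of columns and data dtype.
    """
    if not blocks:
        raise ValueError("need at least one block")
    total_rows = sum(len(indptr) - 1 for _, _, indptr in blocks)
    total_nnz = sum(int(indptr[-1]) for _, _, indptr in blocks)

    data_out = np.empty(total_nnz, dtype=blocks[0][0].dtype)
    indices_out = np.empty(total_nnz, dtype=np.int64)
    indptr_out = np.zeros(total_rows + 1, dtype=np.int64)

    nnz_offset = 0
    row_offset = 0
    for data, indices, indptr in blocks:
        _copy_block(data_out, indices_out, indptr_out,
                    data, indices, indptr, nnz_offset, row_offset)
        nnz_offset += int(indptr[-1])
        row_offset += len(indptr) - 1
    return data_out, indices_out, indptr_out


# Two small blocks with 3 columns: the first has 2 rows, the second 1 row.
block_a = (np.array([1, 2], dtype=np.uint8),
           np.array([0, 2], dtype=np.int64),
           np.array([0, 2, 2], dtype=np.int64))
block_b = (np.array([3], dtype=np.uint8),
           np.array([1], dtype=np.int64),
           np.array([0, 1], dtype=np.int64))

stacked_data, stacked_indices, stacked_indptr = stack_csr_blocks([block_a, block_b])
```

The three returned arrays can be passed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(n_rows, n_cols))` to obtain the stacked matrix without copying the blocks a second time.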


grst · 2025-06-02T08:35:57Z

Hi @felixpetschko,

that's great, thank you!

Would you mind updating the changelog?

LGTM otherwise.

felixpetschko and others added 2 commits June 1, 2025 15:54

additional numba implementation for gpu result matrix stacking

0329328

[pre-commit.ci] auto fixes from pre-commit.com hooks

517a3ab

for more information, see https://pre-commit.ci

grst added the run-gpu-ci label (runs GPU CI) Jun 2, 2025

Merge branch 'main' into gpu_hamming

3d493cc
