Batched gpu retriever by kathirgounder · Pull Request #1037 · borglab/gtsfm

kathirgounder · 2026-01-30T06:26:48Z

Main idea: computing the full similarity matrix with a single matrix multiplication on the GPU takes about 2 seconds for the Dubrovnik descriptors, which have shape (6044, 8448). To see how far I could push my 2080 Ti (11 GB VRAM), I kept increasing the dataset size; the largest I could fit was (40,000, 8448).
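
A rough memory estimate helps explain why the one-shot approach tops out around that size. The sketch below assumes float32 values (4 bytes each), which the PR does not state, and counts only the descriptor matrix and the full N x N similarity matrix; the helper name `matmul_memory_gib` is hypothetical.

```python
def matmul_memory_gib(num_images, dim, bytes_per_value=4):
    """Return (descriptor_gib, similarity_gib) for a one-shot N x N similarity.

    Assumes float32 storage by default; allocator overhead and any
    temporary buffers are ignored.
    """
    gib = 1024 ** 3
    descriptors = num_images * dim * bytes_per_value / gib
    similarity = num_images * num_images * bytes_per_value / gib
    return descriptors, similarity


# Sizes mentioned in this PR: Dubrovnik, the one-shot limit, and the batched target.
for n in (6044, 40_000, 100_000):
    desc_gib, sim_gib = matmul_memory_gib(n, 8448)
    print(f"N={n:>7}: descriptors {desc_gib:6.2f} GiB, similarity {sim_gib:6.2f} GiB")
```

Under these assumptions the 40,000-image case needs roughly 6 GiB for the similarity matrix alone, while 100,000 images would need about 37 GiB, well beyond an 11 GB card.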

So I split the matrix multiply into batches and was able to scale up to 100,000 descriptors in 20 seconds. For comparison, the current Similarity Retriever takes 321 seconds on a 30k-image dataset. The FAISS implementation I had earlier took about 200 seconds and missed a lot of pairs; the index-building times were also not worth it.
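
The batching idea can be sketched as follows. This is a minimal illustration, not the code in this PR: it assumes PyTorch, cosine similarity on L2-normalized descriptors, and a top-k selection per image, and the function name `batched_topk_similarity` and its parameters are hypothetical. Only one (batch, N) slice of the similarity matrix exists at a time, although the descriptor matrix itself still has to fit on the device.

```python
import torch


def batched_topk_similarity(descriptors, k=5, batch_size=4096, device=None):
    """Top-k most similar rows for every row, computed one row-block at a time.

    descriptors: (N, D) float tensor of global image descriptors.
    Returns (scores, indices), each of shape (N, k), on the CPU.
    """
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: cosine similarity, so rows are L2-normalized first.
    desc = torch.nn.functional.normalize(descriptors.float(), dim=1).to(device)
    n = desc.shape[0]
    k = min(k, n - 1)
    all_scores, all_indices = [], []
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # Only a (batch, N) slice of the similarity matrix is materialized.
        sim = desc[start:end] @ desc.T
        # Mask self-similarity so an image is never paired with itself.
        rows = torch.arange(end - start, device=device)
        sim[rows, rows + start] = float("-inf")
        scores, indices = torch.topk(sim, k, dim=1)
        # Move results off the device right away to keep GPU memory flat.
        all_scores.append(scores.cpu())
        all_indices.append(indices.cpu())
    return torch.cat(all_scores), torch.cat(all_indices)


# Small synthetic example; real descriptors would have shape (N, 8448).
torch.manual_seed(0)
example = torch.randn(1000, 64)
scores, indices = batched_topk_similarity(example, k=5, batch_size=256)
print(scores.shape, indices.shape)
```

The batch size trades memory for the number of kernel launches: a (4096, 100000) float32 slice is about 1.5 GiB, so it leaves room for the descriptors on an 11 GB card.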

I will update this PR with more timing information and experiments.

Implement BatchedGPURetriever

16fa3b8

kathirgounder force-pushed the BatchedGPURetriever branch from 4ebf898 to 16fa3b8 on January 30, 2026 06:35

Fixes

44e5a4a

kathirgounder wants to merge 2 commits into borglab:master from kathirgounder:BatchedGPURetriever
