
Batched gpu retriever #1037

Draft
kathirgounder wants to merge 2 commits into borglab:master from kathirgounder:BatchedGPURetriever

@kathirgounder (Collaborator) commented Jan 30, 2026

Main idea: computing the similarity matrix multiplication in one shot on the GPU takes about 2 seconds for the Dubrovnik descriptors, which have shape (6044, 8448). To see how far I could push my 2080 Ti (11 GB VRAM), I increased the dataset size; the largest I could fit was (40,000, 8448).
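As a rough sanity check on why a one-shot multiply stops fitting, here is a back-of-the-envelope memory estimate. It assumes float32 storage and that the full N x N similarity matrix is materialized next to the descriptors; neither is stated in the PR, and the helper name `similarity_footprint_bytes` is made up for this sketch.

```python
def similarity_footprint_bytes(num_images, dim, bytes_per_value=4):
    """Return (descriptor_bytes, similarity_bytes) for a one-shot N x N product.

    Assumption: float32 storage (4 bytes per value); the PR does not state the dtype.
    """
    descriptor_bytes = num_images * dim * bytes_per_value
    similarity_bytes = num_images * num_images * bytes_per_value
    return descriptor_bytes, similarity_bytes


# Sizes taken from the PR description; the descriptor dimension is 8448.
for n in (6044, 40_000, 100_000):
    desc, sim = similarity_footprint_bytes(n, 8448)
    print(f"N={n:>7,}: descriptors {desc / 1e9:.2f} GB, similarity matrix {sim / 1e9:.2f} GB")
```

Under these assumptions, 40,000 descriptors need roughly 1.35 GB plus 6.4 GB for the similarity matrix, which leaves some headroom on an 11 GB card, while at 100,000 descriptors the similarity matrix alone would be 40 GB. Actual usage also depends on allocator overhead and any temporary buffers, so treat the numbers as an estimate only.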

So I batched the matrix multiply and was able to scale up to 100,000 descriptors in 20 seconds. For comparison, the current Similarity Retriever takes 321 seconds on a 30k-image dataset. The FAISS implementation I had earlier took about 200 seconds and missed a lot of pairs, and its index-building times were just not worth it.
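The batching idea can be sketched as follows. This is not the code from this PR: it is a minimal NumPy illustration that runs on the CPU, and the function name `batched_top_k`, the `batch_size` parameter, and the choice to keep only the top `k` matches per image are all assumptions made for the example. The same tiling should carry over to a GPU array library, where only one (batch, N) block of the similarity matrix lives in memory at a time.

```python
import numpy as np


def batched_top_k(descriptors, k, batch_size):
    """For each row, return the indices and scores of its k most similar other rows.

    Hypothetical helper: descriptors is an (N, D) float array, and k must be < N.
    Only a (batch_size, N) block of the similarity matrix exists at any time.
    """
    n = descriptors.shape[0]
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=descriptors.dtype)

    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)

        # One block of rows of the full similarity matrix.
        block = descriptors[start:stop] @ descriptors.T

        # Mask self-matches so an image is never paired with itself.
        block[np.arange(stop - start), np.arange(start, stop)] = -np.inf

        # Unordered top-k per row, then sort just those k by descending score.
        part = np.argpartition(block, -k, axis=1)[:, -k:]
        part_sim = np.take_along_axis(block, part, axis=1)
        order = np.argsort(-part_sim, axis=1)

        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sim, order, axis=1)

    return top_idx, top_sim


# Small synthetic example with unit-normalized random descriptors.
rng = np.random.default_rng(0)
demo = rng.standard_normal((1000, 64)).astype(np.float32)
demo /= np.linalg.norm(demo, axis=1, keepdims=True)
neighbors, scores = batched_top_k(demo, k=10, batch_size=256)
print(neighbors.shape, scores.shape)
```

Because each block is reduced to `k` candidates before the next block is computed, peak memory scales with `batch_size * N` rather than `N * N`, which is what makes dataset sizes beyond the one-shot limit feasible.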

I will update this PR with more timing info and experiments.
