-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FP16 API for CAGRA and IVF-PQ #264
Conversation
We also need to measure the impact of binary size and compile time for adding these new types to the public API. We can't keep increasing without figuring out ways we can consolidate what's there. These two thjngs are the number 1 complaint from users currently. (It's not just this PR. This is also holding up the half precision bfknn and RBC PRs). |
Linking #110 |
The kernels are already compiled and included in It was a mistake during porting the code from RAFT, that the public API was not defined for fp16, therefore I labelled this PR as a bugfix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Tamas for the PR! The changes are straightforward and it looks good to me as-is, yet a few nitpicks below (if the time permits).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Artem for the review, I have addressed the issues.
Sorry @tfeher, I understand that there were some bits which were not exposed in the prior port, but this still doesn't change the increase in binary size. We need to address this before we continue to merge changes that increase it. I propose we look at things that can be consolidated and maybe try using JIT for some of these things. We have to take a step back and fix this at the source. |
Binary size: 679 -> 661 MB A quick check with bench-ann shows the FP16 benchmarks seem to work. I've checked that CAGRA is a little bit faster on FP16 vs FP32 on the glove dataset. |
…his change breaking the code
/merge |
This PR adds public API to CAGRA and IVF-PQ ANN search using FP16 input data.
Note that the fp16 kernels are already compiled in libcuvs. This PR just adds the missing public API declarations into the C++ headers, and restores the instantiations of public API functions.
This PR partially fixes #144 (the IVF-Flat API is not yet added here).