
FP16 API for CAGRA and IVF-PQ #264

Merged (19 commits, Sep 27, 2024)

Conversation

@tfeher (Contributor) commented Jul 30, 2024

This PR adds a public API for CAGRA and IVF-PQ ANN search with FP16 input data.

Note that the fp16 kernels are already compiled into libcuvs. This PR just adds the missing public API declarations to the C++ headers and restores the instantiations of the public API functions.

This PR partially fixes #144 (the IVF-Flat API is not added here yet).
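For context, the practical appeal of an FP16 input path is that it halves the memory footprint of the dataset and queries. A minimal sketch of that arithmetic using plain NumPy (the shapes below are illustrative, not from this PR, and this is not the cuvs API itself):

```python
import numpy as np

# Hypothetical dataset: 1M vectors of dimension 128 (illustrative numbers).
n_rows, dim = 1_000_000, 128

fp32_bytes = n_rows * dim * np.dtype(np.float32).itemsize
fp16_bytes = n_rows * dim * np.dtype(np.float16).itemsize

# FP16 stores the same number of scalars in exactly half the memory.
assert fp16_bytes * 2 == fp32_bytes

# Converting an existing FP32 array is a plain astype call; the precision
# loss is usually acceptable for approximate nearest-neighbor search.
queries_fp32 = np.random.default_rng(0).random((10, dim), dtype=np.float32)
queries_fp16 = queries_fp32.astype(np.float16)

print(fp32_bytes // 1024**2, fp16_bytes // 1024**2)  # prints: 488 244
```

The kernels already operating on `half` inside libcuvs (as the PR notes) mean the host-side benefit is mainly this storage and transfer saving, plus the public entry points to use it.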

@tfeher tfeher requested review from a team as code owners July 30, 2024 22:02
@cjnolet (Member) commented Jul 30, 2024

We also need to measure the impact on binary size and compile time of adding these new types to the public API.

We can't keep increasing them without finding ways to consolidate what's there. These two things are currently the number-one complaint from users. (It's not just this PR; this is also holding up the half-precision bfknn and RBC PRs.)

@tfeher tfeher added bug Something isn't working non-breaking Introduces a non-breaking change labels Jul 30, 2024
@cjnolet (Member) commented Jul 30, 2024

Linking #110

@tfeher (Contributor, Author) commented Jul 30, 2024

The kernels are already compiled and included in libcuvs.so. The additional instantiations of the API entry points should have negligible size; I will confirm this once CI finishes.

The public API for fp16 was left out by mistake when the code was ported from RAFT, which is why I labelled this PR as a bugfix.

@tfeher tfeher requested a review from achirkin July 30, 2024 23:43
@achirkin (Contributor) left a comment

Thanks Tamas for the PR! The changes are straightforward and it looks good to me as-is; a few nitpicks below (if time permits).

@tfeher (Contributor, Author) left a comment

Thanks Artem for the review, I have addressed the issues.

@cjnolet (Member) commented Jul 31, 2024

Sorry @tfeher, I understand that some bits were not exposed in the prior port, but that still doesn't change the increase in binary size. We need to address this before we continue merging changes that increase it.

I propose we look at what can be consolidated and perhaps try JIT compilation for some of these things. We have to take a step back and fix this at the source.

@tfeher tfeher requested a review from a team as a code owner August 1, 2024 00:44
@achirkin achirkin changed the base branch from branch-24.08 to branch-24.10 September 27, 2024 08:03
@achirkin (Contributor) commented Sep 27, 2024

Binary size: 679 MB -> 661 MB.
The size actually decreased, because I removed a few unused template instances, and most of the FP16-related instances were already in the binary (unused until this PR).

A quick check with bench-ann shows the FP16 benchmarks work. I've verified that CAGRA is slightly faster with FP16 than with FP32 on the glove dataset.

@achirkin achirkin mentioned this pull request Sep 27, 2024
@achirkin (Contributor) commented Sep 27, 2024
/merge

@rapids-bot rapids-bot bot merged commit c616a22 into rapidsai:branch-24.10 Sep 27, 2024
54 checks passed
Labels: bug (Something isn't working), CMake, cpp, non-breaking (Introduces a non-breaking change)

Linked issue (may be closed by this PR): [FEA] Support fp16 input type for vector search
4 participants