
Likith101 commented Dec 11, 2024

Currently, max codes is the maximum number of codes (vectors) scanned at the Faiss level. Without prefiltering, every scanned code also incurs a distance computation.

With prefiltering, however, the logic is slightly different: max codes is reduced from a percentage of the total number of vectors to a percentage of the filtered documents. The intent is that we scan max-codes percent of the filtered vectors. The Faiss code paths do not behave this way. Faiss scans up to max codes vectors and applies the filter to each one before the distance computation, so vectors rejected by the filter still count toward max codes. As a result, max codes takes precedence over filtering, and there is a high likelihood that the scan stops before reaching most of the filtered vectors.

This fix removes the dependency between prefiltering and max codes: max codes is no longer reduced from a percentage of the total vector count to a percentage of the filtered vector count.
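The interaction can be illustrated with a simplified Python model. It is not Faiss's code: it assumes the budget is checked per code (Faiss may check it at inverted-list granularity), represents the prefilter as a set of allowed IDs, and uses hypothetical names (`ivf_scan`, `PERCENT`) and a made-up data layout. What it shows is that every visited code counts toward `max_codes`, so a budget derived from the filtered count can end the scan almost immediately, while a budget derived from the total count visits the same share of codes as an unfiltered search.

```python
from typing import Callable, Iterable, List, Tuple


def ivf_scan(
    probed_lists: Iterable[List[int]],
    max_codes: int,
    selector: Callable[[int], bool],
) -> Tuple[int, List[int]]:
    """Simplified model of a max_codes-bounded IVF scan (not Faiss's code).

    Visits codes in probe order. Every visited code counts toward max_codes,
    whether or not the selector accepts it; the selector runs before the
    (omitted) distance computation. A max_codes of 0 means "no limit".
    Returns (codes_scanned, ids_that_reached_distance_computation).
    """
    scanned = 0
    computed: List[int] = []
    for inv_list in probed_lists:
        for vec_id in inv_list:
            if max_codes and scanned >= max_codes:
                return scanned, computed  # budget exhausted
            scanned += 1  # counted even if the filter rejects vec_id
            if not selector(vec_id):
                continue  # filtered out: no distance computation
            computed.append(vec_id)  # stand-in for computing the distance
    return scanned, computed


# Hypothetical data: 1000 vectors in 10 probed lists of 100 ids each,
# a prefilter keeping every 20th id (50 docs), and a 10% scan budget.
TOTAL = 1000
PERCENT = 10
lists = [list(range(i * 100, (i + 1) * 100)) for i in range(TOTAL // 100)]
allowed = {i for i in range(TOTAL) if i % 20 == 0}

# Before: budget is 10% of the filtered count -> 5 codes.
before = ivf_scan(lists, len(allowed) * PERCENT // 100, allowed.__contains__)
# After: budget is 10% of the total count -> 100 codes.
after = ivf_scan(lists, TOTAL * PERCENT // 100, allowed.__contains__)

print(before)  # (5, [0])
print(after)   # (100, [0, 20, 40, 60, 80])
```

With the budget based on the filtered count, only one of the 50 filtered documents reaches the distance computation. With the budget based on the total count, the scan visits 10% of all codes and, because the filtered IDs in this made-up layout are spread evenly, reaches 10% of the filtered documents (5 of 50).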

Likith101 changed the title from "Removing modification of max codes based on filtered document size" to "MB-64513: Removing modification of max codes based on filtered document size" on Dec 16, 2024.
Likith101 merged commit 2127bb0 into master on Dec 17, 2024.
metonymic-smokey commented:

I will update the design doc to reflect this change.
