-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce native Readers read flatValues directly from faiss file. #2267
base: main
Are you sure you want to change the base?
Introduce native Readers read flatValues directly from faiss file. #2267
Conversation
… faiss file Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
… faiss file Signed-off-by: luyuncheng <luyuncheng@bytedance.com>
|
||
@Override | ||
public void checkIntegrity() throws IOException { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this intentionally kept no-op?
return fieldFileMap.containsKey(field) && fieldMetaMap.containsKey(field); | ||
} | ||
|
||
private MetaInfo read_index_header(IndexInput in) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: camelCase?
continue; | ||
} | ||
// TODO if not exist file, change to lucene flatVector | ||
IndexInput in = state.directory.openInput(vectorIndexFileName, state.context.withRandomAccess()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if we want to give access to entire file through flat vectors reader. Is it possible to limit the access to using IndexInput#slice and then storing it in the map?
this.metaInfo = metaInfo; | ||
this.slice = input.clone(); | ||
this.similarityFunction = getVectorSimilarityFunction(metaInfo.metricType).getVectorSimilarityFunction(); | ||
this.flatVectorsScorer = FlatVectorScorerUtil.getLucene99FlatVectorsScorer(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
@Override | ||
public float[] vectorValue() throws IOException { | ||
if (ord % BUCKET_VECTORS == 0) { | ||
readBucketVectors(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to understand the thought process here, why do we need to read 64 vectors at once? we can seek to the position based on ordinal and read the required vector right?
import java.io.IOException; | ||
|
||
/** Reads vectors from faiss index. like Lucene KnnVectorsReader without search */ | ||
public abstract class FaissEngineKnnVectorsReader implements Closeable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering why a new interface was created, why not use existing FlatVectorReader? This way we can create a KNNFlatVectorFormat and encapsulate when to use FAISS vs Lucene flat readers logic at one place. That removes the need for methods like isNativeVectors(String field)
to be exposed and called by clients
Same goes for writer, instead of having if else condition in NativeEngines990KnnVectorsWriter#flush we can abstract the if else in the codec writer
Please fix the CI
|
@luyuncheng For #2215 i am looking into reading FAISS files for exact search, Will you be working on review comments provided by @shatejas ? Please let me know. I can pick up just reading as of now and use it for my feature if you don't have bandwidth to work on it. Thank you |
ISSUE: #2266
Faiss code:
faiss/impl/index_write.cpp
AND IndexHeader like:
Example for HNSW32,Flat, the whole file like:
so we can read the file from directly like this PR