Merge or Add data into "ondisk" indices corrupts index ids #3498
Comments
About the merge_into inconsistency: this happens because the baseline_index (which manages index.ivfdata) is opened twice, once as baseline_index and once as empty_trained in add_data. As a result, in one of the two cases the ivfdata content and the index are inconsistent.
Thanks for checking on this @mdouze. I don't think the baseline index is what is causing the issue. I have simplified the test a lot, removed the baseline indices, and I'm using only bug-report-faiss180-ondisk-update-v2.tar.gz

Reproducing the issue is, in fact, very easy. Open any ondisk index and add data to it using any preferred strategy, then check the ids inside the index. About 13% of indices will get negative values after insertion. There is another report on the same issue. This issue also happens in my application when I just load a single healthy "ondisk" index and add data to it.

About add_index being slow: I understand that, but we do not have a low-latency requirement for adding data right now. I have no preference on the method for adding new data, as long as I can add new data to the on-disk index.

I want to help investigate and solve this issue if possible. I understand that you are quite busy with other requests, so please let me know if I can help by creating other tests to reproduce or debug the problem. Thanks
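The check described above (open an on-disk index, add data, then inspect the ids) could be sketched roughly as follows. This is only an illustration and not the test shipped in the tarball: `find_negative_ids` and `iter_invlist_ids` are hypothetical helper names, and the loader assumes an IVF index that `faiss.read_index` can open and a Faiss install that provides `faiss.contrib.inspect_tools.get_invlist`.

```python
from typing import Iterable, Iterator, List, Tuple


def find_negative_ids(
    lists: Iterable[Iterable[int]],
) -> Tuple[int, List[Tuple[int, int]]]:
    """Scan per-list ids and return (total id count, [(list_no, bad_id), ...]).

    A negative id is treated as a sign of corruption, matching the symptom
    described in this thread.
    """
    total = 0
    negatives: List[Tuple[int, int]] = []
    for list_no, ids in enumerate(lists):
        for vec_id in ids:
            total += 1
            if vec_id < 0:
                negatives.append((list_no, int(vec_id)))
    return total, negatives


def iter_invlist_ids(index_path: str) -> Iterator[List[int]]:
    """Yield the ids of each inverted list of the index stored at index_path.

    Assumption: the file is an IVF index whose .ivfdata file is still at the
    location recorded in the index. Faiss is imported lazily so the pure
    Python check above can be used without it.
    """
    import faiss
    from faiss.contrib.inspect_tools import get_invlist

    index = faiss.read_index(index_path)
    ivf = faiss.extract_index_ivf(index)
    for list_no in range(ivf.nlist):
        ids, _codes = get_invlist(ivf.invlists, list_no)
        yield ids.tolist()


def report(index_path: str) -> None:
    """Print how many ids in the index are negative."""
    total, negatives = find_negative_ids(iter_invlist_ids(index_path))
    share = (len(negatives) / total) if total else 0.0
    print(f"{len(negatives)} of {total} ids are negative ({share:.1%})")
```

Calling `report(...)` on the same index file before and after adding data would show whether the insertion introduced negative ids; the path to pass is whatever on-disk index is being tested.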
Summary
When using on disk inverted lists indices (with ivfdata file), any of the methods:
will make indices corrupt.
Platform
Linux Ubuntu 22.04
Faiss version: 1.8.0
Installed from: pip
Faiss compilation options: n/a
Running on:
Interface:
Reproduction instructions
The tar.gz file contains a self-contained test that reproduces the problem, as well as a README explaining the issue and the test.
The sample code tries to add data to an ondisk index using 3 methods, with an in-memory index as the baseline. The in-memory operations always succeed, while the on-disk operations always fail.
bug-report-faiss180-ondisk-update.tar.gz
Extract the below file and run the test: