[Bug]: Milvus embedding search with filtering does not work in the first 5-10 minutes #37098
Comments
@kiranchitturi quick questions:
/assign @kiranchitturi |
in case you don't know how to collect the milvus logs: |
Thanks for the quick reply @yanliang567!
The whole dataset was inserted just a few minutes ago. So this happens after the collection is created and the data is inserted: for the first 5-10 minutes, the filtered results are irrelevant.
What's surprising to me is that the queries work fine after 10 minutes, with no issues |
I don't think that really makes sense.
|
@xiaofan-luan I am able to reproduce it consistently now with the sample code below (dummy data) and version 2.4.4. Unfortunately, we are stuck on 2.4.4 for now until we can upgrade our clusters to the latest bug-fix version
returns
The code returned |
The document returned above shouldn't have that score. If I filter on that exact document, the relevance is very low
|
Interestingly, if I update the filter to
The score is what I would expect from No of |
@yanliang567 could you try to reproduce this issue on a later Milvus version? |
@kiranchitturi quick questions:
|
It proves to be a pymilvus client issue. I can reproduce it with pymilvus client scripts, but I cannot reproduce it with pymilvus ORM scripts. /assign @XuanYang-cn |
@yanliang567 good catch on the duplication, there was a typo there. Were you able to reproduce it with the latest version and with the duplicates issue fixed? Do you think it happens due to the consistency level? |
I think it is due to the consistency level. In case it blocks your PoC, you can use the pymilvus ORM instead for now. |
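If the consistency level is the cause, which this thread does not confirm, one way to test the theory is to request strong consistency on the search itself. A minimal sketch, assuming `client` is a `pymilvus.MilvusClient` and that its `search` method forwards a `consistency_level` keyword to the server; the helper name `strong_search` is hypothetical, and `doc_type` is the scalar field mentioned in this thread:

```python
def strong_search(client, collection_name, query_vector, filter_expr, limit=10):
    """Run a filtered vector search, asking for strong consistency.

    `client` is assumed to be a pymilvus.MilvusClient (or any object with a
    compatible `search` method). "Strong" asks Milvus to make all prior
    inserts visible before answering the query.
    """
    return client.search(
        collection_name=collection_name,
        data=[query_vector],          # one query vector -> one list of hits
        filter=filter_expr,           # e.g. "doc_type == 'type_2'"
        limit=limit,
        output_fields=["doc_type"],   # scalar field name taken from this thread
        consistency_level="Strong",   # assumption: accepted as a search keyword
    )
```

If the mismatched results disappear with this setting right after insertion, that would point toward consistency; if they persist, the data or the script is the more likely culprit.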
I will try with that. I wonder whether it's a client issue or a server issue. Do you think it's an issue with JDBC calls vs REST API calls? |
How is this related to consistency? The filter is doc_type == 'type_2' but they get doc_type = 'type_1' |
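To make the mismatch described above easy to spot, a small check can scan the search output for hits whose scalar field differs from the filtered value. This is only a sketch: it assumes the results have the nested shape `MilvusClient.search` returns when `output_fields` is set (a list per query vector, each hit a dict with an `entity` dict), and the helper name `mismatched_hits` is hypothetical:

```python
def mismatched_hits(results, field, expected):
    """Return hits whose scalar field does not equal the filtered value.

    `results` is assumed to be a list (one entry per query vector) of lists
    of hits, each hit a dict like
    {"id": ..., "distance": ..., "entity": {field: value}}.
    """
    bad = []
    for hits in results:
        for hit in hits:
            # A hit without the field also counts as a mismatch.
            if hit.get("entity", {}).get(field) != expected:
                bad.append(hit)
    return bad
```

Calling `mismatched_hits(results, "doc_type", "type_2")` right after a filtered search should return an empty list if the filter was honored.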
create_collection(client, "quick_test_v0", 768) I think this is definitely a bug here |
I think this is related to how you generate your data; it's better to check what's actually in your database. I actually haven't heard any feedback about a strange bug like this, so my suggestion is to carefully check your scripts. Also, we don't recommend using random embeddings for recall tests, "embeddings": [ random.uniform(-1, 1) for _ in range(768) ], because they don't really match real-world use cases |
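Since duplicated rows were already found once in the reproduction script, it may help to generate the dummy data reproducibly and check it for repeated primary keys before inserting. A minimal sketch, assuming a primary-key field named `id` (not confirmed by this thread); `doc_type` and `embeddings` are the field names quoted above, and both helper names are hypothetical:

```python
import random
from collections import Counter


def make_dummy_rows(n, dim=768, doc_types=("type_1", "type_2"), seed=0):
    """Build n dummy rows with unique ids and seeded random embeddings."""
    rng = random.Random(seed)  # seeded so every run produces the same data
    return [
        {
            "id": i,  # assumed primary-key field name
            "doc_type": doc_types[i % len(doc_types)],
            "embeddings": [rng.uniform(-1, 1) for _ in range(dim)],
        }
        for i in range(n)
    ]


def duplicate_ids(rows, key="id"):
    """Return the sorted key values that occur more than once in rows."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, c in counts.items() if c > 1)
```

Running `duplicate_ids(rows)` before the insert and getting an empty list rules out one class of script errors; it says nothing about what the server stored, which still needs a direct query.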
Is it better to load the collection after bulk indexing the data or after creating the collection? |
I used this only to replicate the bug because I can't share my real scripts |
You have to load the collection after creating the index, or Milvus returns errors. |
I suspect there are some other bugs in your code, so please check it carefully. |
Seems not a pymilvus issue |
@kiranchitturi any chance that you have tried the latest pymilvus and Milvus? |
Is there an existing issue for this?
Environment
Current Behavior
I am seeing weird behavior where, for the first 5-10 minutes after the collection is created, the embedding search with scalar filtering returns irrelevant results (results that do not match the filtering criteria).
This resolves after some time (5-10 mins). What causes this behavior, and how can it be remediated?
I have seen this issue in both standalone and cluster deployments.
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response