[Bug]: Latency spikes were observed when deleting 60% entities during concurrent searches #37413
Comments
/assign @zhagnlu
When the search latency increases, the following search-segments operation is applied on the target querynode.
The new incoming search operations are slowed down by it.
By adding new logs and attaching the trace_id to them, we found that the high time cost is in applying deletes.
That is a great find.
Or, if the delete batch is really huge, a simple BF (Bloom filter) might be needed.
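For illustration only, a minimal sketch of the Bloom-filter idea: give each segment a small probabilistic PK filter so that a huge delete batch only probes PKs that might actually live in that segment. `SimpleBloomFilter` is a hypothetical helper, not the filter Milvus actually ships.

```cpp
// Minimal illustrative sketch (NOT Milvus's actual bloom filter): pre-filter a
// huge delete batch so that only PKs that *might* exist in this segment are
// probed against the real PK index. False positives are fine; false negatives
// are not.
#include <cstdint>
#include <functional>
#include <iostream>
#include <vector>

class SimpleBloomFilter {  // hypothetical helper name
 public:
  explicit SimpleBloomFilter(size_t bits) : bits_(bits, false) {}

  void Add(int64_t pk) {
    bits_[Hash(pk, kSeedA) % bits_.size()] = true;
    bits_[Hash(pk, kSeedB) % bits_.size()] = true;
  }

  // May return a false positive, never a false negative.
  bool MightContain(int64_t pk) const {
    return bits_[Hash(pk, kSeedA) % bits_.size()] &&
           bits_[Hash(pk, kSeedB) % bits_.size()];
  }

 private:
  static constexpr uint64_t kSeedA = 0x9e3779b97f4a7c15ULL;
  static constexpr uint64_t kSeedB = 0xc2b2ae3d27d4eb4fULL;

  static size_t Hash(int64_t pk, uint64_t seed) {
    return std::hash<uint64_t>{}(static_cast<uint64_t>(pk) ^ seed);
  }

  std::vector<bool> bits_;
};

int main() {
  // Segment holds PKs 0..999; the delete batch covers a far larger PK range.
  SimpleBloomFilter bf(16 * 1024);
  for (int64_t pk = 0; pk < 1000; ++pk) bf.Add(pk);

  size_t probed = 0;
  for (int64_t pk = 0; pk < 100000; ++pk) {
    if (bf.MightContain(pk)) ++probed;  // only these touch the real PK lookup
  }
  std::cout << "probed " << probed << " of 100000 deleted PKs\n";
  return 0;
}
```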
@zhagnlu If I understood you correctly, the delete request itself did NOT affect the search or hold any lock, but the huge amount of deleted data DID?
Yes, it is not the lock but the deleted data itself.
What if we fully refactor the delete handling? Instead of storing (PK, timestamp) pairs, we should store … Good part:
@chyezh @zhagnlu @congqixia
I think: why not generate the mask when the insert/delete operation comes in?
This solution would generate too many (ts, mask) pairs, and the mask is a memory-costly object.
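A back-of-the-envelope sketch of that memory concern, with purely illustrative numbers (a 100M-row segment and 10,000 distinct delete timestamps are assumptions, not figures from this issue):

```cpp
// Back-of-the-envelope sketch of the memory concern with materializing a full
// (ts, mask) pair per delete event. All numbers below are illustrative only.
#include <cstdint>
#include <iostream>

int main() {
  const uint64_t rows_per_segment = 100'000'000;     // assumed 100M-row segment
  const uint64_t mask_bytes = rows_per_segment / 8;  // 1 bit per row, ~12 MiB
  const uint64_t delete_events = 10'000;             // assumed distinct delete ts

  std::cout << "one mask: " << mask_bytes / (1024.0 * 1024.0) << " MiB\n";
  std::cout << "naive (ts, mask) per event: "
            << delete_events * mask_bytes / (1024.0 * 1024.0 * 1024.0)
            << " GiB\n";  // over 100 GiB for a single segment
  return 0;
}
```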
@chyezh @zhagnlu @xiaofan-luan For growing segments, the
And another tricky scenario is the incoming iterator (for aggressive pre-generation and DISCARD).
IMHO, there are some possible ways to optimize, as others have suggested:
For opt 1, sealed segments are safe to implement; growing segments are doable with the opt 3 ts.
I think opt 3 is a good choice for both sealed and growing segments; (mask, ts) can be compacted quickly, because our search operations advance the ts quickly.
Yes, opt 3 is necessary for the (offset, ts) solution.
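A minimal sketch of how I read the (offset, ts) idea: keep deletes as (row offset, ts) pairs sorted by ts, so that applying deletes for a search at `query_ts` is one binary search plus setting bits by offset. The names and layout below are illustrative, not the actual segcore implementation.

```cpp
// Minimal sketch of an (offset, ts)-style delete record, assuming deletes
// arrive in ts order. Building the delete bitmap for a search at query_ts is
// a binary search for the ts prefix plus setting bits by row offset.
#include <algorithm>
#include <cstdint>
#include <vector>

struct DeleteRecord {
  // Parallel arrays kept sorted by ts (append-only for in-order deletes).
  std::vector<int64_t> offsets_;  // row offset within the segment
  std::vector<uint64_t> tss_;     // delete timestamp

  void Push(int64_t offset, uint64_t ts) {
    offsets_.push_back(offset);
    tss_.push_back(ts);
  }

  // Visibility bitmap at query_ts: bit set == row deleted and must be masked.
  std::vector<bool> BitmapAt(uint64_t query_ts, size_t row_count) const {
    std::vector<bool> deleted(row_count, false);
    // Only the prefix with ts <= query_ts is visible to this search (MVCC).
    auto end = std::upper_bound(tss_.begin(), tss_.end(), query_ts);
    size_t n = static_cast<size_t>(end - tss_.begin());
    for (size_t i = 0; i < n; ++i) {
      deleted[static_cast<size_t>(offsets_[i])] = true;
    }
    return deleted;
  }
};

int main() {
  DeleteRecord rec;
  rec.Push(/*offset=*/3, /*ts=*/100);
  rec.Push(/*offset=*/7, /*ts=*/200);
  auto bitmap = rec.BitmapAt(/*query_ts=*/150, /*row_count=*/10);
  // bitmap[3] is set (deleted before ts 150); bitmap[7] is not (deleted later).
  return bitmap[3] && !bitmap[7] ? 0 : 1;
}
```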
But anyway, the original solution now also has a cache that only processes newly added (pk, ts) pairs; this issue mainly hits one case:
If the delta log is not big, or the delete log is applied many times instead of all at once, this cache cost shrinks into many small applications rather than a spike. For the new (offset, ts) solution, the advantage is that it pushes some of the cost to loading and applying the delete record; at the same time, we need to implement a cache like the opt 3 one above.
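A sketch of that incremental cache, assuming search timestamps are non-decreasing: remember how far into the delete record the cached bitmap has already been applied, and fold in only the newly appended tail. The class name and layout are hypothetical.

```cpp
// Sketch of an incremental-apply cache: the bitmap from the previous search is
// reused, and only delete entries appended since then are processed, so one
// huge delta log is paid for once rather than on every search.
#include <cstdint>
#include <vector>

class DeleteBitmapCache {
 public:
  explicit DeleteBitmapCache(size_t row_count)
      : bitmap_(row_count, false), applied_(0) {}

  // offsets/tss are the full (offset, ts) delete record, ts-ordered.
  const std::vector<bool>& Apply(const std::vector<int64_t>& offsets,
                                 const std::vector<uint64_t>& tss,
                                 uint64_t query_ts) {
    // Only the newly appended tail is processed; earlier entries are cached.
    while (applied_ < tss.size() && tss[applied_] <= query_ts) {
      bitmap_[static_cast<size_t>(offsets[applied_])] = true;
      ++applied_;
    }
    return bitmap_;
  }

 private:
  std::vector<bool> bitmap_;  // deletes applied so far
  size_t applied_;            // index of the next unapplied delete entry
};
```

Note this simplification only holds while query timestamps move forward; a search at an older timestamp is exactly the time-travel/MVCC concern raised next.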
The problem is about time travel and MVCC.
I think for now we will just go for … To optimize, one thing we can do is to generate a snapshot bitset for every 10000 deletes, but again this is not the key challenge. With the (offset, ts) optimization we should already be able to make the performance 10X faster. If this is still not fast enough, then we do ts -> bitset. If we want to check the delete bitset at t10, …
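A sketch of that snapshot idea: materialize a full bitset every 10000 deletes (the interval comes from the comment above; everything else is illustrative), and answer "delete bitset at t10" by starting from the newest snapshot with ts <= t10 and replaying only the short (offset, ts) tail after it.

```cpp
// Sketch of "snapshot bitset every N deletes": a query at some ts starts from
// the newest snapshot taken at or before that ts and replays at most ~N
// trailing (offset, ts) entries. Illustrative, not the Milvus implementation.
#include <cstdint>
#include <vector>

struct Snapshot {
  uint64_t ts;        // all deletes with ts <= this are folded in
  size_t next_index;  // first delete-record index NOT in the snapshot
  std::vector<bool> bitmap;
};

class SnapshottedDeleteRecord {
 public:
  SnapshottedDeleteRecord(size_t row_count, size_t snapshot_every = 10000)
      : row_count_(row_count), snapshot_every_(snapshot_every) {}

  void Push(int64_t offset, uint64_t ts) {  // deletes assumed pushed in ts order
    offsets_.push_back(offset);
    tss_.push_back(ts);
    if (offsets_.size() % snapshot_every_ == 0) {
      // Materialize a full bitmap covering everything pushed so far.
      std::vector<bool> bm = snapshots_.empty()
                                 ? std::vector<bool>(row_count_, false)
                                 : snapshots_.back().bitmap;
      size_t from = snapshots_.empty() ? 0 : snapshots_.back().next_index;
      for (size_t i = from; i < offsets_.size(); ++i) {
        bm[static_cast<size_t>(offsets_[i])] = true;
      }
      snapshots_.push_back(Snapshot{ts, offsets_.size(), std::move(bm)});
    }
  }

  std::vector<bool> BitmapAt(uint64_t query_ts) const {
    std::vector<bool> bm(row_count_, false);
    size_t from = 0;
    // Latest snapshot whose ts <= query_ts.
    for (auto it = snapshots_.rbegin(); it != snapshots_.rend(); ++it) {
      if (it->ts <= query_ts) {
        bm = it->bitmap;
        from = it->next_index;
        break;
      }
    }
    // Replay only the remaining entries up to query_ts.
    for (size_t i = from; i < tss_.size() && tss_[i] <= query_ts; ++i) {
      bm[static_cast<size_t>(offsets_[i])] = true;
    }
    return bm;
  }

 private:
  size_t row_count_;
  size_t snapshot_every_;
  std::vector<int64_t> offsets_;
  std::vector<uint64_t> tss_;
  std::vector<Snapshot> snapshots_;
};
```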
I don't think this is a problem.
#37413 Signed-off-by: luzhang <luzhang@zilliz.com> Co-authored-by: luzhang <luzhang@zilliz.com>
Is there an existing issue for this?
Environment
Current Behavior
server config
test steps
results
metrics of compact-opt-100m-2
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response