[Bug]: In stability testing, the latency of query/search is unstable and has increased by many times at certain moments. #37527

Open
@zhuwenxing

Description

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:master-20241107-f813fb45-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

image

When query/search latency increased, many insert/delete operations were rate limited at the same time.

image
At the moments when latency increased, the CPU and memory usage of the query node remained relatively stable.

Expected Behavior

The latency of search/query should remain relatively stable.
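
To make "relatively stable" checkable, one option is to flag any sample that exceeds a multiple of the median latency. The sketch below is only an illustration: the function name `find_latency_spikes`, the `factor=3.0` threshold, and the sample values are all assumptions and are not taken from this test run.

```python
from statistics import median


def find_latency_spikes(samples, factor=3.0):
    """Return (index, latency) pairs whose latency exceeds factor * median.

    The median is used as the baseline because a few large spikes
    barely move it, unlike the mean.
    """
    if not samples:
        return []
    baseline = median(samples)
    return [(i, v) for i, v in enumerate(samples) if v > factor * baseline]


# Hypothetical search latencies in milliseconds (not data from this issue).
latencies_ms = [12.0, 11.5, 13.2, 12.8, 95.0, 12.1, 11.9, 140.3, 12.4]
spikes = find_latency_spikes(latencies_ms)
print(spikes)
```

With a check like this, a stability run could report the timestamps of the spikes so they can be compared against the moments when insert/delete rate limiting appeared.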

Steps To Reproduce

No response

Milvus Log

cluster: 4am
ns: chaos-testing
pod info

❯ k get pod|grep fts-stable-test-11                                  
fts-stable-test-11-etcd-0                                         0/1     Pending                  0                 9h
fts-stable-test-11-etcd-1                                         1/1     Running                  0                 15h
fts-stable-test-11-etcd-2                                         1/1     Running                  0                 15h
fts-stable-test-11-kafka-0                                        0/2     Pending                  0                 9h
fts-stable-test-11-kafka-1                                        2/2     Running                  1 (15h ago)       15h
fts-stable-test-11-kafka-2                                        2/2     Running                  0                 15h
fts-stable-test-11-kafka-exporter-5d659487fb-v4fsz                1/1     Running                  4 (15h ago)       15h
fts-stable-test-11-milvus-datanode-6bd55bd8bd-2nggb               1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-datanode-6bd55bd8bd-5d2t6               1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-indexnode-964df8d55-4fmfv               1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-indexnode-964df8d55-kqgmm               1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-mixcoord-68b694d6df-xp6cl               1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-proxy-8458cfb765-5qbfv                  1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-querynode-56c6787d78-czx9t              1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-milvus-querynode-56c6787d78-d9zwd              1/1     Running                  3 (15h ago)       15h
fts-stable-test-11-milvus-querynode-56c6787d78-wl5jx              1/1     Running                  2 (15h ago)       15h
fts-stable-test-11-minio-0                                        1/1     Running                  0                 15h
fts-stable-test-11-minio-1                                        1/1     Running                  0                 15h
fts-stable-test-11-minio-2                                        0/1     Pending                  0                 9h
fts-stable-test-11-minio-3                                        1/1     Running                  0                 15h
fts-stable-test-11-zookeeper-0                                    1/1     Running                  0                 15h
fts-stable-test-11-zookeeper-1                                    1/1     Running                  0                 15h
fts-stable-test-11-zookeeper-2                                    0/1     Pending                  0                 9h
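
The listing above contains several pods that are not in the Running state. As a small aid for reading such output, the sketch below filters a `kubectl get pod` style listing for non-Running pods. The helper name `non_running_pods` is hypothetical, and the sample text is a shortened copy of four rows from the listing above.

```python
def non_running_pods(listing):
    """Return (name, status) for every pod whose STATUS column is not Running.

    Assumes the default column order NAME READY STATUS RESTARTS AGE.
    """
    pods = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines, malformed rows, and the optional header row.
        if len(fields) < 3 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[2]
        if status != "Running":
            pods.append((name, status))
    return pods


# Shortened copy of rows from the pod listing in this issue.
sample = """\
fts-stable-test-11-etcd-0    0/1  Pending  0            9h
fts-stable-test-11-etcd-1    1/1  Running  0            15h
fts-stable-test-11-kafka-0   0/2  Pending  0            9h
fts-stable-test-11-kafka-1   2/2  Running  1 (15h ago)  15h
"""
pending = non_running_pods(sample)
for name, status in pending:
    print(name, status)
```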

grafana link

Anything else?

Does rate limiting of insert/delete operations affect the latency of search/query? If so, that would contradict Milvus's design principle of read-write separation.

Metadata

Labels

kind/bug: Issues or changes related a bug
severity/critical: Critical, lead to crash, data missing, wrong result, function totally doesn't work.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.
