
[Bug]: Querynode failed to update target due to the segment to load not exists in targets #30469

Closed
1 task done
ThreadDao opened this issue Feb 2, 2024 · 2 comments
Assignees
Labels
2.4-features kind/bug Issues or changes related a bug severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Milestone

Comments

@ThreadDao
Contributor

ThreadDao commented Feb 2, 2024

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: master-20240201-e22e8b30-amd64
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.4.0rc24
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. deploy standalone with Kafka MQ; the Milvus config is:
     log:
       level: debug
     trace:
       exporter: jaeger
       sampleFraction: 1
       jaeger:
         url: http://tempo-distributor.tempo:14268/api/traces
  2. test case delta-stable-stand-flush-5-3790669139:
    a. create collection with 2 shards -> build HNSW index -> insert 3m-128d vectors -> flush -> build index again -> load
    b. concurrent requests: insert + delete + flush + search
    'concurrent_params': {'concurrent_number': 50,
                          'during_time': '10h',
                          'interval': 60,
                          'spawn_rate': None},
    'concurrent_tasks': [{'type': 'search',
                          'weight': 3,
                          'params': {'nq': 100,
                                     'top_k': 100,
                                     'search_param': {'ef': 128},
                                     'timeout': 120}},
                         {'type': 'insert',
                          'weight': 3,
                          'params': {'nb': 100,
                                     'start_id': 3000000,
                                     'random_id': True,
                                     'random_vector': True,
                                     'timeout': 120}},
                         {'type': 'delete',
                          'weight': 3,
                          'params': {'delete_length': 50,
                                     'timeout': 120}},
                         {'type': 'flush',
                          'weight': 1,
                          'params': {'timeout': 120}}]},
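The weighted task mix above (search/insert/delete weighted 3, flush weighted 1, across 50 concurrent workers) can be sketched as a weighted random scheduler. This is a minimal, hypothetical illustration of how such a benchmark could pick its next operation; the names and structure are assumptions, not the actual test harness:

```python
import random

# Task types and weights taken from the concurrent_tasks params above.
CONCURRENT_TASKS = [
    {"type": "search", "weight": 3},
    {"type": "insert", "weight": 3},
    {"type": "delete", "weight": 3},
    {"type": "flush", "weight": 1},
]


def pick_task(rng: random.Random) -> str:
    """Pick the next task type, weighted 3:3:3:1 as in the test params."""
    types = [t["type"] for t in CONCURRENT_TASKS]
    weights = [t["weight"] for t in CONCURRENT_TASKS]
    return rng.choices(types, weights=weights, k=1)[0]


# Simulate many scheduling decisions; with these weights, flush should
# account for roughly 1/10 of all picks.
rng = random.Random(0)
picks = [pick_task(rng) for _ in range(10000)]
```

With this mix, flush is issued far less often than the data-path operations, which matches the intent of stressing insert/delete/search while still triggering periodic flushes.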
    
  3. Combining the querynode loaded segment num and datacoord segment num metrics, there appears to be a problem with the target update before 20:00. Compaction on the datanode runs normally, but the number of growing segments on the querynode keeps rising.

[screenshots: querynode loaded segment num and datacoord segment num metrics]

  4. Loki logs show warning messages:
    [screenshot of warning logs]

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

4am qa-milvus pods:

delta-insert-stand-5-milvus-standalone-875577687-jqlg2     Running     0            1m      10.104.16.150     4am-node21     
delta-insert-stand-5-etcd-0                                Running     0            3m      10.104.28.45      4am-node33     
delta-insert-stand-5-kafka-0                               Running     1            3m      10.104.29.200     4am-node35     
delta-insert-stand-5-kafka-1                               Running     0            3m      10.104.25.156     4am-node30     
delta-insert-stand-5-kafka-2                               Running     0            3m      10.104.28.50      4am-node33     
delta-insert-stand-5-kafka-zookeeper-0                     Running     0            3m      10.104.25.155     4am-node30     
delta-insert-stand-5-kafka-zookeeper-1                     Running     0            3m      10.104.16.147     4am-node21     
delta-insert-stand-5-kafka-zookeeper-2                     Running     0            3m      10.104.23.92      4am-node27     
delta-insert-stand-5-minio-7df576d4c5-8kn88                Running     0            3m      10.104.28.46      4am-node33 

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 2, 2024
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Feb 2, 2024
@ThreadDao ThreadDao added this to the 2.4.0 milestone Feb 2, 2024
@ThreadDao ThreadDao assigned aoiasd and unassigned yah01, aoiasd and yanliang567 Feb 2, 2024
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. 2.4-features and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Feb 3, 2024
congqixia added a commit to congqixia/milvus that referenced this issue Feb 4, 2024
See also milvus-io#30469

For a sync task, the segment could be compacted while the sync task is running. In the previous implementation, the sync task held only the old segment id as its KeyLock, so compaction on the compacted-to segment could run in parallel with this sync task's delta sync.

This PR introduces sync target segment verification logic: before the actual sync, the task checks the target segment lock it is holding. If the check fails, the sync task returns an `errTargetSegementNotMatch` error and makes the manager re-fetch the current target segment id.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sre-ci-robot pushed a commit that referenced this issue Feb 5, 2024
See also #27675 #30469

For a sync task, the segment could be compacted while the sync task is running. In the previous implementation, the sync task held only the old segment id as its KeyLock, so compaction on the compacted-to segment could run in parallel with this sync task's delta sync.

This PR introduces sync target segment verification logic: before the actual sync, the task checks the target segment lock it is holding. If the check fails, the sync task returns an `errTargetSegementNotMatch` error and makes the manager re-fetch the current target segment id.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
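The verification idea in the commit message above (re-check that the locked segment is still the current target before syncing, and re-fetch on mismatch) can be sketched as follows. Milvus implements this in Go; this Python sketch uses hypothetical names (`SegmentManager`, `sync_with_verification`) purely to illustrate the check-then-retry pattern:

```python
import threading


class SegmentManager:
    """Hypothetical stand-in for the sync manager: tracks which segment id
    is the current target after compaction."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current_target = {}  # old segment id -> compacted-to segment id

    def compact(self, old_id: int, new_id: int) -> None:
        with self._lock:
            self._current_target[old_id] = new_id

    def current_target(self, seg_id: int) -> int:
        with self._lock:
            return self._current_target.get(seg_id, seg_id)


def sync_with_verification(manager: SegmentManager, seg_id: int, do_sync) -> None:
    """Verify the target segment before syncing; on mismatch, re-fetch the
    current target id and retry instead of syncing against a stale segment."""
    target = manager.current_target(seg_id)
    while True:
        latest = manager.current_target(seg_id)
        if latest != target:
            # Verification failed: the segment was compacted meanwhile.
            target = latest
            continue
        do_sync(target)
        return
```

A quick usage: if segment 1 was compacted into segment 2 before the sync runs, the verification resolves the sync to segment 2 rather than writing deltas against the stale id, which is the race the PR closes.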
@ThreadDao ThreadDao changed the title [Bug]: Querynode failed to upfate target due to the segment to load not exists in targets [Bug]: Querynode failed to update target due to the segment to load not exists in targets Feb 20, 2024
@ThreadDao
Contributor Author

@aoiasd @XuanYang-cn
Same problem: Deltalog Key Not Found. Please link related comments and PRs here.

@ThreadDao
Contributor Author

Fixed.
