[Bug]: When building an index for the same data, the CPU usage of the master image is 100%, but that of the 2.4 image is only 60% #39090

Open
1 task done
ThreadDao opened this issue Jan 8, 2025 · 4 comments
Labels
kind/bug Issues or changes related a bug severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@ThreadDao
Contributor

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.4-20250106-9e221063-amd64 vs master-20250108-f0dae814-amd64
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Tested index build cost for SIFT 10M data on the master and 2.4 branches. CPU usage on master is 100%, while on 2.4 it is only 60%.

  1. standalone config
    standalone:
      env:
      - name: GOTRACEBACK
        value: crash
      replicas: 1
      resources:
        limits:
          cpu: "8" 
          memory: 32Gi
        requests:
          cpu: "8" 
          memory: 30Gi
  2. index params:
{'index_type': 'HNSW', 'metric_type': 'L2', 'params': {'M': 30, 'efConstruction': 360}}
  3. index cost and cpu usage

pyroscope of stats-24-op-55-4263
image

pyroscope of stats-master-op-62-8296
image
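
For reference, the index parameters above could be applied and timed roughly as follows. This is a minimal sketch, not the test harness used for this report: `timed_index_build` and the field name in the usage note are hypothetical. It assumes only that the collection object exposes a `create_index(field_name, index_params)` method (as the pymilvus `Collection` class does) and that the call blocks until the build finishes; if it does not block, the measured time covers only request submission.

```python
import time

# Index parameters copied from this report.
INDEX_PARAMS = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {"M": 30, "efConstruction": 360},
}


def timed_index_build(collection, field_name, index_params=INDEX_PARAMS):
    """Call collection.create_index and return the elapsed wall-clock seconds.

    `collection` is any object with a create_index(field_name, index_params)
    method; a pymilvus Collection is assumed but not required.
    """
    start = time.perf_counter()
    collection.create_index(field_name, index_params)
    return time.perf_counter() - start
```

Usage would look like `timed_index_build(collection, "float_vector")`, where `"float_vector"` is a placeholder for the actual vector field name.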

Expected Behavior

No response

Steps To Reproduce

https://argo-workflows.zilliz.cc/archived-workflows/qa/8a74abd1-0140-4ce5-af21-f146e2275f1f?nodeId=zong-stats-index-4

Milvus Log

  • pods of 2.4:
stats-24-op-55-4263-etcd-0                                        1/1     Running                  0               5h20m   10.104.25.5     4am-node30   <none>           <none>
stats-24-op-55-4263-kafka-0                                       1/1     Running                  1 (5h19m ago)   5h19m   10.104.30.233   4am-node38   <none>           <none>
stats-24-op-55-4263-kafka-1                                       1/1     Running                  1 (5h19m ago)   5h19m   10.104.32.33    4am-node39   <none>           <none>
stats-24-op-55-4263-kafka-2                                       1/1     Running                  0               5h19m   10.104.19.161   4am-node28   <none>           <none>
stats-24-op-55-4263-kafka-zookeeper-0                             1/1     Running                  0               5h19m   10.104.32.31    4am-node39   <none>           <none>
stats-24-op-55-4263-kafka-zookeeper-1                             1/1     Running                  0               5h19m   10.104.17.211   4am-node23   <none>           <none>
stats-24-op-55-4263-kafka-zookeeper-2                             1/1     Running                  0               5h19m   10.104.19.160   4am-node28   <none>           <none>
stats-24-op-55-4263-milvus-standalone-b95fd7954-x65ss             1/1     Running                  0               5h18m   10.104.9.159    4am-node14   <none>           <none>
stats-24-op-55-4263-minio-64c5f5f586-zdqkb                        1/1     Running                  0               5h19m   10.104.25.7     4am-node30   <none>           <none>
  • pods of master
stats-master-op-62-8296-etcd-0                                    1/1     Running                  0              5h6m    10.104.25.3     4am-node30   <none>           <none>
stats-master-op-62-8296-kafka-0                                   1/1     Running                  1 (5h6m ago)   5h6m    10.104.19.156   4am-node28   <none>           <none>
stats-master-op-62-8296-kafka-1                                   1/1     Running                  1 (5h6m ago)   5h6m    10.104.25.4     4am-node30   <none>           <none>
stats-master-op-62-8296-kafka-2                                   1/1     Running                  1 (5h6m ago)   5h6m    10.104.30.231   4am-node38   <none>           <none>
stats-master-op-62-8296-kafka-zookeeper-0                         1/1     Running                  0              5h6m    10.104.19.157   4am-node28   <none>           <none>
stats-master-op-62-8296-kafka-zookeeper-1                         1/1     Running                  0              5h6m    10.104.32.28    4am-node39   <none>           <none>
stats-master-op-62-8296-kafka-zookeeper-2                         1/1     Running                  0              5h6m    10.104.20.211   4am-node22   <none>           <none>
stats-master-op-62-8296-milvus-standalone-77c5ddc7d6-gzj5t        1/1     Running                  0              5h5m    10.104.6.4      4am-node13   <none>           <none>
stats-master-op-62-8296-minio-7fdf8fbc77-r4rnz                    1/1     Running                  0              5h6m    10.104.32.27    4am-node39   <none>           <none>
stats-master-op-62-8296-minio-update-prometheus-secret-zzh49      0/1     Completed                0              5h6m    10.104.6.3      4am-node13   <none>           <none>

Anything else?

No response

@ThreadDao ThreadDao added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 8, 2025
@ThreadDao ThreadDao added the severity/critical Critical, lead to crash, data missing, wrong result, function totally doesn't work. label Jan 8, 2025
@ThreadDao ThreadDao added this to the 2.4.21 milestone Jan 8, 2025
@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 8, 2025
@yanliang567
Contributor

/unassign

@xiaocai2333
Contributor

xiaocai2333 commented Jan 10, 2025

@ThreadDao In standalone mode, the default CPU usage is limited to 75%, so the usage of 2.4 is expected. However, I'm still investigating why the master can utilize 100% of the CPU.

@yanliang567
Contributor

> @ThreadDao In standalone mode, the default CPU usage is limited to 75%, so the usage of 2.4 is expected. However, I'm still investigating why the master can utilize 100% of the CPU.

Is there any config for this CPU usage limitation? I remember it was 50%. @xiaocai2333

@xiaocai2333
Contributor

> > @ThreadDao In standalone mode, the default CPU usage is limited to 75%, so the usage of 2.4 is expected. However, I'm still investigating why the master can utilize 100% of the CPU.
>
> Is there any config for this CPU usage limitation? I remember it was 50%. @xiaocai2333

buildIndexThreadPoolRatio: 0.75
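
To make the arithmetic behind that ratio concrete, here is a rough sketch of how a ratio of 0.75 would translate into a thread count and a CPU ceiling for the 8-core limit in this report. It assumes the build pool size is the core count multiplied by the ratio, rounded down, with a minimum of one thread; that formula and the function names are illustrative assumptions, not taken from the Milvus source.

```python
import math


def build_pool_threads(cpu_cores: int, ratio: float) -> int:
    """Assumed pool size: floor(cores * ratio), never fewer than one thread."""
    return max(1, math.floor(cpu_cores * ratio))


def expected_cpu_ceiling(cpu_cores: int, ratio: float) -> float:
    """Fraction of the CPU limit that fully busy build threads could use."""
    return build_pool_threads(cpu_cores, ratio) / cpu_cores


# Values from this issue: cpu limit "8", buildIndexThreadPoolRatio 0.75.
threads = build_pool_threads(8, 0.75)
ceiling = expected_cpu_ceiling(8, 0.75)
print(f"{threads} build threads, ceiling {ceiling:.0%}")
```

Under that assumption, 8 cores at 0.75 gives 6 build threads and a 75% ceiling, which is consistent with 2.4 staying below 100% and leaves the 100% usage on master as the behavior still to be explained.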


3 participants