I set up a GPU pool, and the autoscaler scales up fine from 1 to n nodes, but not from 0 to n nodes. The error message is:
```
I0605 11:27:29.865576 1 scale_up.go:54] Pod default/simple-gpu-test-6f48d9555d-l9822 is unschedulable
I0605 11:27:29.961051 1 scale_up.go:86] Upcoming 0 nodes
I0605 11:27:30.005163 1 scale_up.go:146] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put default/simple-gpu-test-6f48d9555d-l9822 on template-node-for-gpus.ci.k8s.local-5829202798403814789, reason: Insufficient nvidia.com/gpu
I0605 11:27:30.005262 1 scale_up.go:175] No pod can fit to gpus.ci.k8s.local
I0605 11:27:30.005324 1 scale_up.go:180] No expansion options
I0605 11:27:30.005393 1 static_autoscaler.go:299] Calculating unneeded nodes
I0605 11:27:30.008919 1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"simple-gpu-test-6f48d9555d-l9822", UID:"3416d787-68b3-11e8-8e8f-0639a6e973b0", APIVersion:"v1", ResourceVersion:"12429157", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
I0605 11:27:30.031707 1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler
```
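The same NotTriggerScaleUp event is also visible on the pending pod itself, for example (pod name taken from the log above):

```bash
# Shows the pod's events, including the NotTriggerScaleUp reason from the autoscaler
kubectl describe pod simple-gpu-test-6f48d9555d-l9822
```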
This is on Kubernetes 1.9.6 with autoscaler 1.1.2.
The nodes carry the label `kops.k8s.io/instancegroup=gpus`, which is also present as a tag on the Auto Scaling group on AWS:

```json
{
  "ResourceType": "auto-scaling-group",
  "ResourceId": "gpus.ci.k8s.local",
  "PropagateAtLaunch": true,
  "Value": "gpus",
  "Key": "k8s.io/cluster-autoscaler/node-template/label/kops.k8s.io/instancegroup"
},
```
If I start a node, I see it has the required capacity:
```
Capacity:
 cpu:             4
 memory:          62884036Ki
 nvidia.com/gpu:  1
 pods:            110
```
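(Capacity taken from the node description; the node name below is a placeholder.)

```bash
# Show the Capacity section of the manually started GPU node
kubectl describe node <gpu-node-name> | grep -A 4 "Capacity:"
```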
This is the simple deployment I use to test it:
```yaml
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: simple-gpu-test
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: "simplegputest"
    spec:
      containers:
      - name: "nvidia-smi-gpu"
        image: "nvidia/cuda:8.0-cudnn5-runtime"
        resources:
          limits:
            nvidia.com/gpu: 1 # requesting 1 GPU
        volumeMounts:
        - mountPath: /usr/local/nvidia
          name: nvidia
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do nvidia-smi; sleep 5; done;" ]
      volumes:
      - hostPath:
          path: /usr/local/nvidia
        name: nvidia
```
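To reproduce, I apply the manifest and watch the pod stay Pending instead of triggering a scale-up (the file name here is arbitrary):

```bash
# Apply the test deployment and watch the pod that should trigger the scale-up
kubectl apply -f simple-gpu-test.yaml
kubectl get pods -l app=simplegputest -o wide --watch
```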
Related to #321, where I reported this earlier.
Sorry, this is a duplicate of #903.