Scaling up from 0 nodes on AWS, CA not aware of custom resources #321

Closed
7chenko opened this issue Sep 9, 2017 · 17 comments · Fixed by #322
Labels: area/cluster-autoscaler, area/provider/aws

Comments

7chenko commented Sep 9, 2017

When scaling up from 0 nodes on AWS, how can I make cluster-autoscaler aware of custom resources on the nodes, such as "alpha.kubernetes.io/nvidia-gpu"?

Using kops 1.7.0, Kubernetes 1.7.5, and cluster-autoscaler 0.6.1, with 0 nodes running, starting a job with "resources: limits: alpha.kubernetes.io/nvidia-gpu: 1" results in no scale-up from CA (note "Insufficient alpha.kubernetes.io/nvidia-gpu"):

I0909 03:36:26.255878       1 scale_up.go:50] Pod default/0dd3d1fc-3e73-90e7-84e0-f24009dc3784-08t94 is unschedulable
I0909 03:36:26.351319       1 scale_up.go:71] Upcoming 0 nodes
I0909 03:36:26.385779       1 scale_up.go:112] Scale-up predicate failed: GeneralPredicates predicate mismatch, cannot put default/0dd3d1fc-3e73-90e7-84e0-f24009dc3784-08t94 on template-node-for-nodes.uswest2.metamoto.net-7573953041833664557, reason: Insufficient alpha.kubernetes.io/nvidia-gpu
I0909 03:36:26.385809       1 scale_up.go:141] No pod can fit to %snodes.uswest2.metamoto.net
I0909 03:36:26.385819       1 scale_up.go:146] No expansion options
...
I0909 02:55:41.381017       1 event.go:218] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"2f0b2cb6-782b-f2e6-4c3c-c37f648f45b2-6fhl2", UID:"21a5fb02-94fe-11e7-beff-06c6424932c2", APIVersion:"v1", ResourceVersion:"7436", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)

It looks like the "template-node-for-nodes" doesn't have the resources listed. However, if I start a job without the GPU requirement, a node is spun up, and then I can start the original GPU job and it gets scheduled on that node. The node looks like this (kubectl describe nodes; note "alpha.kubernetes.io/nvidia-gpu: 1"):

Name:			ip-172-31-121-22.us-west-2.compute.internal
Role:
Labels:			beta.kubernetes.io/arch=amd64
			beta.kubernetes.io/instance-type=g2.2xlarge
			beta.kubernetes.io/os=linux
			failure-domain.beta.kubernetes.io/region=us-west-2
			failure-domain.beta.kubernetes.io/zone=us-west-2a
			kubernetes.io/hostname=ip-172-31-121-22.us-west-2.compute.internal
			kubernetes.io/role=node
			node-role.kubernetes.io/node=
Annotations:		node.alpha.kubernetes.io/ttl=0
			volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:			<none>
CreationTimestamp:	Fri, 08 Sep 2017 19:57:32 -0700
Conditions:
  Type			Status	LastHeartbeatTime			LastTransitionTime			Reason				Message
  ----			------	-----------------			------------------			------				-------
  NetworkUnavailable 	False 	Fri, 08 Sep 2017 19:57:37 -0700 	Fri, 08 Sep 2017 19:57:37 -0700 	RouteCreated 			RouteController created a route
  OutOfDisk 		False 	Fri, 08 Sep 2017 20:06:43 -0700 	Fri, 08 Sep 2017 19:57:32 -0700 	KubeletHasSufficientDisk 	kubelet has sufficient disk space available
  MemoryPressure 	False 	Fri, 08 Sep 2017 20:06:43 -0700 	Fri, 08 Sep 2017 19:57:32 -0700 	KubeletHasSufficientMemory 	kubelet has sufficient memory available
  DiskPressure 		False 	Fri, 08 Sep 2017 20:06:43 -0700 	Fri, 08 Sep 2017 19:57:32 -0700 	KubeletHasNoDiskPressure 	kubelet has no disk pressure
  Ready 		True 	Fri, 08 Sep 2017 20:06:43 -0700 	Fri, 08 Sep 2017 19:57:52 -0700 	KubeletReady 			kubelet is posting ready status. AppArmor enabled
Addresses:
  InternalIP:	172.31.121.22
  ExternalIP:	34.213.162.221
  InternalDNS:	ip-172-31-121-22.us-west-2.compute.internal
  ExternalDNS:	ec2-34-213-162-221.us-west-2.compute.amazonaws.com
  Hostname:	ip-172-31-121-22.us-west-2.compute.internal
Capacity:
 alpha.kubernetes.io/nvidia-gpu:	1
 cpu:					8
 memory:				15399064Ki
 pods:					110
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:	1
 cpu:					8
 memory:				15296664Ki
 pods:					110
System Info:
 Machine ID:			2118324e509d4582ae925c3ed83d8f2a
 System UUID:			EC2DF760-2914-FF5B-B89E-6B85AEF7C8C2
 Boot ID:			984a55a1-ca22-45cb-9c17-39f71e8315cb
 Kernel Version:		4.4.0-1017-aws
 OS Image:			Ubuntu 16.04.2 LTS
 Operating System:		linux
 Architecture:			amd64
 Container Runtime Version:	docker://1.12.6
 Kubelet Version:		v1.7.5
 Kube-Proxy Version:		v1.7.5
PodCIDR:			100.96.3.0/24
ExternalID:			i-04de825c788bf994e
Non-terminated Pods:		(2 in total)
  Namespace			Name								CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ---------			----								------------	----------	---------------	-------------
  default			93caffb3-2868-a526-8810-2ddc0dd1140a-fv5t4			100m (1%)	0 (0%)		0 (0%)		0 (0%)
  kube-system			kube-proxy-ip-172-31-121-22.us-west-2.compute.internal		100m (1%)	0 (0%)		0 (0%)		0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests	CPU Limits	Memory Requests	Memory Limits
  ------------	----------	---------------	-------------
  200m (2%)	0 (0%)		0 (0%)		0 (0%)

New nodes are also spun up correctly as long as there is already at least 1 node running. Any idea how to make the "template" for nodes list the correct resources? Thanks!
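(Aside: later cluster-autoscaler releases document ASG tag hints that let the template node advertise extended resources for scale-from-zero; it is not clear that 0.6.1 honors them. A sketch of what that could look like via kops cloudLabels, with the resource name and count purely illustrative:)

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  name: gpus          # illustrative group name
spec:
  cloudLabels:
    k8s.io/cluster-autoscaler/enabled: ""
    # Illustrative resource hint tag; read by newer cluster-autoscaler
    # versions when building the template node for an empty group.
    k8s.io/cluster-autoscaler/node-template/resources/alpha.kubernetes.io/nvidia-gpu: "1"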

7chenko (Author) commented Sep 9, 2017

Is the only way to do this to use labels and nodeSelectors?
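(For the label/nodeSelector route, a sketch, assuming the k8s.io/cluster-autoscaler/node-template/label/<label> ASG tag documented for later releases plus a matching nodeSelector on the pod; the label key and value are illustrative:)

# ASG tag (e.g. set through kops cloudLabels), so the template node carries the label:
#   k8s.io/cluster-autoscaler/node-template/label/gpu-type: nvidia
# Pod spec fragment that pins the workload to nodes carrying that label:
spec:
  nodeSelector:
    gpu-type: nvidia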

mwielgus added the area/provider/aws and area/cluster-autoscaler labels Sep 10, 2017
sethpollack (Contributor) commented

Any idea how to pull that info from the ASG?

sethpollack (Contributor) commented

I can push up a fix soon. @7chenko, would you be able to test it?

7chenko (Author) commented Sep 10, 2017

Yup, I will test!

sethpollack (Contributor) commented

Thanks!

7chenko (Author) commented Sep 12, 2017

Confirmed this works, scaling up from 0 triggered. Thanks!

7chenko (Author) commented Sep 12, 2017

Weirdly, this works when the nodes are g2.2xlarge instances, but not when they are p2.xlarge instances. Same error as before:

scale_up.go:141] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put default/ba6e66e1-d5c0-31b7-7af1-a9d18401a823-wx6w5 on template-node-for-nodes.uswest2.metamoto.net-7466275748196809522, reason: Insufficient alpha.kubernetes.io/nvidia-gpu

What could cause this difference in behavior?

sethpollack (Contributor) commented

OK, so it is parsing correctly; AWS just isn't providing the GPU data for the p2.xlarge instances:

https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/us-west-2/index.json
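(For reference, the GPU count in that offer file sits under each product's attributes; roughly this shape, heavily trimmed and with a made-up SKU:)

{
  "products": {
    "EXAMPLESKU1234567": {
      "productFamily": "Compute Instance",
      "attributes": {
        "instanceType": "p2.xlarge",
        "vcpu": "4",
        "memory": "61 GiB",
        "gpu": "1"
      }
    }
  }
}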

7chenko (Author) commented Sep 13, 2017

Gotcha, thanks for that. Will contact AWS with that info.

sethpollack (Contributor) commented

Ok thanks

7chenko (Author) commented Oct 28, 2017

Confirmed that AWS has now fixed the GPU data for the p2 instance types:

https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/AmazonEC2/current/us-west-2/index.json

sethpollack (Contributor) commented

Thanks! I'll push an update.

alexnederlof commented

Hmm, this doesn't work for me: I get Insufficient nvidia.com/gpu. Does anyone see what I'm doing wrong here? This is on Kubernetes v1.9.6 with autoscaler 1.1.2. Could it be that nvidia.com/gpu is not registered correctly in that node template?

I have two instance groups: one with CPUs, and a new one called gpus that I want to be able to scale down to 0 nodes. The output of kops edit ig gpus is:

apiVersion: kops/v1alpha2
kind: InstanceGroup
metadata:
  creationTimestamp: 2018-05-31T09:27:31Z
  labels:
    kops.k8s.io/cluster: ci.k8s.local
  name: gpus
spec:
  cloudLabels:
    instancegroup: gpus
    k8s.io/cluster-autoscaler/enabled: ""
    k8s.io/cluster-autoscaler/node-template/label: ""
  image: ami-4450543d
  kubelet:
    featureGates:
      DevicePlugins: "true"
  machineType: p2.xlarge
  maxPrice: "0.5"
  maxSize: 3
  minSize: 0
  nodeLabels:
    kops.k8s.io/instancegroup: gpus
    spot: "true"
  role: Node
  rootVolumeOptimization: true
  subnets:
  - eu-west-1c

And the autoscaler deployment has:

    spec:
      containers:
      - command:
        - ./cluster-autoscaler
        - --v=4
        - --stderrthreshold=info
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --nodes=0:3:gpus.ci.k8s.local
        env:
        - name: AWS_REGION
          value: eu-west-1
        image: k8s.gcr.io/cluster-autoscaler:v1.1.2

Now I try to deploy a simple GPU test:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: simple-gpu-test
spec: 
  replicas: 1
  template:
    metadata:
      labels:
        app: "simplegputest"
    spec:
      containers: 
      - name: "nvidia-smi-gpu"
        image: "nvidia/cuda:8.0-cudnn5-runtime"
        resources: 
          limits: 
             nvidia.com/gpu: 1 # requesting 1 GPU
        volumeMounts:
        - mountPath: /usr/local/nvidia
          name: nvidia
        command: [ "/bin/bash", "-c", "--" ]
        args: [ "while true; do nvidia-smi; sleep 5; done;" ]
      volumes:
      - hostPath:
          path: /usr/local/nvidia
        name: nvidia

I expect the instance group to go from 0 to 1, but the autoscaler logs show:

I0605 11:27:29.865576       1 scale_up.go:54] Pod default/simple-gpu-test-6f48d9555d-l9822 is unschedulable
I0605 11:27:29.961051       1 scale_up.go:86] Upcoming 0 nodes
I0605 11:27:30.005163       1 scale_up.go:146] Scale-up predicate failed: PodFitsResources predicate mismatch, cannot put default/simple-gpu-test-6f48d9555d-l9822 on template-node-for-gpus.ci.k8s.local-5829202798403814789, reason: Insufficient nvidia.com/gpu
I0605 11:27:30.005262       1 scale_up.go:175] No pod can fit to gpus.ci.k8s.local
I0605 11:27:30.005324       1 scale_up.go:180] No expansion options
I0605 11:27:30.005393       1 static_autoscaler.go:299] Calculating unneeded nodes
I0605 11:27:30.008919       1 factory.go:33] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"simple-gpu-test-6f48d9555d-l9822", UID:"3416d787-68b3-11e8-8e8f-0639a6e973b0", APIVersion:"v1", ResourceVersion:"12429157", FieldPath:""}): type: 'Normal' reason: 'NotTriggerScaleUp' pod didn't trigger scale-up (it wouldn't fit if a new node is added)
I0605 11:27:30.031707       1 leaderelection.go:199] successfully renewed lease kube-system/cluster-autoscaler

When I start a node by setting the minimum to 1, I see that it has the capacity:

Capacity:
 cpu:             4
 memory:          62884036Ki
 nvidia.com/gpu:  1
 pods:            110

and these labels:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=p2.xlarge
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=eu-west-1
                    failure-domain.beta.kubernetes.io/zone=eu-west-1c
                    kops.k8s.io/instancegroup=gpus
                    kubernetes.io/hostname=ip-172-20-51-219.eu-west-1.compute.internal
                    kubernetes.io/role=node
                    node-role.kubernetes.io/node=
                    spot=true

Finally, when I set the minimum pool size to 1, it can scale from 1 to 3 automatically. It just doesn't go from 0 to 1.
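(One thing that stands out in the config above is the bare k8s.io/cluster-autoscaler/node-template/label: "" tag. As a hedged guess, and only if the 1.1.x AWS provider honors the node-template tags documented for later releases, the cloudLabels block could carry explicit hints like the sketch below; whether nvidia.com/gpu is handled this way in 1.1.2 is not confirmed:)

spec:
  cloudLabels:
    instancegroup: gpus
    k8s.io/cluster-autoscaler/enabled: ""
    # Advertise the node label the scheduler will see on new nodes:
    k8s.io/cluster-autoscaler/node-template/label/kops.k8s.io/instancegroup: gpus
    # Illustrative extended-resource hint; support varies by cluster-autoscaler version:
    k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu: "1"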

7chenko (Author) commented Jul 16, 2018

This broke for me when upgrading from CA 1.1.0 to 1.2.2: the same configuration now fails to scale up from 0 nodes with "Insufficient nvidia.com/gpu". Reverting to 1.1.0 fixes it. (Kubernetes 1.10.0.)

yaroslava-serdiuk pushed a commit to yaroslava-serdiuk/autoscaler that referenced this issue Feb 22, 2024