Closed as not planned
Description
What happened:
Cluster failed to create when using a worker template that contains multiple containers (e.g. sidecar pattern).
The error:
```
kubernetes_asyncio.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: <CIMultiDictProxy('Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Wed, 15 Dec 2021 20:09:31 GMT', 'Content-Length': '212')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"a container name must be specified for pod dask-root-46c95e02-0vgngv, choose one of: [dask model]","reason":"BadRequest","code":400}
```
I traced this back to the `logs()` function in the `Pod` class in `core.py`. It calls `read_namespaced_pod_log`, which, when the pod contains multiple containers, needs a `container=` argument specifying the name of the target container.
What you expected to happen:
The cluster to be created correctly. I expected the `logs()` function to be smart enough to know which container is the Dask container, or to iterate through the containers until it recognized the logs it was looking for.
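A minimal sketch of the selection logic described above (`pick_dask_container` is a hypothetical helper, not part of dask-kubernetes): scan the pod spec for the container whose `command` or `args` invoke `dask-worker`, falling back to the first container. The returned name could then be passed as `container=` to `read_namespaced_pod_log`.

```python
def pick_dask_container(pod_manifest: dict) -> str:
    """Return the name of the container that appears to run the Dask worker.

    Hypothetical helper for illustration only. Falls back to the first
    container when no command/args mention dask-worker.
    """
    containers = pod_manifest.get("spec", {}).get("containers", [])
    for c in containers:
        argv = (c.get("command") or []) + (c.get("args") or [])
        if any("dask-worker" in str(arg) for arg in argv):
            return c["name"]
    # Fallback: assume the first container is the worker.
    return containers[0]["name"]
```

With the worker-spec below, this would select the `dask` container rather than `model`, and the resulting name could be forwarded to the Kubernetes log call.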
Minimal Complete Verifiable Example:
```python
cluster = KubeCluster.from_yaml(spec_path)
cluster.scale(4)
```
And the worker-spec.yaml file:
```yaml
kind: Pod
metadata:
  labels:
    foo: bar
spec:
  restartPolicy: Never
  containers:
    - image: dev.local/batch-app:0.0.4
      imagePullPolicy: IfNotPresent
      args: [/app/venv/bin/dask-worker, --nthreads, '2', --no-dashboard, --memory-limit, 2GB, --death-timeout, '60']
      name: dask
      resources:
        limits:
          cpu: "1"
          memory: 2G
        requests:
          cpu: "1"
          memory: 2G
    - name: model
      image: dev.local/ml-model-prototype:0.0.1
      env:
        - name: SERVER_HOST
          value: "0.0.0.0"
        - name: SERVER_PORT
          value: "8989"
```
Anything else we need to know?:
Environment:
- Dask version: 2021.12.0
- Python version: 3.9.5
- Operating System: Linux
- Install method (conda, pip, source): pip