Description
Describe the issue:
I'm trying to create a KubeCluster with AKS LoadBalancer, but the DaskCluster is stuck in a pending state at startup.
Minimal Complete Verifiable Example:
Trying to create a cluster like this:
from dask_kubernetes.operator import KubeCluster, make_cluster_spec


def make_sim_cluster(
    name: str,
    cpu_cores: int,
    memory_in_gb: int,
    image: str = '<image>',
    tag: str = 'x.x.x',
    namespace: str = 'namespace',
):
    # Build the default cluster spec with a LoadBalancer scheduler service
    base_cluster_spec = make_cluster_spec(
        name=name,
        image=f'{image}:{tag}',
        n_workers=cpu_cores,
        scheduler_service_type="LoadBalancer",
    )
    # Request an internal Azure load balancer
    base_cluster_spec['metadata']['annotations'] = {
        "service.beta.kubernetes.io/azure-load-balancer-internal": "true",
    }
    return KubeCluster(custom_cluster_spec=base_cluster_spec, namespace=namespace)
but the DaskCluster stays in the Pending phase until it times out.
Anything else we need to know?:
I looked at our Service resource, and once the scheduler service comes up, the load balancer status sits under the key "loadBalancer" (as described in the Kubernetes docs: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer),
while dask-kubernetes seems to look for "load_balancer" in the status:
dask-kubernetes/dask_kubernetes/operator/controller/controller.py
Lines 403 to 406 in 2ecfdcd
if spec["type"] == "LoadBalancer" and not len(
    status.get("load_balancer", {}).get("ingress", [])
):
    phase = "Pending"
For what it's worth, the kr8s library also seems to look for "load_balancer".
When I switched these lookups to "loadBalancer", the DaskCluster came up without any issues. Even without the code change, the service, pods, deployments, etc. were all set up and running; only the DaskCluster stayed pending.
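To illustrate the mismatch, here is a minimal sketch of a lookup that accepts either key. It is not the actual dask-kubernetes or kr8s code: the helper names `load_balancer_ingress` and `is_load_balancer_pending` are hypothetical, and the sample status dict (including the IP address) is made up to mirror the camelCase shape described above.

```python
from typing import Any, Mapping


def load_balancer_ingress(status: Mapping[str, Any]) -> list:
    """Return the ingress entries from a Service status mapping.

    Hypothetical helper: tries the camelCase key ("loadBalancer") first,
    then falls back to the snake_case key ("load_balancer").
    """
    for key in ("loadBalancer", "load_balancer"):
        ingress = (status.get(key) or {}).get("ingress") or []
        if ingress:
            return list(ingress)
    return []


def is_load_balancer_pending(spec: Mapping[str, Any], status: Mapping[str, Any]) -> bool:
    """True when a LoadBalancer service has no ingress entry yet."""
    return spec.get("type") == "LoadBalancer" and not load_balancer_ingress(status)


# Made-up status in the camelCase shape seen on our scheduler service
status = {"loadBalancer": {"ingress": [{"ip": "10.0.0.4"}]}}
spec = {"type": "LoadBalancer"}

# The snake_case-only lookup finds nothing, so the cluster would stay Pending
snake_case_only = status.get("load_balancer", {}).get("ingress", [])
print(len(snake_case_only))
print(is_load_balancer_pending(spec, status))
```

With a status shaped like this, the snake_case-only lookup always returns an empty list, which would match the behaviour I see: the service gets its address, but the phase check never leaves "Pending".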
Any help would be great, thanks!
Environment:
- Dask version:
- dask-core: 2024.2.1
- dask-kubernetes: 2024.5.0
- Python version: 3.11.9
- Operating System: RHEL 7.9 (Maipo)
- Install method (conda, pip, source): conda