KubeCluster with service type LoadBalancer on AKS stuck in "Pending" state on startup #906

@catzc

Describe the issue:
I'm trying to create a KubeCluster whose scheduler service is an AKS LoadBalancer, but the DaskCluster resource is stuck in a "Pending" state at startup.

Minimal Complete Verifiable Example:

Trying to create a cluster like this:

from dask_kubernetes.operator import KubeCluster, make_cluster_spec


def make_sim_cluster(
        name: str,
        cpu_cores: int,
        memory_in_gb: int,  # not used in this minimal example
        image: str = '<image>',
        tag: str = 'x.x.x',
        namespace: str = 'namespace',
):
    # Build the default DaskCluster spec with a LoadBalancer scheduler service.
    base_cluster_spec = make_cluster_spec(
        name=name,
        image=f'{image}:{tag}',
        n_workers=cpu_cores,
        scheduler_service_type="LoadBalancer",
    )
    # Ask AKS for an internal (private) load balancer.
    base_cluster_spec['metadata']['annotations'] = {
        "service.beta.kubernetes.io/azure-load-balancer-internal": "true",
    }

    return KubeCluster(custom_cluster_spec=base_cluster_spec, namespace=namespace)

but the DaskCluster stays in the "Pending" phase until it times out.

Anything else we need to know?:

I looked at the scheduler Service resource once it came up. Its status holds the load balancer details under the camelCase key "loadBalancer", as described in the Kubernetes docs (https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer), while dask-kubernetes seems to look for the snake_case key "load_balancer" in status:

if spec["type"] == "LoadBalancer" and not len(
    status.get("load_balancer", {}).get("ingress", [])
):
    phase = "Pending"

For what it's worth, the kr8s library also seems to look for "load_balancer".

When I switched these over to "loadBalancer", the DaskCluster came up without any issues. Even without the code change, the service, pods, deployments, etc. were all created and reached a running state; only the DaskCluster stayed pending.
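
For illustration, here is a minimal sketch of a check that reads the camelCase key. `service_phase` is a hypothetical helper, not the actual dask-kubernetes function. The fallback to the snake_case key and the "Running" default are assumptions made for the example.

```python
def service_phase(spec: dict, status: dict) -> str:
    """Hypothetical helper: report "Pending" until a LoadBalancer has an ingress.

    Not dask-kubernetes code; a sketch of the key lookup discussed above.
    """
    # Kubernetes reports the field as "loadBalancer"; the snake_case
    # fallback is an assumption, kept only for tolerance.
    load_balancer = status.get("loadBalancer") or status.get("load_balancer") or {}
    ingress = load_balancer.get("ingress") or []

    if spec.get("type") == "LoadBalancer" and not ingress:
        return "Pending"
    # Assumed default phase for every other case.
    return "Running"


# Example status shaped like a provisioned LoadBalancer service.
ready = {"loadBalancer": {"ingress": [{"ip": "10.0.0.4"}]}}
print(service_phase({"type": "LoadBalancer"}, ready))
```

With a check like this, a service whose status carries `loadBalancer.ingress` is no longer treated as pending, which matches what I saw after switching the key.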

Any help would be great, thanks!

Environment:

  • Dask version:
    • dask-core: 2024.2.1
    • dask-kubernetes: 2024.5.0
  • Python version: 3.11.9
  • Operating System: RHEL 7.9 (Maipo)
  • Install method (conda, pip, source): conda
