Description
Currently, dask worker pods are spread across available nodes by the default Kubernetes scheduler:
[ec2-user@ip-192-168-60-131 ~]$ kubectl get pod -o yaml dask-cgentemann-osm2020tutorial-nqchvhmy-6e9099fc-3k2s6c -n binder-staging | grep schedule
schedulerName: default-scheduler
This can lead to scale-down issues when multiple users launch clusters or when pods encounter errors, because by default pods are spread across available nodes, which leaves few nodes empty enough to be removed. For example, we recently observed an issue where many dask pods had an Error status, leading to new nodes being launched to meet capacity. We ended up with 17 nodes running two dask pods each instead of all pods being packed onto 5 nodes.
JupyterHub deals with this same scenario by packing user-notebook pods onto nodes with a custom userScheduler:
https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#using-available-nodes-efficiently-the-user-scheduler
@yuvipanda suggested that a possible solution is simply to reuse the JupyterHub user scheduler in the dask-kubernetes config. Some additional relevant docs are here:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#specify-schedulers-for-pods
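As a rough sketch of what that could look like: Kubernetes selects a scheduler per pod through the `spec.schedulerName` field, so the worker pod template would need that field set. The snippet below only manipulates a plain dict. The helper `with_scheduler`, the example template, and the scheduler name `binder-staging-user-scheduler` are all hypothetical; the real name depends on how the JupyterHub chart was deployed, and how the patched template gets passed to dask-kubernetes is not shown here.

```python
import copy


def with_scheduler(pod_template: dict, scheduler_name: str) -> dict:
    """Return a copy of a pod template with spec.schedulerName set.

    Hypothetical helper for illustration; not part of dask-kubernetes.
    """
    patched = copy.deepcopy(pod_template)
    # Kubernetes reads the scheduler to use from spec.schedulerName.
    patched.setdefault("spec", {})["schedulerName"] = scheduler_name
    return patched


# Illustrative worker pod template (image and container name are placeholders).
worker_template = {
    "kind": "Pod",
    "spec": {
        "restartPolicy": "Never",
        "containers": [{"name": "dask-worker", "image": "daskdev/dask:latest"}],
    },
}

# Assumed scheduler name; check the actual name in your deployment.
patched = with_scheduler(worker_template, "binder-staging-user-scheduler")
print(patched["spec"]["schedulerName"])
```

After launching a cluster with such a template, the same `kubectl get pod -o yaml ... | grep schedule` check shown above should report the custom scheduler name instead of `default-scheduler`.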