dask-kubernetes creates a Dask cluster on Google Kubernetes Engine (GKE, formerly Google Container Engine).
It uses a Google Cloud Storage (GCS) bucket to store your notebooks for persistence, so there is no need for a persistent volume.
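
Notebook persistence is controlled by the `default_path` setting in `jupyter-config.py`. A minimal sketch of that setting follows. The bucket and folder names are hypothetical placeholders, and the `bucket/folder` form without a leading slash is an assumption about how the contents manager expects the path to be written; check the existing value in your copy of the file.

```python
# jupyter-config.py (fragment, not a standalone script)
c = get_config()  # provided by Jupyter when it loads this config file

# Hypothetical bucket and folder; replace with your own GCS path.
c.GoogleStorageContentManager.default_path = 'my-notebook-bucket/notebooks'
```
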
- Create a GCS bucket for storing your notebooks.
- Change `c.GoogleStorageContentManager.default_path` in `jupyter-config.py` to your GCS path.
- Create a GKE cluster of your choice (2 CPUs and 7.5 GB of memory or more per node is recommended), and make sure to turn on legacy authorisation mode.
- Run `kubectl apply -f ./kube/`.
- Connect to the service using port forwarding, `kubectl port-forward svc/svc-notebooks 8888:8888`, or use the public IP from `kubectl get svc`.
- Start using the cluster!

```python
from dask.distributed import Client
from dask_kubernetes import KubeCluster

# See a sample worker spec in `config/worker-spec-sample.yaml`
cluster = KubeCluster.from_yaml('...your yaml path')
cluster.scale(3)  # the desired number of workers

# Connect a client to the cluster so work can be submitted to it
client = Client(cluster)
```

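
Once the client is connected, work can be sent to the workers. The following is a minimal sketch, assuming `client` is the `Client` created in the step above; `square` and `sum_of_squares` are illustrative names and not part of this project.

```python
def square(x):
    """Toy task that runs on a worker."""
    return x * x


def sum_of_squares(client, numbers):
    """Square each number on the cluster, then add the results on a worker."""
    # client.map submits one task per input and returns a list of futures.
    futures = client.map(square, numbers)
    # Futures passed as arguments are resolved before sum() runs on a worker.
    total = client.submit(sum, futures)
    # result() blocks until the value is computed and sent back.
    return total.result()
```

For example, `sum_of_squares(client, range(10))` should return 285. Keeping the intermediate results as futures means only the final total travels back to the notebook.
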
To use a custom image:

- Change the `Dockerfile`, build your image, and push it to an image registry of your choice.
- Change the image name in the `30-deployment.yaml` file.
- Apply your Kubernetes configuration.
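
The build, push, and apply steps above can be scripted. The sketch below is one possible way to do so and is not part of this repository: the function names and the image name are hypothetical, and it assumes `docker` and `kubectl` are on your `PATH` and that the manifests live in `./kube/`. Editing the image name in `30-deployment.yaml` is still a manual step.

```python
import subprocess


def build_commands(image, kube_dir="./kube/"):
    """Return the commands that build, push, and deploy the given image."""
    return [
        ["docker", "build", "-t", image, "."],  # build from the local Dockerfile
        ["docker", "push", image],              # push to your image registry
        ["kubectl", "apply", "-f", kube_dir],   # apply the Kubernetes manifests
    ]


def run_commands(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image name; replace with your own registry path and tag.
    run_commands(build_commands("gcr.io/my-project/dask-notebook:latest"))
```

The dry-run default only prints the commands, so they can be reviewed before anything is built or deployed; pass `dry_run=False` to execute them.
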