- kubectl version v1.11.3+.
- buildah version v1.33.10+
- Access to a Kubernetes v1.11.3+ cluster.
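
The version requirements above can be checked with a small script. This is a minimal sketch, not part of the project: `version_ge` and `check_tool` are hypothetical helper names, and the comparison assumes GNU `sort -V` is available.

```shell
#!/usr/bin/env bash
# Sketch: compare installed tool versions against the minimums listed above.
# version_ge and check_tool are illustrative helpers, not project tooling.

# Return 0 when version $1 is greater than or equal to version $2.
# Assumes GNU coreutils `sort -V` (version sort).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# $1 = tool name, $2 = detected version (may be empty), $3 = minimum version
check_tool() {
  if [ -z "$2" ]; then
    echo "$1: not found"
  elif version_ge "$2" "$3"; then
    echo "$1: $2 (ok, >= $3)"
  else
    echo "$1: $2 (too old, need >= $3)"
  fi
}

kubectl_ver=""
if command -v kubectl >/dev/null 2>&1; then
  # Extract the client gitVersion (for example v1.29.0) from the JSON output.
  kubectl_ver=$(kubectl version --client -o json 2>/dev/null \
    | sed -n 's/.*"gitVersion": *"v\([^"]*\)".*/\1/p' | head -n1)
fi

buildah_ver=""
if command -v buildah >/dev/null 2>&1; then
  # `buildah --version` prints "buildah version X.Y.Z (...)".
  buildah_ver=$(buildah --version 2>/dev/null | awk '{print $3}')
fi

check_tool kubectl "$kubectl_ver" 1.11.3
check_tool buildah "$buildah_ver" 1.33.10
```

The script only reports; it does not check the cluster (server) version, which still needs to be v1.11.3 or newer.
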
Slurm on Kubernetes provides the following features:
- Resource Management: Efficiently manages resources in a Kubernetes cluster, ensuring optimal utilization.
- Job Scheduling: Advanced job scheduling capabilities to handle various types of workloads.
- Scalability: Easily scales to accommodate growing workloads and resources.
- High Availability: Supports high availability configurations to ensure continuous operation.
- Multi-User Support: Allows multiple users to submit and manage their jobs concurrently.
- Integration with MPI Libraries: Supports both Open MPI and Intel MPI libraries for parallel computing.
- Customizable: Use the values.yaml file to customize a Slurm cluster to fit specific needs and configurations.
- Separate munged daemon
- Supports deploying GPU nodes
- Runs on cgroup v1/v2
If you want to change the Slurm configuration, check the Slurm configuration generator (see link).
- for GitHub Helm users
  - get helm repo and update
    `helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts`
  - install slurm chart
    `helm install slurm ay-helm-mirror/chart -f charts/values.yaml --version 1.0.10`
- for artifact Helm users
  - get helm repo and update
    `helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts`
  - install slurm chart
    `helm install slurm ay-helm-mirror/chart -f charts/values.yaml --version 1.0.10`
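
The Helm steps above can be sketched as one script. The `DRY_RUN` variable and the `run` and `install_slurm_chart` helpers are assumptions added for illustration; with `DRY_RUN=1` (the default here) the commands are only printed, so the flow can be reviewed before anything touches the cluster. The `helm repo update` call is added because the step is labelled "get helm repo and update".

```shell
#!/usr/bin/env bash
# Sketch of the Helm install flow. DRY_RUN, run and install_slurm_chart are
# illustrative helpers, not part of the chart.
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1 = path to the values file (defaults to the path used in this README)
install_slurm_chart() {
  values_file="${1:-charts/values.yaml}"
  run helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
  run helm repo update
  run helm install slurm ay-helm-mirror/chart -f "$values_file" --version 1.0.10
}

install_slurm_chart charts/values.yaml
```

Set `DRY_RUN=0` in the environment to actually run the commands once the printed output looks right.
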
- for operator users
  - test-pull the operator image
    `podman pull ghcr.io/aaronyang0628/slurm-operator:25.05`
  - deploy slurm operator
    `kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/slurm-on-k8s/refs/heads/main/operator/dist/install.yaml`
  - apply CRD slurmdeployment
    `kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/helm-chart-mirror/refs/heads/main/templates/slurm/slurmdeployment.zj.values.yaml`
- check cluster status
  `kubectl get slurmdep slurmdeployment-sample`
  `kubectl -n slurm get pods -w`
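
Instead of watching the pod list by hand, the status check can be scripted. The following is a rough sketch under assumptions: `count_not_ready` and `wait_for_pods` are hypothetical helpers, the `slurm` namespace is taken from the command above, and the parsing relies on the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Sketch: count pods that are not fully ready. Helper names are illustrative.

# Reads `kubectl get pods --no-headers` output on stdin and prints how many
# pods have READY x/y with x != y, ignoring pods whose STATUS is Completed.
count_not_ready() {
  awk '{ split($2, r, "/"); if ($3 != "Completed" && r[1] != r[2]) n++ } END { print n + 0 }'
}

# Poll until every pod in the namespace is ready. $1 = namespace (default: slurm).
# Note: returns immediately if the namespace contains no pods at all.
wait_for_pods() {
  ns="${1:-slurm}"
  while :; do
    pending=$(kubectl -n "$ns" get pods --no-headers | count_not_ready)
    [ "$pending" -eq 0 ] && break
    echo "$pending pod(s) not ready yet"
    sleep 5
  done
}
```

Call `wait_for_pods slurm` after deploying. `kubectl -n slurm wait --for=condition=Ready pod --all --timeout=600s` is a built-in alternative, though it also waits on pods that have already completed.
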
When everything is ready, you can log in to your cluster and submit jobs.
- Add public keys to the login node
  Edit `auth.ssh.configmap.perfabPubKeys` in the file chart/values.yaml and append your public keys to the end. Alternatively, edit `spec.values.auth.ssh.configmap.perfabPubKeys` in your slurmdeployment CRD.
- Reapply your chart or CRD
- Log in to your cluster
  `kubectl -n slurm exec -it deploy/slurm-login -c login -- /bin/bash`
  Or
  `ssh root@slurm-login.svc.cluster.local`
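
Once logged in, a first job can be submitted with standard Slurm commands. The sketch below is an illustration only: the file name `hello.sbatch`, the job name, and the output pattern are placeholders, and the script submits the job only when `sbatch` is found on the current machine (that is, on the login node).

```shell
#!/usr/bin/env bash
# Sketch: write a minimal batch script and submit it if Slurm is available.
# File name, job name and output pattern are placeholders.
cat > hello.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --output=hello-%j.out
srun hostname
EOF

if command -v sbatch >/dev/null 2>&1; then
  # Submit the job, then list this user's jobs in the queue.
  sbatch hello.sbatch
  squeue -u "$(id -un)"
else
  echo "sbatch not found; run this from the login node"
fi
```

When the job finishes, the node's hostname should appear in `hello-<jobid>.out` in the directory the job was submitted from.
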