This repository was archived by the owner on Oct 21, 2025. It is now read-only.

AaronYang0628/slurm-on-k8s


Prerequisites

  • kubectl v1.11.3 or later
  • buildah v1.33.10 or later
  • Access to a Kubernetes cluster running v1.11.3 or later
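
The minimum versions above can be checked with a small helper. This is a minimal sketch, assuming `sort -V` (version sort, as in GNU coreutils) is available; the function names `version_ge` and `check_min` are hypothetical and not part of this repository.

```shell
# Succeed when version $1 is greater than or equal to version $2.
# A leading "v" is stripped; ordering relies on `sort -V` (version sort).
version_ge() {
  have="${1#v}"
  want="${2#v}"
  lowest="$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)"
  [ "$lowest" = "$want" ]
}

# Report whether a detected tool version meets the required minimum.
# Usage: check_min <tool-name> <detected-version> <minimum-version>
check_min() {
  if version_ge "$2" "$3"; then
    echo "$1 $2 OK (>= $3)"
  else
    echo "$1 $2 is older than required $3" >&2
    return 1
  fi
}
```

For example, `check_min buildah 1.33.9 v1.33.10` fails, because version sort places 1.33.9 before 1.33.10. How the detected version string is extracted from each tool is left out of this sketch.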

Features

Slurm on Kubernetes provides the following features:

  • Resource Management: Efficiently manages resources in a Kubernetes cluster, ensuring optimal utilization.
  • Job Scheduling: Advanced job scheduling capabilities to handle various types of workloads.
  • Scalability: Easily scales to accommodate growing workloads and resources.
  • High Availability: Supports high availability configurations to ensure continuous operation.
  • Multi-User Support: Allows multiple users to submit and manage their jobs concurrently.
  • Integration with MPI Libraries: Supports both Open MPI and Intel MPI libraries for parallel computing.
  • Customizable: Tailor the Slurm cluster to specific needs and configurations through the values.yaml file.
  • Separate munged daemon
  • Support for deploying GPU nodes
  • Runs on cgroup v1 and v2
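
To find out which cgroup version a node uses, the filesystem type of /sys/fs/cgroup can be inspected. This is a minimal sketch for Linux nodes, assuming `stat` supports `-fc %T` (as in GNU coreutils); the function names are hypothetical.

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version label.
# "cgroup2fs" indicates cgroup v2 (unified hierarchy); "tmpfs" indicates cgroup v1.
cgroup_version_from_fstype() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1" ;;
    *)         echo "unknown" ;;
  esac
}

# Report the cgroup version of the machine this runs on.
detect_cgroup_version() {
  cgroup_version_from_fstype "$(stat -fc %T /sys/fs/cgroup/ 2>/dev/null)"
}
```

Running `detect_cgroup_version` on a node (or inside a pod that can see /sys/fs/cgroup) prints `v1`, `v2`, or `unknown`.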

Usage

To change the Slurm configuration, use the Slurm configuration generator (see link).

  • For GitHub Helm users
    1. Add the Helm repo and update it
      helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
      helm repo update
      
    2. Install the Slurm chart
      helm install slurm ay-helm-mirror/chart -f charts/values.yaml --version 1.0.10
      
  • For Artifact Hub Helm users
    1. Add the Helm repo and update it
      helm repo add ay-helm-mirror https://aaronyang0628.github.io/helm-chart-mirror/charts
      helm repo update
      
    2. Install the Slurm chart
      helm install slurm ay-helm-mirror/chart -f charts/values.yaml --version 1.0.10
    Alternatively, start from the template values.yaml (see link).
  • For operator users
    1. Test-pull the operator image
      podman pull ghcr.io/aaronyang0628/slurm-operator:25.05
      
    2. Deploy the Slurm operator
      kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/slurm-on-k8s/refs/heads/main/operator/dist/install.yaml
      
    3. Apply a slurmdeployment custom resource
      kubectl apply -f https://raw.githubusercontent.com/AaronYang0628/helm-chart-mirror/refs/heads/main/templates/slurm/slurmdeployment.zj.values.yaml
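
For the Helm paths above, `helm upgrade --install` installs the release when it is missing and upgrades it otherwise, so the same command can be re-run after editing values.yaml. This is a minimal sketch, not the project's documented procedure; the wrapper function `deploy_slurm_chart` is hypothetical, while the release name, chart, values file, and version are the ones used above.

```shell
# Install the "slurm" release if it does not exist yet, otherwise upgrade it.
# Usage: deploy_slurm_chart [values-file]   (defaults to charts/values.yaml)
deploy_slurm_chart() {
  values_file="${1:-charts/values.yaml}"
  helm upgrade --install slurm ay-helm-mirror/chart \
    -f "$values_file" --version 1.0.10
}
```

Calling `deploy_slurm_chart my-values.yaml` applies a different values file to the same release.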
      

Manage Your Slurm Cluster

  • Check cluster status
    kubectl get slurmdep slurmdeployment-sample
    kubectl -n slurm get pods -w
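
Instead of watching the pod list by hand, `kubectl wait` can block until the pods report Ready. This is a minimal sketch, assuming the pods run in the `slurm` namespace as in the command above; the function name and the 300s default timeout are assumptions.

```shell
# Block until every pod in the namespace reports Ready, or the timeout expires.
# Usage: wait_for_slurm_pods [namespace] [timeout]
wait_for_slurm_pods() {
  namespace="${1:-slurm}"
  timeout="${2:-300s}"
  kubectl -n "$namespace" wait --for=condition=Ready pod --all --timeout="$timeout"
}
```

Note that `kubectl wait` returns an error if no pods exist yet in the namespace, so it is best run after the pods have been created.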

When everything is ready, you can log in to your cluster and submit jobs.

  • Add public keys to the login node

    Edit `auth.ssh.configmap.perfabPubKeys` in the file chart/values.yaml and append your public keys to the end,
    or edit `spec.values.auth.ssh.configmap.perfabPubKeys` in your slurmdeployment custom resource.
  • Reapply your chart or custom resource so the change takes effect

  • Log in to your cluster

    kubectl -n slurm exec -it deploy/slurm-login -c login -- /bin/bash

    Or, over SSH:

    ssh root@slurm-login.svc.cluster.local
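
Before appending a key to `perfabPubKeys`, it can help to confirm that the line has the shape of an OpenSSH public key rather than, say, a private key. This is a minimal sketch that checks the shape only ("<key-type> <base64-data> [comment]") and covers a few common key types; the function name is hypothetical.

```shell
# Succeed when the argument looks like a single-line OpenSSH public key.
# Expects exactly one key per call; does not verify that the key data is valid.
is_openssh_pubkey() {
  printf '%s\n' "$1" |
    grep -Eq '^(ssh-rsa|ssh-ed25519|ecdsa-sha2-nistp(256|384|521)) [A-Za-z0-9+/]+={0,2}( .*)?$'
}
```

For example, `is_openssh_pubkey "$(cat ~/.ssh/id_ed25519.pub)"` succeeds for a typical ed25519 public key file, assuming one exists at that path.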