The problem
v0.1 emits pods with default Kubernetes scheduling. Production GPU infrastructure with multiple teams needs more — fair-share allocation, quotas, gang-scheduling, workload-class differentiation, topology-aware placement at cluster scale. This is what GPU schedulers like KAI, Volcano, and Kueue solve at the cluster level.
Modelplane shouldn't reimplement them. It should compose to whichever scheduler the platform team has chosen, by propagating the right labels, annotations, and schedulerName onto every pod it emits.
What we want to get done
Scheduling concerns split by role:
ML teams pick priority. One field on ModelDeployment referencing a Kubernetes PriorityClass by name. Optional, falls back to cluster default. That's the only scheduling concept ML teams care about.
Platform teams configure everything else. Scheduler choice, default priority, queue and project labels, gang-scheduling annotations — all on InferenceCluster, once per cluster, scheduler-specific. ML teams never see this.
apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
name: prod-gke-us-east
spec:
scheduling:
schedulerName: kai-scheduler
defaultPriority: production-standard
defaultLabels:
runai/queue: "{{ .Namespace }}"
runai/project: "{{ .Namespace }}"
defaultAnnotations:
runai/scheduling-mode: gang
apiVersion: modelplane.ai/v1alpha1
kind: ModelDeployment
metadata:
name: kimi-k2
namespace: ml-team-prod
spec:
priority: production-critical # references a Kubernetes PriorityClass; optional
# ...
The problem
v0.1 emits pods with default Kubernetes scheduling. Production GPU infrastructure with multiple teams needs more — fair-share allocation, quotas, gang-scheduling, workload-class differentiation, topology-aware placement at cluster scale. This is what GPU schedulers like KAI, Volcano, and Kueue solve at the cluster level.
Modelplane shouldn't reimplement them. It should compose to whichever scheduler the platform team has chosen, by propagating the right labels, annotations, and
schedulerNameonto every pod it emits.What we want to get done
Scheduling concerns split by role:
ML teams pick priority. One field on ModelDeployment referencing a Kubernetes PriorityClass by name. Optional, falls back to cluster default. That's the only scheduling concept ML teams care about.
Platform teams configure everything else. Scheduler choice, default priority, queue and project labels, gang-scheduling annotations — all on InferenceCluster, once per cluster, scheduler-specific. ML teams never see this.