Scheduler integration via pod annotation propagation (v0.2) #68

Description

@bassam

The problem

v0.1 emits pods with default Kubernetes scheduling. Production GPU infrastructure with multiple teams needs more — fair-share allocation, quotas, gang-scheduling, workload-class differentiation, topology-aware placement at cluster scale. This is what GPU schedulers like KAI, Volcano, and Kueue solve at the cluster level.

Modelplane shouldn't reimplement them. It should compose with whichever scheduler the platform team has chosen, by propagating the right labels, annotations, and schedulerName onto every pod it emits.

What we want to get done

Scheduling concerns split by role:

ML teams pick priority. One field on ModelDeployment referencing a Kubernetes PriorityClass by name. Optional, falls back to cluster default. That's the only scheduling concept ML teams care about.

Platform teams configure everything else. Scheduler choice, default priority, queue and project labels, gang-scheduling annotations — all on InferenceCluster, once per cluster, scheduler-specific. ML teams never see this.

apiVersion: modelplane.ai/v1alpha1
kind: InferenceCluster
metadata:
  name: prod-gke-us-east
spec:
  scheduling:
    schedulerName: kai-scheduler
    defaultPriority: production-standard
    defaultLabels:
      runai/queue: "{{ .Namespace }}"
      runai/project: "{{ .Namespace }}"
    defaultAnnotations:
      runai/scheduling-mode: gang

apiVersion: modelplane.ai/v1alpha1
kind: ModelDeployment
metadata:
  name: kimi-k2
  namespace: ml-team-prod
spec:
  priority: production-critical   # references a Kubernetes PriorityClass; optional
  # ...
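
To make the propagation concrete, here is a minimal Go sketch of how the two manifests above might be combined when a pod is emitted. It is an illustration under stated assumptions, not the proposed implementation. The type and function names (SchedulingSpec, PodScheduling, ResolveScheduling) are hypothetical. The sketch assumes that label and annotation values are Go text/template strings with only .Namespace available, and that an unset ModelDeployment priority falls back to the InferenceCluster's defaultPriority.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// SchedulingSpec mirrors spec.scheduling on InferenceCluster.
// The Go names are hypothetical; only the YAML keys come from the issue.
type SchedulingSpec struct {
	SchedulerName      string
	DefaultPriority    string
	DefaultLabels      map[string]string
	DefaultAnnotations map[string]string
}

// PodScheduling is the subset of pod fields this sketch fills in.
type PodScheduling struct {
	SchedulerName     string
	PriorityClassName string
	Labels            map[string]string
	Annotations       map[string]string
}

// templateData is the context exposed to templated values.
// Only .Namespace appears in the issue; anything more is an extension.
type templateData struct {
	Namespace string
}

// render expands a single templated value such as "{{ .Namespace }}".
func render(value string, data templateData) (string, error) {
	tmpl, err := template.New("value").Parse(value)
	if err != nil {
		return "", fmt.Errorf("parse %q: %w", value, err)
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", fmt.Errorf("render %q: %w", value, err)
	}
	return buf.String(), nil
}

// renderAll expands every value in a label or annotation map.
func renderAll(in map[string]string, data templateData) (map[string]string, error) {
	out := make(map[string]string, len(in))
	for key, value := range in {
		rendered, err := render(value, data)
		if err != nil {
			return nil, err
		}
		out[key] = rendered
	}
	return out, nil
}

// ResolveScheduling merges cluster-level defaults with the optional
// ModelDeployment priority. If both priorities are empty, the result is
// left empty so Kubernetes applies its own global default, if any.
func ResolveScheduling(cluster SchedulingSpec, namespace, deploymentPriority string) (PodScheduling, error) {
	data := templateData{Namespace: namespace}

	labels, err := renderAll(cluster.DefaultLabels, data)
	if err != nil {
		return PodScheduling{}, err
	}
	annotations, err := renderAll(cluster.DefaultAnnotations, data)
	if err != nil {
		return PodScheduling{}, err
	}

	priority := deploymentPriority
	if priority == "" {
		priority = cluster.DefaultPriority // assumed fallback
	}

	return PodScheduling{
		SchedulerName:     cluster.SchedulerName,
		PriorityClassName: priority,
		Labels:            labels,
		Annotations:       annotations,
	}, nil
}
```

Under these assumptions, resolving kimi-k2 in namespace ml-team-prod against prod-gke-us-east would produce schedulerName kai-scheduler, priority production-critical, and runai/queue and runai/project both set to ml-team-prod. Keeping the resolution in one pure function means the ML-facing field and the platform-facing defaults meet in exactly one place.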

Labels: Scheduling, enhancement
