
An extensible capacity signal for InferenceClusters #70

Description

@dennis-upbound

What problem are you facing?

An InferenceCluster reports capacity from its configured pool size, not from what can actually run right now. GPU schedulers each track real availability in their own state: KAI in its queues and resource pools, Kueue in cluster-queue flavor usage, Volcano in its queues. There is no way to feed any of that into Modelplane's capacity view, and no single shape a reader can rely on without knowing which scheduler produced the numbers. As a result, an operator running one of these schedulers cannot make reported capacity reflect what their cluster will actually admit, and anyone who wants a richer signal has nowhere to plug one in.

How could Modelplane help solve your problem?

Modelplane should report capacity in one generic shape. A default source fills that shape on any cluster from kube-native state (node allocatable minus the sum of pod requests), and a scheduler-specific source can refine the signal as an opt-in selected on the InferenceCluster. The default and any scheduler-specific source produce the same shape, so whatever reads capacity does not need to know how it was measured.
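As a rough illustration of the default source, the sketch below computes free capacity per resource as node allocatable minus the sum of pod requests. The `Capacity` shape, its field names, and the `default_capacity` helper are hypothetical and not part of Modelplane. The inputs are plain dicts standing in for values that would be read from Node and Pod objects, and the caller is assumed to have already filtered out pods that no longer hold resources.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capacity:
    """Hypothetical generic capacity shape; the field names are assumptions."""

    source: str  # name of the source that produced the numbers
    total: dict[str, int] = field(default_factory=dict)      # resource -> allocatable units
    available: dict[str, int] = field(default_factory=dict)  # resource -> units not yet requested


def _sum_by_resource(items: list[dict[str, int]]) -> dict[str, int]:
    """Add up quantities per resource name across a list of resource maps."""
    totals: dict[str, int] = {}
    for item in items:
        for resource, qty in item.items():
            totals[resource] = totals.get(resource, 0) + qty
    return totals


def default_capacity(
    node_allocatable: list[dict[str, int]],
    pod_requests: list[dict[str, int]],
) -> Capacity:
    """Kube-native estimate: node allocatable minus what pods request."""
    total = _sum_by_resource(node_allocatable)
    requested = _sum_by_resource(pod_requests)
    # Clamp at zero so an over-requested resource never reports negative capacity.
    available = {r: max(0, qty - requested.get(r, 0)) for r, qty in total.items()}
    return Capacity(source="kube-native", total=total, available=available)


if __name__ == "__main__":
    nodes = [{"nvidia.com/gpu": 8}, {"nvidia.com/gpu": 8}]
    pods = [{"nvidia.com/gpu": 4}, {"nvidia.com/gpu": 2}]
    print(default_capacity(nodes, pods))
```

This sketch sums across the whole cluster, so it ignores per-node fragmentation: ten free GPUs spread over two nodes are reported the same as ten free on one node. A real source might need to report capacity per node or per flavor.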

Modelplane should not carry hard-coded knowledge of each scheduler. The capacity source is something an operator chooses and the project can add to over time, following the same generic-shape-plus-opt-in-adapter pattern as #68. Any scheduling decision that reads capacity is a consumer of this signal, but the signal stands on its own as a cluster's report of what it can actually run.
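One way to picture the opt-in: sources register under a name, the InferenceCluster names one, and the default applies when none is chosen. Everything in this sketch is hypothetical, including `SOURCES`, `register`, `report_capacity`, the `capacitySource` key, and the `admittable` input. The `kueue` entry is a stub that does not talk to Kueue; it only shows a scheduler-specific source narrowing the default signal while returning the same shape.

```python
from typing import Callable, Dict

# Generic report shape (assumed): resource name -> units that can run right now.
CapacityReport = Dict[str, int]
CapacitySource = Callable[[dict], CapacityReport]

# Registry of named sources, so the core carries no per-scheduler knowledge.
SOURCES: Dict[str, CapacitySource] = {}


def register(name: str) -> Callable[[CapacitySource], CapacitySource]:
    """Register a capacity source under the name an operator would select."""
    def wrap(fn: CapacitySource) -> CapacitySource:
        SOURCES[name] = fn
        return fn
    return wrap


@register("default")
def kube_native(cluster: dict) -> CapacityReport:
    """Default source: allocatable minus requested, clamped at zero."""
    requested = cluster.get("requested", {})
    return {
        resource: max(0, qty - requested.get(resource, 0))
        for resource, qty in cluster["allocatable"].items()
    }


@register("kueue")
def kueue_stub(cluster: dict) -> CapacityReport:
    """Stub adapter: caps the default signal by what the scheduler would admit."""
    base = kube_native(cluster)
    admittable = cluster.get("admittable", {})  # hypothetical scheduler-side numbers
    return {r: min(qty, admittable.get(r, qty)) for r, qty in base.items()}


def report_capacity(cluster: dict) -> CapacityReport:
    """Pick the source named on the cluster, falling back to the default."""
    name = cluster.get("capacitySource", "default")
    if name not in SOURCES:
        raise ValueError(f"unknown capacity source: {name!r}")
    return SOURCES[name](cluster)
```

A consumer calls `report_capacity` and gets the same mapping either way, which is the property the proposal asks for: readers of capacity never branch on which scheduler produced it.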

Metadata


Labels

Scheduling (Scheduling component)
