poc(namespacequeue): NamespaceQueue CRD + shadow Queue controller #5320
Aman-Cool wants to merge 3 commits
[APPROVALNOTIFIER] This PR is NOT APPROVED.
e96cfce to f37c837
Code Review
This pull request introduces a NamespaceQueue controller to Volcano, allowing tenants to manage namespace-scoped queues that are automatically reconciled into cluster-scoped shadow queues. The review feedback identifies a critical infinite update loop caused by the controller's drift detection mechanism, which conflicts with other controllers modifying the 'Parent' field. Additionally, the feedback recommends using a lifecycle-aware context instead of context.TODO() and points out that the 'allocated' resource field is missing from the status synchronization logic.
…ntroller Signed-off-by: Aman-Cool <aman017102007@gmail.com>
f37c837 to 88fceef
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
/assign @hajnalmt
Hey @hajnalmt, @JesseStutler and @devzizu, wanted to flag something the bot also caught; the […] The weight scoping question still feels like the bigger design risk to me though: right now a shadow Queue's weight competes across the entire cluster in the proportion plugin, which means a tenant creating 10 […]
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
This is a PoC for my LFX Mentorship 2026 application for the "Support Namespace-scoped Queue in Volcano" project.
The problem: `Queue` is cluster-scoped, so a tenant who owns the `team-alpha` namespace has to ping a cluster-admin every time they need a queue. This PoC proposes fixing that without touching the scheduler at all.

The approach: a namespace-scoped `NamespaceQueue` CRD whose controller synthesises a real cluster-scoped shadow `Queue` from it. The scheduler picks it up through the existing `Queue` informer; the capacity plugin, proportion plugin, reclaim action, all of it works untouched because as far as the scheduler is concerned it's just another `Queue` in `ssn.Queues`.

What's in the PR:
- `NamespaceQueue` type in `scheduling.volcano.sh/v1beta1`; spec fields mirror `QueueSpec` exactly (`capability`, `guarantee`, `deserved`, `weight`, `reclaimable`, `priority`, `parent`), registered in the scheme alongside `Queue` and `PodGroup`
- `shadowQueueName` uses a SHA-256(namespace+"/"+name) prefix to avoid the `ns="a-b"/name="c"` vs `ns="a"/name="b-c"` collision that simple concatenation has
- `buildShadowQueue` translates `NamespaceQueueSpec` -> `QueueSpec` and sets the `scheduling.volcano.sh/nsq-ref` annotation for GC, since a cross-namespace `OwnerReference` is Kubernetes-forbidden
- […] `Queue`, and status mirroring back to the tenant

What this PR doesn't include
This is intentionally scoped to demonstrate the approach, not deliver the feature. Missing pieces that come after mentor feedback and codegen:
- `make manifests` once kubebuilder markers are validated
- […] `make generate-code` replaces it with `factory.Scheduling().V1beta1().NamespaceQueues()`
- […] `scheduling.volcano.sh/namespacequeue-name` on a `PodGroup` and patches `spec.queue` to the shadow Queue name so the scheduler can resolve it; this is also where namespace isolation is enforced
- `Role` + `RoleBinding` that makes the whole self-service story actually work without a `ClusterRoleBinding`
- `Status.Allocated` to propagate resource usage back to `NamespaceQueueStatus` so tenants have visibility
- `NamespaceQueue` creation through job scheduling and resource accounting

Open questions for mentors
1. `updateQueueParent` will overwrite the shadow Queue's parent on every sync

In `queue_controller_action.go`, `syncQueue` calls `updateQueueParent`, which sets `spec.parent = "root"` on any queue that has no parent. The shadow `Queue` the `NamespaceQueue` controller creates will immediately have its parent overwritten by the existing queue controller on the very next sync cycle. So either I need to always set a parent on the shadow `Queue` at creation time, or `updateQueueParent` needs to skip queues it doesn't own. What's the right call here: should the shadow `Queue` always be pinned under a cluster-admin-configured parent queue, or is there a way to mark it as "parent already managed"?

2. Weight scope in the proportion plugin is cluster-wide, which feels wrong for tenants

The proportion plugin computes `deserved = (weight / Σ all_weights) * total_cluster_resources`. A shadow Queue's weight competes with every other queue in the cluster. If `team-alpha` creates 10 `NamespaceQueues` each with `weight=1`, they effectively grab 10 shares of the cluster, same as 10 cluster-admin-created queues. Should weight for `NamespaceQueues` be scoped relative to the namespace's resource pool, or relative to a parent cluster `Queue` they're bound to? I couldn't find a clean answer in the proportion plugin code.

3. Do `NamespaceQueues` need a mandatory parent cluster Queue, or can they be standalone?

If standalone, the sum of all `NamespaceQueue` capabilities across the cluster could exceed actual cluster capacity; the scheduler handles this through pending, but it's uncontrolled overcommit. If we require binding to a parent cluster `Queue`, an admin still has to pre-create that parent quota queue, which partially defeats the self-service purpose. Is the intent that platform admins pre-create one quota `Queue` per namespace/team and tenants subdivide within that, or is a fully standalone `NamespaceQueue` valid?

4. The existing `scheduling.volcano.sh/queue-name` annotation on `Namespace` conflicts with the new flow

The mutating webhook for `PodGroup` already reads `QueueNameAnnotationKey` from the `Namespace` object and patches `spec.queue` if the `PodGroup`'s queue is `"default"`. If I add `NamespaceQueueNameAnnotationKey` with similar semantics, there are now two annotation-based queue assignment mechanisms on a `Namespace`. Which wins? Can a namespace have both set, pointing to different queues for different workloads?
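
For reviewers who want something concrete to poke at, here is a minimal standalone sketch of two points from this description: the collision that the SHA-256-based `shadowQueueName` avoids, and the weight arithmetic from question 2. It is not the code in this PR. The `nsq-` prefix, the 16-character digest truncation, the helper names `naiveName` and `deservedShare`, and the 120-CPU cluster total are all assumptions made for illustration, and `deservedShare` only applies the formula as quoted above, ignoring any capping by request or capability that the real proportion plugin performs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// naiveName joins namespace and name with "-". Both parts may themselves
// contain "-", so distinct (namespace, name) pairs can map to the same string.
func naiveName(namespace, name string) string {
	return namespace + "-" + name
}

// shadowQueueName hashes namespace+"/"+name with SHA-256 and keeps a prefix of
// the hex digest. "/" is not valid in Kubernetes namespace or object names, so
// the hashed input is unambiguous. The "nsq-" prefix and the 16-character
// truncation are assumptions for this sketch, not the PR's actual format.
func shadowQueueName(namespace, name string) string {
	sum := sha256.Sum256([]byte(namespace + "/" + name))
	return "nsq-" + hex.EncodeToString(sum[:])[:16]
}

// deservedShare applies deserved = (weight / sum of all weights) * total,
// the formula quoted in question 2, for a single scalar resource.
func deservedShare(weight int32, allWeights []int32, total float64) float64 {
	var sum int64
	for _, w := range allWeights {
		sum += int64(w)
	}
	if sum == 0 {
		return 0
	}
	// Multiply before dividing so small integer cases stay exact.
	return float64(weight) * total / float64(sum)
}

func main() {
	// Simple concatenation collides; the hashed name does not.
	fmt.Println(naiveName("a-b", "c") == naiveName("a", "b-c"))             // true
	fmt.Println(shadowQueueName("a-b", "c") == shadowQueueName("a", "b-c")) // false

	// 2 admin-created queues plus 10 tenant shadow queues, all with weight 1.
	weights := make([]int32, 12)
	for i := range weights {
		weights[i] = 1
	}
	perQueue := deservedShare(1, weights, 120) // 120 CPUs: arbitrary total
	fmt.Printf("per queue: %.0f, team-alpha total: %.0f\n", perQueue, 10*perQueue)
}
```

Under these assumptions each queue deserves 10 of the 120 CPUs, so the tenant's ten shadow queues together claim 100 while the two admin-created queues share the remaining 20, which is the imbalance question 2 is asking about.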