Skip to content

poc(namespacequeue): NamespaceQueue CRD + shadow Queue controller#5320

Draft
Aman-Cool wants to merge 3 commits into
volcano-sh:masterfrom
Aman-Cool:poc/namespace-queue
Draft

poc(namespacequeue): NamespaceQueue CRD + shadow Queue controller#5320
Aman-Cool wants to merge 3 commits into
volcano-sh:masterfrom
Aman-Cool:poc/namespace-queue

Conversation

@Aman-Cool
Copy link
Copy Markdown
Contributor

This is a PoC for my LFX Mentorship 2026 application for the "Support Namespace-scoped Queue in Volcano" project.

The problem: Queue is cluster-scoped, so a tenant who owns team-alpha namespace has to ping a cluster-admin every time they need a queue. This PoC proposes fixing that without touching the scheduler at all.

The approach: a namespace-scoped NamespaceQueue CRD whose controller synthesises a real cluster-scoped shadow Queue from it. The scheduler picks it up through the existing Queue informer ; capacity plugin, proportion plugin, reclaim action, all of it works untouched because as far as the scheduler is concerned it's just another Queue in ssn.Queues.

What's in the PR:

  • NamespaceQueue type in scheduling.volcano.sh/v1beta1, spec fields mirror QueueSpec exactly (capability, guarantee, deserved, weight, reclaimable, priority, parent), registered in scheme alongside Queue and PodGroup
  • shadowQueueName uses SHA-256(namespace+"/"+name) prefix to avoid the ns="a-b"/name="c" vs ns="a"/name="b-c" collision that simple concatenation has
  • buildShadowQueue translates NamespaceQueueSpec -> QueueSpec, sets scheduling.volcano.sh/nsq-ref annotation for GC since cross-namespace OwnerReference is Kubernetes-forbidden
  • Full controller using a dynamic informer (pre-codegen) with reconcile, create/update/delete shadow Queue, and status mirroring back to the tenant
  • 12 passing unit tests

What this PR doesn't include

This is intentionally scoped to demonstrate the approach, not deliver the feature. Missing pieces that come after mentor feedback and codegen:

  • CRD manifest : generated by make manifests once kubebuilder markers are validated
  • Generated lister/informer : the dynamic informer in the controller is a pre-codegen placeholder; make generate-code replaces it with factory.Scheduling().V1beta1().NamespaceQueues()
  • Admission webhook : the piece that reads scheduling.volcano.sh/namespacequeue-name on a PodGroup and patches spec.queue to the shadow Queue name so the scheduler can resolve it; this is also where namespace isolation is enforced
  • RBAC manifests : the tenant Role + RoleBinding that makes the whole self-service story actually work without a ClusterRoleBinding
  • Status mirroring : a watch on the shadow Queue's Status.Allocated to propagate resource usage back to NamespaceQueueStatus so tenants have visibility
  • E2E tests : full flow from NamespaceQueue creation through job scheduling and resource accounting

Open questions for mentors

1. updateQueueParent will overwrite the shadow Queue's parent on every sync

In queue_controller_action.go, syncQueue calls updateQueueParent which sets spec.parent = "root" on any queue that has no parent. The shadow Queue the NamespaceQueue controller creates will immediately have its parent overwritten by the existing queue controller on the very next sync cycle. So either I need to always set a parent on the shadow Queue at creation time, or updateQueueParent needs to skip queues it doesn't own. What's the right call here ; should the shadow Queue always be pinned under a cluster-admin-configured parent queue, or is there a way to mark it as "parent already managed"?

2. Weight scope in the proportion plugin is cluster-wide, which feels wrong for tenants

The proportion plugin computes deserved = (weight / Σ all_weights) * total_cluster_resources. A shadow Queue's weight competes with every other queue in the cluster. If team-alpha creates 10 NamespaceQueues each with weight=1, they effectively grab 10 shares of the cluster ; same as 10 cluster-admin-created queues. Should weight for NamespaceQueues be scoped relative to the namespace's resource pool, or relative to a parent cluster Queue they're bound to? I couldn't find a clean answer in the proportion plugin code.

3. Do NamespaceQueues need a mandatory parent cluster Queue, or can they be standalone?

If standalone, the sum of all NamespaceQueue capabilities across the cluster could exceed actual cluster capacity ; the scheduler handles this through pending, but it's uncontrolled overcommit. If we require binding to a parent cluster Queue, an admin still has to pre-create that parent quota queue, which partially defeats the self-service purpose. Is the intent that platform admins pre-create one quota Queue per namespace/team and tenants subdivide within that, or is a fully standalone NamespaceQueue valid?

4. The existing scheduling.volcano.sh/queue-name annotation on Namespace conflicts with the new flow

The mutating webhook for PodGroup already reads QueueNameAnnotationKey from the Namespace object and patches spec.queue if the PodGroup's queue is "default". If I add NamespaceQueueNameAnnotationKey with similar semantics, there are now two annotation-based queue assignment mechanisms on a Namespace. Which wins? Can a namespace have both set, pointing to different queues for different workloads?

@volcano-sh-bot volcano-sh-bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 14, 2026
@volcano-sh-bot volcano-sh-bot requested review from hwdef and merryzhou May 14, 2026 19:32
@volcano-sh-bot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lowang-bh for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. do-not-merge/contains-merge-commits labels May 14, 2026
@Aman-Cool Aman-Cool force-pushed the poc/namespace-queue branch from e96cfce to f37c837 Compare May 14, 2026 19:33
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a NamespaceQueue controller to Volcano, allowing tenants to manage namespace-scoped queues that are automatically reconciled into cluster-scoped shadow queues. The review feedback identifies a critical infinite update loop caused by the controller's drift detection mechanism, which conflicts with other controllers modifying the 'Parent' field. Additionally, the feedback recommends using a lifecycle-aware context instead of context.TODO() and points out that the 'allocated' resource field is missing from the status synchronization logic.

Comment thread pkg/controllers/namespacequeue/namespacequeue_controller.go
Comment thread pkg/controllers/namespacequeue/shadow_queue.go
Comment thread pkg/controllers/namespacequeue/namespacequeue_controller.go Outdated
Comment thread pkg/controllers/namespacequeue/namespacequeue_controller.go
…ntroller

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool Aman-Cool force-pushed the poc/namespace-queue branch from f37c837 to 88fceef Compare May 14, 2026 19:34
Signed-off-by: Aman-Cool <aman017102007@gmail.com>
@Aman-Cool
Copy link
Copy Markdown
Contributor Author

/assign @hajnalmt

@Aman-Cool
Copy link
Copy Markdown
Contributor Author

Hey @hajnalmt, @JesseStutler and @devzizu, wanted to flag something the bot also caught; the reflect.DeepEqual on the full spec was exactly the updateQueueParent conflict I raised in the open questions. It confirmed that the shadow Queue's parent field can't just be left empty; the existing queue controller will keep stamping "root" on it every sync cycle, and we'd fight each other forever. Fixed it by comparing only the fields we explicitly own.

The weight scoping question still feels like the bigger design risk to me though; right now a shadow Queue's weight competes across the entire cluster in the proportion plugin, which means a tenant creating 10 NamespaceQueues with weight=1 grabs 10 cluster shares.
Would love your take on whether the intent is to always bind these under a parent cluster Queue (so weight is scoped within that pool), or if standalone NamespaceQueues are a valid use case at all.

Signed-off-by: Aman-Cool <aman017102007@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants