
Conversation

@hajnalmt
Contributor

@hajnalmt hajnalmt commented Dec 17, 2025

What type of PR is this?

/kind bug
/kind enhancement

What this PR does / why we need it:

The PR contains some enhancements and a bugfix regarding the capacity plugin's enqueue process.
Currently the capacity plugin relies purely on realCapability to decide whether a job should be enqueued or not.

This is somewhat problematic, as there are cases where we want to enqueue directly below deserved or guarantee, as
zjj2wry points out here: #4817 (comment)
For example, when we have a queue structure where:

# Cluster: 100 CPU
parent (guarantee = 100 CPU, capability = 100 CPU)
├── leaf-1:
│   - guarantee = 60 CPU
│   - deserved = 60 CPU
│   - allocated = 100 CPU (overused by 40 CPU)
│
└── leaf-2:
    - guarantee = 40 CPU
    - deserved = 40 CPU
    - allocated = 0 CPU
    - wants 40 CPU

The leaf-2 queue cannot enqueue, even though it has guaranteed resources on the cluster.
There are also use cases where we don't guarantee resources to any queue and instead slice the cluster by deserved values; for example, if we omit guarantee in the example above:

# Cluster: 100 CPU
parent (guarantee = 100 CPU, capability = 100 CPU)
├── leaf-1:
│   - deserved = 60 CPU
│   - allocated = 100 CPU (overused by 40 CPU)
│
└── leaf-2:
    - deserved = 40 CPU
    - allocated = 0 CPU
    - wants 40 CPU

In this case leaf-2 will not be able to enqueue either, due to the maxed-out cluster capacity, although it deserves 40 CPU.
I think we should enqueue in this use case (when no guarantee is specified) as well.

The comparison uses LessEqualPartlyWithDimensionZeroFiltered, so if there exists any dimension that is below or equal to deserved/guarantee we still enqueue. In theory this means there can be dimensions where the job's minResources will push the allocated + inqueued queue resources above the queue's capability value. I think that is fine, as this is the realm of allocate rather than enqueue, so the principle that we never let resources be created in a queue above its capability is not harmed.
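
To make the comparison concrete, here is a minimal, self-contained Go sketch of the idea. The `Resource` type and `lessEqualPartlyZeroFiltered` helper are simplified stand-ins introduced only for illustration; they are not the actual volcano types, and the real `LessEqualPartlyWithDimensionZeroFiltered` may differ in details:

```go
package main

import "fmt"

// Resource is a simplified stand-in for the scheduler's resource type,
// used only for illustration.
type Resource map[string]float64

// lessEqualPartlyZeroFiltered sketches the idea described above: dimensions
// the bound leaves unset (zero) are skipped, and the request passes if it
// fits the bound in at least one of the remaining dimensions.
func lessEqualPartlyZeroFiltered(req, bound Resource) bool {
	for dim, b := range bound {
		if b == 0 {
			continue // zero-filtered: an unset dimension is not compared
		}
		if req[dim] <= b {
			return true // one fitting dimension is enough
		}
	}
	return false
}

func main() {
	// leaf-2 from the example above: it asks for 40 CPU and has
	// guarantee/deserved of 40 CPU, so it may enqueue even though the
	// cluster capacity is already used up by leaf-1.
	request := Resource{"cpu": 40}
	deserved := Resource{"cpu": 40}
	fmt.Println(lessEqualPartlyZeroFiltered(request, deserved)) // true
}
```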

Additionally, these examples use hierarchical queues, but the same causes and reasoning apply to the non-hierarchical case too.

Minor enhancements to the enqueue action logging and the capacity plugin tests are also part of the PR.

Which issue(s) this PR fixes:

Fixes #4817

Special notes for your reviewer:

Does this PR introduce a user-facing change?

fix(enqueue): capacity plugin job enqueueable
A bug has been fixed in the capacity plugin's enqueue logic which prevented legitimate jobs from moving from `Pending` to `Inqueue`.
Additionally, new logic lets jobs enqueue below `deserved` when no `guarantee` value is provided in any queue across the cluster.

A log line with maximum verbosity has been added to the enqueue action
to improve traceability and make the enqueue process easier to follow.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
@volcano-sh-bot volcano-sh-bot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 17, 2025
@volcano-sh-bot
Contributor

@hajnalmt: The label(s) kind/enhancement cannot be applied, because the repository doesn't have them.


@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Dec 17, 2025
@gemini-code-assist

Summary of Changes

Hello @hajnalmt, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the job enqueueing process within the capacity plugin by resolving a bug that prevented legitimate jobs from being enqueued under certain resource allocation conditions. It introduces more flexible resource evaluation, allowing jobs to progress when their requested resources fall within a queue's guaranteed or deserved limits, even if the overall capacity is temporarily oversubscribed. This enhancement ensures more efficient utilization of cluster resources and a smoother workflow for jobs in various queue configurations.

Highlights

  • Enhanced Job Enqueueing Logic: The capacity plugin's job enqueueing mechanism has been updated to allow jobs to be enqueued based on "guarantee" or "deserved" resources, even if the "realCapability" is currently exceeded. This addresses scenarios where jobs were incorrectly blocked from moving from "Pending" to "Inqueue".
  • Flexible Resource Evaluation: The "jobEnqueueable" function now uses "LessEqualPartlyWithDimensionZeroFiltered" to evaluate resource availability, meaning a job can be enqueued if it fits within "guarantee" or "deserved" for at least one resource dimension, preventing unnecessary blocking.
  • Improved Hierarchical Queue Handling: The enqueueing logic has been refined for hierarchical queues, ensuring that checks against "guarantee", "deserved", and "realCapability" are correctly applied across the queue hierarchy.
  • Comprehensive Test Coverage: New test cases have been added to validate the updated enqueueing behavior, specifically covering scenarios involving "guarantee" and "deserved" resources, both when enqueuing is expected and when it should be prevented due to cluster saturation.
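
As a rough illustration of the flow these highlights describe, here is a hypothetical Go sketch of the decision order, reusing the simplified `Resource` type and `lessEqualPartlyZeroFiltered` helper from the sketch in the PR description above. The function name, parameters, and exact ordering of the checks are assumptions for illustration, not volcano's actual `jobEnqueueable` code:

```go
// jobEnqueueableSketch is one possible reading of the flow described in this
// PR: enqueue below guarantee when some queue in the cluster defines one,
// enqueue below deserved when no queue defines a guarantee, and otherwise
// fall back to the original realCapability check.
func jobEnqueueableSketch(minReq, guarantee, deserved, allocatedPlusInqueue, realCapability Resource, clusterHasGuarantee bool) bool {
	if clusterHasGuarantee && lessEqualPartlyZeroFiltered(minReq, guarantee) {
		return true // fits under guarantee in at least one requested dimension
	}
	if !clusterHasGuarantee && lessEqualPartlyZeroFiltered(minReq, deserved) {
		return true // no guarantees anywhere: fits under deserved instead
	}
	// Original check: allocated + inqueued + minReq must stay within
	// realCapability in every dimension realCapability defines.
	for dim, capVal := range realCapability {
		if allocatedPlusInqueue[dim]+minReq[dim] > capVal {
			return false
		}
	}
	return true
}
```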


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable fix and enhancement to the capacity plugin's enqueue logic, allowing jobs to be enqueued based on guarantee or deserved resources, which resolves issues in hierarchical queue scenarios. The changes are well-supported by a comprehensive set of new test cases that cover the described scenarios. My review includes a suggestion to refactor a part of the new logic to improve maintainability by reducing code duplication.

@hajnalmt hajnalmt force-pushed the fix/capacity-plugin-job-enqueueable branch 2 times, most recently from 52b26f4 to 7d1fd4e on December 17, 2025 at 12:18
Enhancing the capacity plugin enqueue process. When at least one queue on the
cluster defines a guarantee, we allow incoming tasks to enqueue below
guarantee. "Below" here means that if there is at least one resource dimension
the task requests in which we are still equal to or less than the guarantee,
then we should enqueue the task.

In theory this means there can be dimensions where the task ends up above the
queue capability, but we still enqueue. I think that case is a matter for the
allocate, preempt and reclaim actions.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
New unit tests in the capacity plugin verifying the below-deserved/guarantee
enqueue process.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
Moving the test functions in the capacity plugin to the common test utils so
they can be reused elsewhere too.

Signed-off-by: Hajnal Máté <hajnalmt@gmail.com>
@hajnalmt hajnalmt force-pushed the fix/capacity-plugin-job-enqueueable branch from 7d1fd4e to 58e0dce on December 17, 2025 at 12:32

@zjj2wry zjj2wry left a comment


LGTM. The only concern is excessive enqueueing when GPU is insufficient but CPU is sufficient. However, I can configure child queues with a GPU guarantee only.

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: zjj2wry
Once this PR has been reviewed and has the lgtm label, please assign jessestutler for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JesseStutler
Member

JesseStutler commented Dec 22, 2025

I'm wondering about this case:

# Cluster: 100 CPU
parent (guarantee = 100 CPU, capability = 100 CPU)
├── leaf-1:
│   - guarantee = 60 CPU
│   - deserved = 60 CPU
│   - allocated = 100 CPU (overused by 40 CPU)
│
└── leaf-2:
    - guarantee = 40 CPU
    - deserved = 40 CPU
    - allocated = 0 CPU
    - wants 40 CPU

Why can leaf-1 allocate 100 CPU? Since leaf-2 has 40 CPU of guaranteed resources, leaf-1's realCapability should be 60 CPU, so why can leaf-1 allocate 100 CPU?

BTW, I think this case is quite unusual; we prefer to comply with guarantee < deserved < capability, and I don't know whether we have a real scenario where totalGuarantee == totalResource.
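
For reference, a tiny self-contained sketch of the arithmetic behind this question, assuming realCapability is derived roughly as min(own capability, cluster total minus the other queues' guarantees) and that leaf-1's unset capability defaults to the cluster total. Both assumptions are for illustration only, not quoted from the plugin code:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	total := 100.0           // cluster CPU
	leaf1Capability := 100.0 // assumed: unset capability defaults to the cluster total
	leaf2Guarantee := 40.0   // CPU guaranteed to the sibling queue

	// Assumed derivation: leaf-1 cannot count on CPU guaranteed to leaf-2.
	leaf1RealCapability := math.Min(leaf1Capability, total-leaf2Guarantee)
	fmt.Println(leaf1RealCapability) // 60, matching the expectation above
}
```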

@zjj2wry

zjj2wry commented Dec 23, 2025


This is the scenario where leaf-2 is created after leaf-1 has already allocated resources. When cluster resources are insufficient and new nodes are added through scaling, existing pending tasks may immediately consume all the newly added resources, preventing new queues from scheduling pods even if they have guaranteed resources.

@zjj2wry

zjj2wry commented Dec 28, 2025

The job will always enqueue in a case like the one below. I think we should check all resource dimensions when determining whether a job can be enqueued or can preempt.

guarantee: CPU=100, Memory=200Gi, GPU=8
used: CPU=0, Memory=0, GPU=8 (GPU exhausted)
request: CPU=10, Memory=20Gi, GPU=2
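
Put into the same kind of sketch as earlier (self-contained, with the same simplified `Resource` type; the strict `allRequestedDimensionsFit` variant is only an illustration of the suggestion above, not existing code):

```go
package main

import "fmt"

type Resource map[string]float64

// anyRequestedDimensionFits mirrors the partial check: one fitting dimension
// is enough (dimensions the bound leaves at zero are skipped).
func anyRequestedDimensionFits(req, bound Resource) bool {
	for dim, b := range bound {
		if b == 0 {
			continue
		}
		if req[dim] <= b {
			return true
		}
	}
	return false
}

// allRequestedDimensionsFit is the stricter variant suggested above: every
// dimension the job requests must fit on top of what is already used.
func allRequestedDimensionsFit(req, used, bound Resource) bool {
	for dim, r := range req {
		if r == 0 {
			continue
		}
		if used[dim]+r > bound[dim] {
			return false
		}
	}
	return true
}

func main() {
	guarantee := Resource{"cpu": 100, "memory": 200, "gpu": 8} // memory in Gi
	used := Resource{"cpu": 0, "memory": 0, "gpu": 8}          // GPU exhausted
	request := Resource{"cpu": 10, "memory": 20, "gpu": 2}

	fmt.Println(anyRequestedDimensionFits(request, guarantee))       // true: CPU and memory still fit
	fmt.Println(allRequestedDimensionsFit(request, used, guarantee)) // false: 8 + 2 GPUs > 8
}
```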



Development

Successfully merging this pull request may close these issues.

Hierarchical Queue Mode Prevents Reclaim Functionality
