Refactor reclaim action to process all tasks in a job #4407
Signed-off-by: GautamBytes <manchandanigautam@gmail.com>
/assign @JesseStutler
@JesseStutler, can you help me figure out why this
Please split this PR into separate PRs or commits, each implementing only one change, such as a refactor or a bug fix.
/cc
@JesseStutler I've opened a pull request to solve this issue at #4634.
@GautamBytes: PR needs rebase.
@GautamBytes Hi, are you still working on this? You need to rebase onto the latest code and fix all the failing CI checks.
@JesseStutler Sorry, I started an internship and am currently busy with it. It would be great if someone could take over this PR, or request commit access to it, and get it merged!
What type of PR is this?
/kind improvement
/kind bug
What this PR does / why we need it:
This PR fixes a critical bug in the scheduler's reclaim action where an entire queue could be incorrectly skipped during preemption.
The previous logic would evaluate only the first task of a starving job. If that single task failed a preliminary check (e.g., its preemptionPolicy was set to Never), the scheduler would discard the entire queue for that cycle. This prevented other valid, preemptable tasks in the same job, or other jobs in the same queue, from ever being considered for reclamation.
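To fix this, the reclaim action's main loop has been refactored to mirror the nested queue -> job -> task structure found in the allocate action. This ensures that a single task failing a preliminary check no longer causes the rest of its job, or the other jobs in the same queue, to be skipped.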
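The difference between the two loop shapes can be illustrated with a minimal Go sketch. This is not the code from this PR or from the Volcano scheduler: the `Task`, `Job`, and `Queue` types, the `CanPreempt` field (standing in for a check such as `preemptionPolicy` not being `Never`), and both function names are hypothetical and exist only to show the control flow.

```go
package main

// Hypothetical stand-ins for illustration only; these are NOT Volcano's types.
type Task struct {
	Name       string
	CanPreempt bool // assumed result of a preliminary check, e.g. preemptionPolicy != Never
}

type Job struct {
	Name  string
	Tasks []Task
}

type Queue struct {
	Name string
	Jobs []Job
}

// candidatesFirstTaskOnly models the behaviour described as the bug:
// only the first task of the first job is checked, and a failed check
// drops the whole queue for this cycle.
func candidatesFirstTaskOnly(queues []Queue) []string {
	var out []string
	for _, q := range queues {
		if len(q.Jobs) == 0 || len(q.Jobs[0].Tasks) == 0 {
			continue
		}
		t := q.Jobs[0].Tasks[0]
		if !t.CanPreempt {
			continue // the entire queue is skipped
		}
		out = append(out, t.Name)
	}
	return out
}

// candidatesAllTasks models the nested queue -> job -> task shape:
// a failed check skips only that task, not its job or queue.
func candidatesAllTasks(queues []Queue) []string {
	var out []string
	for _, q := range queues {
		for _, j := range q.Jobs {
			for _, t := range j.Tasks {
				if !t.CanPreempt {
					continue // skip just this task
				}
				out = append(out, t.Name)
			}
		}
	}
	return out
}
```

With a queue whose first job holds a non-preempting task followed by a preempting one, and whose second job holds another preempting task, the first function returns no candidates while the second returns the two preempting tasks. The real action also has to order queues and jobs and to select victims, none of which is modelled here.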
Which issue(s) this PR fixes:
Fixes #3738
Special notes for your reviewer:
The core of this change is refactoring the reclaim action's main loop to align with the existing, proven pattern in the allocate action. This not only fixes the bug but also improves code consistency and performance. A new unit test has been added to specifically cover this failure scenario.
Does this PR introduce a user-facing change?