-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Refactor reclaim action #4794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Refactor reclaim action #4794
Conversation
|
@guoqinwill: The label(s) DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
Summary of ChangesHello @guoqinwill, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request refactors the reclaim action to correctly handle tasks with a PreemptionPolicy of Never, preventing them from blocking other tasks. The overall change is an improvement and the new test case is well-suited to verify the fix. However, I've found a critical issue in the new loop structure that could lead to an infinite loop, and another bug in how task assignment is tracked within a job. I've also noted a leftover debug log that should be removed. My suggestions aim to fix these new issues while preserving the intended refactoring.
827a55d to
be8390b
Compare
|
/cc |
be8390b to
e3b68f4
Compare
|
Also /assign @hajnalmt mate is also familiar with the reclaim logic |
e4f55b6 to
bcc4023
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great contribution! Thank you @guoqinwill !
The logic behind the action looks flawless.
Following Jesse's comment, I also think the variable names for the two boolean variable assigned and the assignedInJob can be improved, to reflect the intention. (I don't like it in preempt action too.)
Could we rename them to taskPipelined and taskPipelinedInJob for clarity?
These names better reflect that the variables indicate whether a task was pipelined in the loop and within the job, respectively.
bcc4023 to
ec07b90
Compare
| task.UID, n.Name, ssn.UID, rollbackErr) | ||
| } | ||
| } | ||
| assigned = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a reminder that here assigned should be renamed like reclaimed, but I still feel this variable is unuseful in reclaim scenario
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a variable named reclaimed at line 213, that's why I suggested taskPipelined for this, but we can rename the other variable to reclaimedResources and name this to reclaimed maybe?
This variable indicates if there was a task pipelined in the function which is I think important to continue the reclaim in the other tasks in the job.
| predicateNodes, _ := predicateHelper.PredicateNodes(task, totalNodes, ssn.PredicateForPreemptAction, false) | ||
| assigned := false | ||
|
|
||
| for _, n := range predicateNodes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#3997 Currently, reclaim is done by traversing nodes, instead of first filtering candidates by node and then sorting them. So, I'm wondering if we can fix this issue together? Imagine this worst-case scenario: each job's pods are scattered across different nodes, and then a job wants to reclaim, it might reclaim one pod from each of the other jobs. If we sort by priority, this would make it easier for pods from lower-priority jobs to be reclaimed first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there wasn't any progress in this Issue.
How do you imagine the implementation for this?
Do we want to create a VictimNodeOrderFn implemented in the nodeOrder plugin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the modifications to this issue can be split up for now, and we can focus on refactoring the reclaim process first.
hajnalmt
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my comments too 👍
| task.UID, n.Name, ssn.UID, rollbackErr) | ||
| } | ||
| } | ||
| assigned = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a variable named reclaimed at line 213, that's why I suggested taskPipelined for this, but we can rename the other variable to reclaimedResources and name this to reclaimed maybe?
This variable indicates if there was a task pipelined in the function which is I think important to continue the reclaim in the other tasks in the job.
| "volcano.sh/volcano/pkg/scheduler/util" | ||
| ) | ||
|
|
||
| func init() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you leave the function definition here? Is it called somehwere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because predicateHelper.PredicateNodes was introduced, the unit test needs to initialize options.ServerOpts; otherwise, it will fail at runtime.
| }, | ||
| PodGroups: []*schedulingv1beta1.PodGroup{ | ||
| util.BuildPodGroupWithPrio("pg1", "c1", "q1", 1, nil, schedulingv1beta1.PodGroupInqueue, "mid-priority"), | ||
| util.BuildPodGroupWithPrio("pg1", "c1", "q1", 1, nil, schedulingv1beta1.PodGroupInqueue, "low-priority"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test case was fixed on master please rebase, and don't modify it in your commit directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
| predicateNodes, _ := predicateHelper.PredicateNodes(task, totalNodes, ssn.PredicateForPreemptAction, false) | ||
| assigned := false | ||
|
|
||
| for _, n := range predicateNodes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see there wasn't any progress in this Issue.
How do you imagine the implementation for this?
Do we want to create a VictimNodeOrderFn implemented in the nodeOrder plugin?
Signed-off-by: GautamBytes <manchandanigautam@gmail.com> Signed-off-by: guoqinwill <gq411will@163.com>
ec07b90 to
01d921a
Compare
Signed-off-by: guoqinwill <gq411will@163.com>
01d921a to
cdaaa97
Compare
What type of PR is this?
/kind bug
What this PR does / why we need it:
origin pr:#4407
Which issue(s) this PR fixes:
Fixes #3738
Special notes for your reviewer:
Does this PR introduce a user-facing change?