Split the predicate function to solve the resource filtering problem encountered during preemption #2818

Closed

wangyang0616
Member

Background:
allocate, preempt, and reclaim all call the predicate to filter nodes. If the predicate includes a condition that checks whether a node's idle resources satisfy the task, preempt and reclaim filter out exactly the nodes they would need to free resources on, so they cannot perform preemption.

Solution:
Split the resource-related filter conditions out of predicate into predicateResource so that they are evaluated independently. The allocate action calls both predicate and predicateResource to filter nodes; preempt and reclaim call only predicate.

Notice:
Custom plugins added later must follow the same principle: resource filter conditions belong in predicateResource. Otherwise, preemption may be affected.

Related: #2739

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hzxuzhonghu after the PR has been reviewed.
You can assign the PR to them by writing /assign @hzxuzhonghu in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 5, 2023
@wangyang0616 wangyang0616 changed the title Feature predicate split Split the predicate function to solve the resource filtering problem encountered during preemption May 5, 2023
@wangyang0616 wangyang0616 force-pushed the feature_predicate_split branch from bb8c128 to 71385c9 Compare May 5, 2023 03:06
task.Namespace, task.Name, node.Name)
return api.NewFitError(task, node, api.NodePodNumberExceeded)
}

if gpuDevice, ok := node.Others[api.GPUSharingDevice].(api.Devices); ok {
Member Author

Is a type-assertion check required here?

@igormishsky igormishsky May 7, 2023

We have this issue open to address it.
I hope it can be approved and merged, since the scheduler should not crash.

Member Author

Yes, the assertion check is required here. The code in the current PR already handles it; I must have commented on the wrong PR.
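
For readers unfamiliar with the check being discussed, here is a minimal sketch of Go's comma-ok type assertion, assuming a map of `interface{}` values similar in spirit to `node.Others`. The `Devices` interface, its `Kind` method, the `gpuDevices` type, and `lookupDevices` are hypothetical and only illustrate why the `ok` result prevents a panic on a missing or wrongly typed entry.

```go
package main

import "fmt"

// Devices is a hypothetical stand-in for an interface such as api.Devices.
type Devices interface {
	Kind() string
}

// gpuDevices is a hypothetical implementation used only for this example.
type gpuDevices struct{}

func (gpuDevices) Kind() string { return "gpu" }

// lookupDevices uses the comma-ok form of the type assertion, so a missing
// key (nil interface) or a value of another type yields ok == false
// instead of a runtime panic.
func lookupDevices(others map[string]interface{}, key string) (Devices, bool) {
	d, ok := others[key].(Devices)
	return d, ok
}

func main() {
	others := map[string]interface{}{
		"gpu": gpuDevices{},
		"bad": 42, // wrong type on purpose
	}
	for _, key := range []string{"gpu", "bad", "missing"} {
		_, ok := lookupDevices(others, key)
		fmt.Println(key, ok)
	}
}
```

The single-value form `others[key].(Devices)` would panic for the `"bad"` and `"missing"` keys, which is the kind of scheduler crash the comments above refer to.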

@wangyang0616 wangyang0616 force-pushed the feature_predicate_split branch 3 times, most recently from 143d7a6 to df4c199 Compare May 24, 2023 09:58
@wangyang0616
Member Author

/priority important-soon

@volcano-sh-bot volcano-sh-bot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label May 24, 2023
…cators, and predicateFn filters inherent properties of nodes

Signed-off-by: wangyang <wangyang8126@gmail.com>
@wangyang0616 wangyang0616 force-pushed the feature_predicate_split branch from df4c199 to a8a888f Compare June 2, 2023 09:36
…ted from predicateFn to predicateResoureFn

Signed-off-by: wangyang <wangyang8126@gmail.com>
@wangyang0616 wangyang0616 force-pushed the feature_predicate_split branch from a8a888f to 2b25c6b Compare June 2, 2023 09:50
@neujie
neujie commented Jun 7, 2023

/lgtm

@volcano-sh-bot
Contributor

@neujie: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@@ -98,11 +98,16 @@ func (alloc *Action) Execute(ssn *framework.Session) {
 	allNodes := ssn.NodeList
 	predicateFn := func(task *api.TaskInfo, node *api.NodeInfo) error {
 		// Check for Resource Predicate
-		if ok, reason := task.InitResreq.LessEqualWithReason(node.FutureIdle(), api.Zero); !ok {
-			return api.NewFitError(task, node, reason)
+		if err := ssn.PredicateResourceFn(task, node); err != nil {
Member

PredicateResource is just a resource comparison. Do we have to make it a session API?
There seems to be no need for different plugins to implement this function in different ways.

@wangyang0616
Member Author

This problem has been addressed in "Predicate adapts allocate and preempt" (#2916), so this PR is being put on hold.

/hold

@volcano-sh-bot volcano-sh-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 25, 2023
@wangyang0616
Member Author

/close

@volcano-sh-bot
Contributor

@wangyang0616: Closed this PR.

In response to this:

/close

Labels
do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
5 participants