Support resource reservation (code only) #4887
base: master
Conversation
…group to the scheduling API group.
Signed-off-by: leggasai <jyc132780208@sina.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of the listed files.
Summary of Changes (Gemini Code Assist)

This pull request introduces a foundational resource reservation feature into the scheduler. It allows for more predictable allocation of resources by enabling jobs to reserve capacity on nodes before their actual tasks are scheduled. The implementation involves new scheduler actions, cache mechanisms, and API extensions to manage the lifecycle of reservations, from initial setup to task binding and cleanup, ensuring that reserved resources are correctly utilized and maintained.
Code Review
This pull request introduces a resource reservation feature, which is a significant addition. The implementation includes new CRDs, scheduler actions, plugins, and cache logic to support reserving resources and then consuming them. The code is well-structured.
I have two main points of feedback:
- The `go.mod` file contains a `replace` directive pointing to a personal fork. This must be removed before merging to ensure build stability.
- In the new `reserve` action, there's an unused `jobs` priority queue that seems to be leftover code and should be cleaned up.
Overall, the changes look good, but these points should be addressed.
go.mod:

```
// replace volcano.sh/apis => ../volcano-apis
replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23
```
Will revert it after the API PR has been merged.
```go
		jobs := util.NewPriorityQueue(ssn.JobOrderFn)
		jobs.Push(job)
```
Pull request overview
This pull request implements resource reservation functionality for the Volcano scheduler, enabling two key phases: (1) reservation task scheduling that reserves resources without actual pod binding, and (2) reservation usage where tasks can leverage pre-reserved resources. This is a code-only version that excludes generated YAML files to keep the PR manageable.
Key changes:
- Adds reservation plugin and reserve action for resource reservation lifecycle management
- Implements ReservationCache for tracking and syncing reservation state
- Extends scheduler framework with reservation matching, validation, and allocation logic
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| pkg/scheduler/plugins/reservation/reservation.go | New plugin implementing reservation task lifecycle hooks (PreBind, PostBind, validation) |
| pkg/scheduler/cache/reservation_cache.go | New cache for managing reservation state, expiration, and synchronization |
| pkg/scheduler/framework/session.go | Adds reservation matching, owner validation, and spec matching methods |
| pkg/scheduler/framework/statement.go | Adds UnAllocateForReservationTask for resource handover without status change |
| pkg/scheduler/cache/event_handlers.go | Adds reservation event handlers and fake pod creation logic |
| pkg/scheduler/cache/cache.go | Integrates ReservationCache, adds PostBind support, and cleanup workers |
| pkg/scheduler/actions/reserve/reserve.go | New action for swapping reservation tasks with actual tasks |
| pkg/scheduler/api/helpers/helpers.go | Adds IsPodSpecMatch for comparing pod specifications |
| go.mod/go.sum | Updates dependencies (uses personal fork - needs fixing) |
| Makefile | Updates manifest generation to use exclude description flag |
```go
				Pass:   false,
				Reason: v1beta1.ReservationOwnerNotMatchReason,
				Message: fmt.Sprintf(
					"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject  or label selectors)",
```
Copilot AI (Dec 28, 2025):
Double space in error message: There are two consecutive spaces in the error message between "ownerObject" and "or". This should be a single space for consistency.
| "Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)", | |
| "Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)", |
pkg/scheduler/actions/reserve/reserve.go:

```go
/*
Copyright 2019 The Kubernetes Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package reserve

import (
	"time"

	"k8s.io/klog/v2"

	"volcano.sh/volcano/pkg/scheduler/api"
	"volcano.sh/volcano/pkg/scheduler/framework"
	"volcano.sh/volcano/pkg/scheduler/metrics"
	"volcano.sh/volcano/pkg/scheduler/util"
)

type Action struct{}

func New() *Action {
	return &Action{}
}

func (reserve *Action) Name() string {
	return "reserve"
}

func (reserve *Action) Initialize() {}

func (reserve *Action) Execute(ssn *framework.Session) {
	klog.V(5).Infof("Enter Reserve ...")
	defer klog.V(5).Infof("Leaving Reserve ...")

	for _, job := range ssn.Jobs {
		if !job.IsUseReservation() {
			continue
		}

		if vr := ssn.JobValid(job); vr != nil && !vr.Pass {
			klog.V(3).Infof("Job <%s/%s> Queue <%s> skip allocate, reason: %v, message %v", job.Namespace, job.Name, job.Queue, vr.Reason, vr.Message)
			continue
		}

		jobs := util.NewPriorityQueue(ssn.JobOrderFn)
		jobs.Push(job)

		klog.V(3).Infof("Attempting to swap reservation for actual tasks in job <%s/%s>", job.Namespace, job.Name)
		stmt := framework.NewStatement(ssn)

		pendingTasks := util.NewPriorityQueue(ssn.TaskOrderFn)
		for _, task := range job.TaskStatusIndex[api.Pending] {
			pendingTasks.Push(task)
		}

		for !pendingTasks.Empty() {
			actualTask := pendingTasks.Pop().(*api.TaskInfo)

			if job.TaskHasFitErrors(job.DefaultSubJobID(), actualTask) {
				klog.V(3).Infof("Task %s with role spec %s has already predicated failed, skip", actualTask.Name, actualTask.TaskRole)
				continue
			}

			if err := ssn.PrePredicateFn(actualTask); err != nil {
				klog.V(3).Infof("PrePredicate for task %s/%s failed for: %v", actualTask.Namespace, actualTask.Name, err)
				fitErrors := api.NewFitErrors()
				for _, ni := range ssn.NodeList {
					fitErrors.SetNodeError(ni.Name, err)
				}
				job.NodesFitErrors[actualTask.UID] = fitErrors
				break
			}

			reservationTask := actualTask.ReservationTaskInfo
			if reservationTask == nil {
				klog.Warningf("Task <%s/%s> wants to use reservation but has no ReservationTaskInfo", actualTask.Namespace, actualTask.Name)
				continue
			}
			reservedNodeName := reservationTask.NodeName

			if reservedNodeName == "" {
				klog.Warningf("Reservation info for task <%s/%s> does not specify a node", actualTask.Namespace, actualTask.Name)
				continue
			}

			reservedNode, found := ssn.Nodes[reservedNodeName]
			if !found {
				klog.Warningf("Reserved node '%s' for task <%s/%s> not found in current session", reservedNodeName, actualTask.Namespace, actualTask.Name)
				continue
			}

			if err := stmt.UnAllocateForReservationTask(reservationTask); err != nil {
				klog.Errorf("Failed to release reservation task %v resources from node %v, err: %v", reservationTask.UID, reservedNode.Name, err)
				continue
			}

			if err := stmt.Allocate(actualTask, reservedNode); err != nil {
				klog.Errorf("Failed to allocate actual task %v to its reserved node %v, err: %v", actualTask.UID, reservedNode.Name, err)
			} else {
				klog.V(3).Infof("Allocated actual task <%s/%s> to node <%s>, effectively replacing the reservation.", actualTask.Namespace, actualTask.Name, reservedNode.Name)
				metrics.UpdateE2eSchedulingDurationByJob(job.Name, string(job.Queue), job.Namespace, metrics.Duration(job.CreationTimestamp.Time))
				metrics.UpdateE2eSchedulingLastTimeByJob(job.Name, string(job.Queue), job.Namespace, time.Now())
			}

			if ssn.JobReady(job) && !pendingTasks.Empty() {
				jobs.Push(job)
				klog.V(3).Infof("Job <%s/%s> is ready, but still has pending tasks. Pipelining.", job.Namespace, job.Name)
				break
			}
		}

		if ssn.JobReady(job) {
			stmt.Commit()
		} else {
			if !ssn.JobPipelined(job) {
				stmt.Discard()
			} else {
				stmt.Commit()
			}
		}
	}
}

func (reserve *Action) UnInitialize() {}
```
Copilot AI (Dec 28, 2025):
Missing test coverage for reserve action: The reserve action in pkg/scheduler/actions/reserve/reserve.go lacks test coverage. Other actions like allocate have comprehensive test files (allocate_test.go with 5185 lines), but there's no reserve_test.go. Given that this action handles critical reservation-to-actual-task swapping logic, it should have thorough test coverage.
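As a hedged starting point, a minimal `reserve_test.go` could begin with the sketch below. It only exercises the trivial surface visible in the diff above (`New` and `Name`); meaningful coverage of `Execute` would need a fake `framework.Session` along the lines of the fixtures used in `allocate_test.go`.

```go
package reserve

import "testing"

// TestReserveActionBasics covers only the constructor and action name;
// real coverage would drive Execute against a fake framework.Session.
func TestReserveActionBasics(t *testing.T) {
	action := New()
	if action == nil {
		t.Fatal("New() returned nil")
	}
	if got := action.Name(); got != "reserve" {
		t.Errorf("Name() = %q, want %q", got, "reserve")
	}
}
```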
```go
		return err
	}

	// sync cache
```
Copilot AI (Dec 28, 2025):
Modifying shared cache state without proper synchronization. The line reservation.Reservation = newReservation at line 203 modifies the reservation object that may be concurrently accessed by other goroutines. While the syncReservation method itself may be called from a single worker, the reservation object in the cache can be read by other methods (like MatchReservationForPod) without holding the cache lock. Consider using proper locking or copying mechanisms when updating cached reservation objects.
Suggested change:

```diff
 	// sync cache
+	rc.Lock()
+	defer rc.Unlock()
```
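Beyond taking the write lock, the "copying mechanisms" half of that advice can be illustrated with a self-contained sketch. The types below are simplified stand-ins, not the PR's actual cache: the idea is to swap the cached pointer under the lock rather than mutate the cached object in place, so readers holding the read lock never observe a half-updated `Reservation`.

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the cache types in this PR; names are illustrative.
type Reservation struct{ Phase string }

type reservationInfo struct{ Reservation *Reservation }

type reservationCache struct {
	sync.RWMutex
	reservations map[string]*reservationInfo
}

// syncReservation swaps in the new object under the write lock instead of
// mutating the cached one field by field, so concurrent readers holding
// RLock never observe a partially updated Reservation.
func (rc *reservationCache) syncReservation(uid string, updated *Reservation) {
	rc.Lock()
	defer rc.Unlock()
	if info, ok := rc.reservations[uid]; ok {
		info.Reservation = updated
	}
}

func main() {
	rc := &reservationCache{reservations: map[string]*reservationInfo{
		"uid-1": {Reservation: &Reservation{Phase: "Pending"}},
	}}
	rc.syncReservation("uid-1", &Reservation{Phase: "Available"})

	rc.RLock()
	fmt.Println(rc.reservations["uid-1"].Reservation.Phase) // Available
	rc.RUnlock()
}
```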
```go
	for jobTask, resvTask := range matched {
		jobTask.ReservationTaskInfo = resvTask
	}
```
Copilot AI (Dec 28, 2025):
Side effect in validation function: The CheckReservationMatch function modifies the task state by setting jobTask.ReservationTaskInfo at line 1241. This is a side effect in what appears to be a validation/check function. This could lead to unexpected state modifications during validation and makes the code harder to reason about. Consider separating the validation logic from the state modification, or rename the function to better reflect that it performs both checking and binding.
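One possible shape for that separation is sketched below with simplified stand-in types (the real `TaskInfo` lives in the scheduler API package, and the index-based pairing is purely illustrative): a pure function computes the match, and a separately named step applies it, so the side effect is explicit at the call site.

```go
package main

import "fmt"

// Simplified stand-in for api.TaskInfo; only the field relevant here.
type TaskInfo struct {
	Name                string
	ReservationTaskInfo *TaskInfo
}

// computeReservationMatch is pure: it returns the proposed pairing and
// whether every task found a reservation counterpart, without mutating input.
func computeReservationMatch(jobTasks, resvTasks []*TaskInfo) (map[*TaskInfo]*TaskInfo, bool) {
	if len(jobTasks) != len(resvTasks) {
		return nil, false
	}
	matched := make(map[*TaskInfo]*TaskInfo, len(jobTasks))
	for i, jt := range jobTasks {
		matched[jt] = resvTasks[i]
	}
	return matched, true
}

// applyReservationMatch performs the side effect under an explicit name.
func applyReservationMatch(matched map[*TaskInfo]*TaskInfo) {
	for jobTask, resvTask := range matched {
		jobTask.ReservationTaskInfo = resvTask
	}
}

func main() {
	jobTasks := []*TaskInfo{{Name: "job-task-0"}}
	resvTasks := []*TaskInfo{{Name: "resv-task-0"}}
	if matched, ok := computeReservationMatch(jobTasks, resvTasks); ok {
		applyReservationMatch(matched)
	}
	fmt.Println(jobTasks[0].ReservationTaskInfo.Name) // resv-task-0
}
```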
```go
	job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
	if err != nil {
		return
	}
```
Copilot AI (Dec 28, 2025):
Missing rollback of task status update: In PreBindRollBack, the function calls SyncReservation but doesn't reverse the task status update that was performed in PreBind at line 169 (job.UpdateTaskStatus(task, api.Bound)). If PreBind succeeds but binding fails later, the task status remains as Bound even though the binding never completed. This could lead to inconsistent state. Consider reverting the task status in the rollback.
Suggested change:

```diff
 	}
+	// Roll back task status that was set to Bound in PreBind.
+	if err := job.UpdateTaskStatus(task, api.Allocated); err != nil {
+		klog.Errorf("Failed to rollback status for reservation task %s, err: %v", task.Name, err)
+		return
+	}
```
```go
		if owner.Object.Name == ownerRef.Name {
			return true
```
Copilot AI (Dec 28, 2025):
Incomplete owner matching logic: The owner matching at lines 1152-1153 only compares the Name field from owner.Object with ownerRef.Name, but doesn't verify other important fields like Kind, Namespace, or UID. This could lead to false positive matches when different resources happen to have the same name. The comparison should include Kind and potentially UID to ensure accurate matching.
Suggested change:

```diff
-		if owner.Object.Name == ownerRef.Name {
-			return true
+		// Match by full owner identity where possible to avoid false positives on name alone.
+		if owner.Object.Name == ownerRef.Name &&
+			owner.Object.Kind == ownerRef.Kind &&
+			owner.Object.APIVersion == ownerRef.APIVersion {
+			// If either UID is empty, fall back to name/kind/apiversion match for compatibility.
+			if string(owner.Object.UID) == "" || string(ownerRef.UID) == "" || owner.Object.UID == ownerRef.UID {
+				return true
+			}
```
```go
	// Set some default value for matching the pod using reservation
	if pod.Spec.ServiceAccountName == "" {
		pod.Spec.ServiceAccountName = "default"
	}

	if pod.Spec.AutomountServiceAccountToken == nil {
		trueVal := true
		pod.Spec.AutomountServiceAccountToken = &trueVal
	}

	defaultTolerations := []v1.Toleration{
		{
			Key:               "node.kubernetes.io/not-ready",
			Operator:          v1.TolerationOpExists,
			Effect:            v1.TaintEffectNoExecute,
			TolerationSeconds: intPtr(300),
		},
		{
			Key:               "node.kubernetes.io/unreachable",
			Operator:          v1.TolerationOpExists,
			Effect:            v1.TaintEffectNoExecute,
			TolerationSeconds: intPtr(300),
		},
	}
	pod.Spec.Tolerations = mergeTolerations(pod.Spec.Tolerations, defaultTolerations)
```
Copilot AI (Dec 28, 2025):
Hardcoded default values may cause PodSpec matching issues: The function sets hardcoded defaults for ServiceAccountName (line 1580) and AutomountServiceAccountToken (line 1585), and adds default tolerations (lines 1588-1602). These defaults are applied to reservation pods but may not match the actual pod specifications that will use these reservations. If a real pod has different values (e.g., a different ServiceAccountName or doesn't have these default tolerations), the PodSpec matching in IsPodSpecMatch will fail. Consider making these defaults optional or ensuring they're applied consistently to both reservation and actual pods.
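One way to keep the comparison symmetric is sketched below; `normalizePodSpec` is hypothetical, and `reflect.DeepEqual` stands in for the PR's `IsPodSpecMatch` helper. Running the same defaulting over both specs before comparing ensures the injected defaults can never be the source of a mismatch.

```go
package main

import (
	"fmt"
	"reflect"

	v1 "k8s.io/api/core/v1"
)

// normalizePodSpec applies the same defaults the cache applies to reservation
// pods (ServiceAccountName and AutomountServiceAccountToken shown here as
// examples; tolerations omitted for brevity), returning a defaulted copy.
func normalizePodSpec(spec v1.PodSpec) v1.PodSpec {
	out := *spec.DeepCopy()
	if out.ServiceAccountName == "" {
		out.ServiceAccountName = "default"
	}
	if out.AutomountServiceAccountToken == nil {
		trueVal := true
		out.AutomountServiceAccountToken = &trueVal
	}
	return out
}

// specsMatch compares the normalized specs; DeepEqual is a stand-in for the
// PR's IsPodSpecMatch helper.
func specsMatch(reserved, actual v1.PodSpec) bool {
	return reflect.DeepEqual(normalizePodSpec(reserved), normalizePodSpec(actual))
}

func main() {
	reserved := v1.PodSpec{ServiceAccountName: "default"}
	actual := v1.PodSpec{} // defaulting makes this equivalent
	fmt.Println(specsMatch(reserved, actual)) // true
}
```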
go.mod:

```
)

// replace volcano.sh/apis => ../volcano-apis
replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23
```
Copilot AI (Dec 28, 2025):
Dependency replaced with personal GitHub fork: The volcano.sh/apis dependency has been replaced with a personal GitHub fork (github.com/LegGasai/apis). This is inappropriate for production code and should use the official volcano.sh/apis repository. Personal forks may not be maintained, could introduce security issues, and make it difficult for other contributors to build the project. Revert to using the official volcano.sh/apis dependency.
Suggested change:

```diff
-replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23
```
pkg/scheduler/plugins/reservation/reservation.go:

```go
/*
Copyright 2019 The Volcano Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package reservation

import (
	"context"
	"fmt"

	"k8s.io/klog/v2"
	"volcano.sh/apis/pkg/apis/scheduling/v1beta1"

	"volcano.sh/volcano/pkg/scheduler/api"
	"volcano.sh/volcano/pkg/scheduler/cache"
	"volcano.sh/volcano/pkg/scheduler/framework"
)

const (
	// PluginName indicates name of volcano scheduler plugin.
	PluginName = "reservation"
)

type reservationPlugin struct {
	// Arguments given for the plugin
	session *framework.Session
}

// bind context extension information of reservation
type bindContextExtension struct {
	SkipBind bool
}

// New function returns prioritizePlugin object
func New(arguments framework.Arguments) framework.Plugin {
	return &reservationPlugin{}
}

func (rp *reservationPlugin) Name() string {
	return PluginName
}

func (rp *reservationPlugin) OnSessionOpen(ssn *framework.Session) {
	klog.V(5).Infof("Enter reservation plugin ...")

	rp.session = ssn
	defer func() {
		klog.V(5).Infof("Leaving reservation plugin...")
	}()
	validJobFn := func(obj interface{}) *api.ValidateResult {
		job, ok := obj.(*api.JobInfo)
		if !ok {
			return &api.ValidateResult{
				Pass:    false,
				Message: fmt.Sprintf("Failed to convert <%v> to *JobInfo", obj),
			}
		}
		// only check job which uses reservation
		if !job.IsUseReservation() {
			return nil
		}

		ssn.MatchReservationForPod(job)

		if valid := ssn.CheckReservationAvailable(job); !valid {
			return &api.ValidateResult{
				Pass:    false,
				Reason:  v1beta1.InvalidReservationReason,
				Message: fmt.Sprintf("Reservation specified by job <%s/%s> is not Available", job.Namespace, job.Name),
			}
		}

		if ownerMatched := ssn.CheckReservationOwners(job); !ownerMatched {
			return &api.ValidateResult{
				Pass:   false,
				Reason: v1beta1.ReservationOwnerNotMatchReason,
				Message: fmt.Sprintf(
					"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject  or label selectors)",
					job.Namespace, job.Name,
				),
			}
		}

		if specMatched := ssn.CheckReservationMatch(job); !specMatched {
			return &api.ValidateResult{
				Pass:   false,
				Reason: v1beta1.ReservationSpecNotMatchReason,
				Message: fmt.Sprintf(
					"Reservation specified by job <%s/%s> does not match job task spec: task count or PodSpec does not match",
					job.Namespace, job.Name,
				),
			}
		}
		return nil
	}

	ssn.AddJobValidFn(rp.Name(), validJobFn)

	bestNodeFn := func(task *api.TaskInfo, scores map[float64][]*api.NodeInfo) *api.NodeInfo {
		if !task.IsReservationTask() {
			return nil
		}

		reservationNodeNames := task.ReservationNodeNames
		if len(reservationNodeNames) == 0 {
			return nil
		}

		nodeSet := make(map[string]struct{})
		for _, nodeList := range scores {
			for _, node := range nodeList {
				nodeSet[node.Name] = struct{}{}
			}
		}

		// match reservation node names specified in given order with available nodes
		for _, reserved := range reservationNodeNames {
			if _, ok := nodeSet[reserved]; ok {
				klog.V(4).Infof("Found reserved node %s for task %s/%s", reserved, task.Namespace, task.Name)
				return ssn.Nodes[reserved]
			}
		}

		return nil
	}
	ssn.AddBestNodeFn(rp.Name(), bestNodeFn)

	ssn.RegisterBinder(rp.Name(), rp)
}

func (rp *reservationPlugin) PostBind(ctx context.Context, bindCtx *cache.BindContext) error {
	task := bindCtx.TaskInfo

	if !task.IsUseReservationTask() {
		return nil
	}

	if err := rp.session.Cache().SyncBindToReservationTask(task); err != nil {
		klog.Errorf("Failed to sync task %s to reservation task, err: %v", task.Name, err)
		return err
	}

	return nil
}

func (rp *reservationPlugin) PreBind(ctx context.Context, bindCtx *cache.BindContext) error {
	taskInfo := bindCtx.TaskInfo
	if !taskInfo.IsReservationTask() {
		return nil
	}

	job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
	if err != nil {
		return err
	}
	// reservation task need skipping bind action.
	if err := job.UpdateTaskStatus(task, api.Bound); err != nil {
		return err
	}

	bindCtx.SkipBind = true
	rp.session.Cache().GetReservationCache().SyncReservation(task, job)
	return nil
}

func (rp *reservationPlugin) PreBindRollBack(ctx context.Context, bindCtx *cache.BindContext) {
	taskInfo := bindCtx.TaskInfo

	if !taskInfo.IsReservationTask() {
		return
	}

	job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
	if err != nil {
		return
	}
	rp.session.Cache().GetReservationCache().SyncReservation(task, job)
	return
}

func (rp *reservationPlugin) OnSessionClose(ssn *framework.Session) {}
```
Copilot AI (Dec 28, 2025):
Missing test coverage for reservation plugin: The reservation plugin in pkg/scheduler/plugins/reservation/reservation.go lacks test coverage. Other plugins in the scheduler have corresponding test files, but there's no reservation_test.go file. Given the complexity of the reservation logic (PreBind, PostBind, PreBindRollBack), test coverage is essential to ensure correctness and prevent regressions.
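A minimal `reservation_test.go` could start from the sketch below, which only checks construction and naming against the code in the diff above; meaningful coverage would drive `PreBind`, `PostBind`, and `PreBindRollBack` with a fake session and cache.

```go
package reservation

import (
	"testing"

	"volcano.sh/volcano/pkg/scheduler/framework"
)

// TestReservationPluginName only verifies plugin construction and naming;
// real coverage would exercise the binder hooks against fake fixtures.
func TestReservationPluginName(t *testing.T) {
	plugin := New(framework.Arguments{})
	if plugin == nil {
		t.Fatal("New() returned nil")
	}
	if got := plugin.Name(); got != PluginName {
		t.Errorf("Name() = %q, want %q", got, PluginName)
	}
}
```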
```go
	defer rc.RUnlock()

	for _, reservation := range rc.reservations {
		if isReservationNeedExpiration(reservation, now) {
			onExpired(reservation)
		}
	}
```
Copilot AI (Dec 28, 2025):
Potential race condition: ScanExpiredReservations holds a read lock while invoking the callback onExpired, which may attempt to modify the reservation cache (e.g., by calling DeleteReservation or enqueueing items). If the callback attempts to acquire a write lock, this could lead to deadlock or other concurrency issues. Consider collecting expired reservations first and then processing them after releasing the lock, or document the constraints on the callback function.
Suggested change:

```diff
-	defer rc.RUnlock()
-
-	for _, reservation := range rc.reservations {
-		if isReservationNeedExpiration(reservation, now) {
-			onExpired(reservation)
-		}
-	}
+	var expiredReservations []*schedulerapi.ReservationInfo
+	for _, reservation := range rc.reservations {
+		if isReservationNeedExpiration(reservation, now) {
+			expiredReservations = append(expiredReservations, reservation)
+		}
+	}
+	rc.RUnlock()
+	for _, reservation := range expiredReservations {
+		onExpired(reservation)
+	}
```
What type of PR is this?
/kind feature
What this PR does / why we need it:
Support resource reservation.
Phase 1: Resource Reservation

The reservation task goes through the complete scheduling logic but does not perform an actual pod bind. Instead, it reserves resources on the target node, updates the reservation cache, and modifies the `Reservation` custom resource (CR) status.

Phase 2: Reservation Scheduling

Tasks that use the reserved resources require special handling during the allocate action: they are directly bound to the reserved node. Additionally, the reservation cache and the `Reservation` CR status are updated accordingly.

Design docs: #4148

Test/User docs: https://c05j6bovqhh.feishu.cn/docx/EDHLdozMTohoBoxIqWtcwwaqnMf
Note: This PR is the pure code version of #4323. It does not include generated YAML files to avoid excessively large changes in the PR.
Related PR:
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
There are some issues to optimize:
Does this PR introduce a user-facing change?