Skip to content

Conversation

@LegGasai
Copy link

What type of PR is this?

/kind feature

What this PR does / why we need it:

Support resource reservation.
Phase 1: Resource Reservation
The reservation task goes through the complete scheduling logic but does not perform an actual pod bind. Instead, it reserves resources on the target node, updates the reservation cache, and modifies the Reservation custom resource (CR) status.

Phase 2: Reservation Scheduling
Tasks that use the reserved resources require special handling during allocate action — they are directly bound to the reserved node. Additionally, the reservation cache and the Reservation CR status are updated accordingly.

Design docs: #4148
Test/User docs:https://c05j6bovqhh.feishu.cn/docx/EDHLdozMTohoBoxIqWtcwwaqnMf

Note: This PR is the pure code version of #4323. It does not include generated YAML files to avoid excessively large changes in the PR.

Related PR:

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

There are some issues to optimize:

Does this PR introduce a user-facing change?


LegGasai added 30 commits May 27, 2025 23:38
Signed-off-by: leggasai <jyc132780208@sina.com>
Signed-off-by: leggasai <jyc132780208@sina.com>
Signed-off-by: leggasai <jyc132780208@sina.com>
Signed-off-by: leggasai <jyc132780208@sina.com>
Signed-off-by: leggasai <jyc132780208@sina.com>
Signed-off-by: leggasai <jyc132780208@sina.com>
Copilot AI review requested due to automatic review settings December 28, 2025 13:17
@volcano-sh-bot volcano-sh-bot added the kind/feature Categorizes issue or PR as related to a new feature. label Dec 28, 2025
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign monokaix for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@volcano-sh-bot
Copy link
Contributor

Adding label do-not-merge/contains-merge-commits because PR contains merge commits, which are not allowed in this repository.
Use git rebase to reapply your commits on top of the target branch. Detailed instructions for doing so can be found here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@volcano-sh-bot volcano-sh-bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 28, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @LegGasai, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a foundational resource reservation feature into the scheduler. It allows for a more predictable allocation of resources by enabling jobs to reserve capacity on nodes before their actual tasks are scheduled. The implementation involves new scheduler actions, cache mechanisms, and API extensions to manage the lifecycle of reservations, from initial setup to task binding and cleanup, ensuring that reserved resources are correctly utilized and maintained.

Highlights

  • Resource Reservation Feature: Introduced a new resource reservation mechanism, enabling pre-allocation of resources on nodes for specific tasks.
  • Two-Phase Scheduling: Implemented a two-phase scheduling approach: first, resources are reserved on a target node, and then tasks requiring these reserved resources are directly bound to that node.
  • Reservation Cache and CR Status: Added a dedicated reservation cache and logic to update the status of Reservation Custom Resources (CRs) throughout the reservation lifecycle.
  • Code-Only Implementation: This pull request focuses solely on the code implementation, intentionally excluding generated YAML files to manage the size of the change.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a resource reservation feature, which is a significant addition. The implementation includes new CRDs, scheduler actions, plugins, and cache logic to support reserving resources and then consuming them. The code is well-structured.

I have two main points of feedback:

  1. The go.mod file contains a replace directive pointing to a personal fork. This must be removed before merging to ensure build stability.
  2. In the new reserve action, there's an unused jobs priority queue that seems to be leftover code and should be cleaned up.

Overall, the changes look good, but these points should be addressed.

Comment on lines +219 to +220
// replace volcano.sh/apis => ../volcano-apis
replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This replace directive points to a personal fork. While this is useful for development, it should be removed before merging to avoid breaking the main branch build for other contributors and CI/CD pipelines. Please ensure this is reverted.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will revert it after api pr has been merged

Comment on lines +56 to +57
jobs := util.NewPriorityQueue(ssn.JobOrderFn)
jobs.Push(job)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The jobs priority queue is initialized here for each job but it's never read from. This appears to be dead code. The jobs.Push(job) on line 117 also pushes to this unused queue. This logic should be removed to improve clarity and avoid confusion.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request implements resource reservation functionality for the Volcano scheduler, enabling two key phases: (1) reservation task scheduling that reserves resources without actual pod binding, and (2) reservation usage where tasks can leverage pre-reserved resources. This is a code-only version that excludes generated YAML files to keep the PR manageable.

Key changes:

  • Adds reservation plugin and reserve action for resource reservation lifecycle management
  • Implements ReservationCache for tracking and syncing reservation state
  • Extends scheduler framework with reservation matching, validation, and allocation logic

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
pkg/scheduler/plugins/reservation/reservation.go New plugin implementing reservation task lifecycle hooks (PreBind, PostBind, validation)
pkg/scheduler/cache/reservation_cache.go New cache for managing reservation state, expiration, and synchronization
pkg/scheduler/framework/session.go Adds reservation matching, owner validation, and spec matching methods
pkg/scheduler/framework/statement.go Adds UnAllocateForReservationTask for resource handover without status change
pkg/scheduler/cache/event_handlers.go Adds reservation event handlers and fake pod creation logic
pkg/scheduler/cache/cache.go Integrates ReservationCache, adds PostBind support, and cleanup workers
pkg/scheduler/actions/reserve/reserve.go New action for swapping reservation tasks with actual tasks
pkg/scheduler/api/helpers/helpers.go Adds IsPodSpecMatch for comparing pod specifications
go.mod/go.sum Updates dependencies (uses personal fork - needs fixing)
Makefile Updates manifest generation to use exclude description flag

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Pass: false,
Reason: v1beta1.ReservationOwnerNotMatchReason,
Message: fmt.Sprintf(
"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)",
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double space in error message: There are two consecutive spaces in the error message between "ownerObject" and "or". This should be a single space for consistency.

Suggested change
"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)",
"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)",

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +135
/*
Copyright 2019 The Kubernetes Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package reserve

import (
"time"

"k8s.io/klog/v2"

"volcano.sh/volcano/pkg/scheduler/api"
"volcano.sh/volcano/pkg/scheduler/framework"
"volcano.sh/volcano/pkg/scheduler/metrics"
"volcano.sh/volcano/pkg/scheduler/util"
)

type Action struct{}

func New() *Action {
return &Action{}
}

func (reserve *Action) Name() string {
return "reserve"
}

func (reserve *Action) Initialize() {}

func (reserve *Action) Execute(ssn *framework.Session) {
klog.V(5).Infof("Enter Reserve ...")
defer klog.V(5).Infof("Leaving Reserve ...")

for _, job := range ssn.Jobs {
if !job.IsUseReservation() {
continue
}

if vr := ssn.JobValid(job); vr != nil && !vr.Pass {
klog.V(3).Infof("Job <%s/%s> Queue <%s> skip allocate, reason: %v, message %v", job.Namespace, job.Name, job.Queue, vr.Reason, vr.Message)
continue
}

jobs := util.NewPriorityQueue(ssn.JobOrderFn)
jobs.Push(job)

klog.V(3).Infof("Attempting to swap reservation for actual tasks in job <%s/%s>", job.Namespace, job.Name)
stmt := framework.NewStatement(ssn)

pendingTasks := util.NewPriorityQueue(ssn.TaskOrderFn)
for _, task := range job.TaskStatusIndex[api.Pending] {
pendingTasks.Push(task)
}

for !pendingTasks.Empty() {
actualTask := pendingTasks.Pop().(*api.TaskInfo)

if job.TaskHasFitErrors(job.DefaultSubJobID(), actualTask) {
klog.V(3).Infof("Task %s with role spec %s has already predicated failed, skip", actualTask.Name, actualTask.TaskRole)
continue
}

if err := ssn.PrePredicateFn(actualTask); err != nil {
klog.V(3).Infof("PrePredicate for task %s/%s failed for: %v", actualTask.Namespace, actualTask.Name, err)
fitErrors := api.NewFitErrors()
for _, ni := range ssn.NodeList {
fitErrors.SetNodeError(ni.Name, err)
}
job.NodesFitErrors[actualTask.UID] = fitErrors
break
}

reservationTask := actualTask.ReservationTaskInfo
if reservationTask == nil {
klog.Warningf("Task <%s/%s> wants to use reservation but has no ReservationTaskInfo", actualTask.Namespace, actualTask.Name)
continue
}
reservedNodeName := reservationTask.NodeName

if reservedNodeName == "" {
klog.Warningf("Reservation info for task <%s/%s> does not specify a node", actualTask.Namespace, actualTask.Name)
continue
}

reservedNode, found := ssn.Nodes[reservedNodeName]
if !found {
klog.Warningf("Reserved node '%s' for task <%s/%s> not found in current session", reservedNodeName, actualTask.Namespace, actualTask.Name)
continue
}

if err := stmt.UnAllocateForReservationTask(reservationTask); err != nil {
klog.Errorf("Failed to release reservation task %v resources from node %v, err: %v", reservationTask.UID, reservedNode.Name, err)
continue
}

if err := stmt.Allocate(actualTask, reservedNode); err != nil {
klog.Errorf("Failed to allocate actual task %v to its reserved node %v, err: %v", actualTask.UID, reservedNode.Name, err)
} else {
klog.V(3).Infof("Allocated actual task <%s/%s> to node <%s>, effectively replacing the reservation.", actualTask.Namespace, actualTask.Name, reservedNode.Name)
metrics.UpdateE2eSchedulingDurationByJob(job.Name, string(job.Queue), job.Namespace, metrics.Duration(job.CreationTimestamp.Time))
metrics.UpdateE2eSchedulingLastTimeByJob(job.Name, string(job.Queue), job.Namespace, time.Now())
}

if ssn.JobReady(job) && !pendingTasks.Empty() {
jobs.Push(job)
klog.V(3).Infof("Job <%s/%s> is ready, but still has pending tasks. Pipelining.", job.Namespace, job.Name)
break
}
}

if ssn.JobReady(job) {
stmt.Commit()
} else {
if !ssn.JobPipelined(job) {
stmt.Discard()
} else {
stmt.Commit()
}
}
}
}

func (reserve *Action) UnInitialize() {}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for reserve action: The reserve action in pkg/scheduler/actions/reserve/reserve.go lacks test coverage. Other actions like allocate have comprehensive test files (allocate_test.go with 5185 lines), but there's no reserve_test.go. Given that this action handles critical reservation-to-actual-task swapping logic, it should have thorough test coverage.

Copilot uses AI. Check for mistakes.
return err
}

// sync cache
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modifying shared cache state without proper synchronization. The line reservation.Reservation = newReservation at line 203 modifies the reservation object that may be concurrently accessed by other goroutines. While the syncReservation method itself may be called from a single worker, the reservation object in the cache can be read by other methods (like MatchReservationForPod) without holding the cache lock. Consider using proper locking or copying mechanisms when updating cached reservation objects.

Suggested change
// sync cache
// sync cache
rc.Lock()
defer rc.Unlock()

Copilot uses AI. Check for mistakes.
Comment on lines +1240 to +1242
for jobTask, resvTask := range matched {
jobTask.ReservationTaskInfo = resvTask
}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side effect in validation function: The CheckReservationMatch function modifies the task state by setting jobTask.ReservationTaskInfo at line 1241. This is a side effect in what appears to be a validation/check function. This could lead to unexpected state modifications during validation and makes the code harder to reason about. Consider separating the validation logic from the state modification, or rename the function to better reflect that it performs both checking and binding.

Copilot uses AI. Check for mistakes.
job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
if err != nil {
return
}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing rollback of task status update: In PreBindRollBack, the function calls SyncReservation but doesn't reverse the task status update that was performed in PreBind at line 169 (job.UpdateTaskStatus(task, api.Bound)). If PreBind succeeds but binding fails later, the task status remains as Bound even though the binding never completed. This could lead to inconsistent state. Consider reverting the task status in the rollback.

Suggested change
}
}
// Roll back task status that was set to Bound in PreBind.
if err := job.UpdateTaskStatus(task, api.Allocated); err != nil {
klog.Errorf("Failed to rollback status for reservation task %s, err: %v", task.Name, err)
return
}

Copilot uses AI. Check for mistakes.
Comment on lines +1152 to +1153
if owner.Object.Name == ownerRef.Name {
return true
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incomplete owner matching logic: The owner matching at lines 1152-1153 only compares the Name field from owner.Object with ownerRef.Name, but doesn't verify other important fields like Kind, Namespace, or UID. This could lead to false positive matches when different resources happen to have the same name. The comparison should include Kind and potentially UID to ensure accurate matching.

Suggested change
if owner.Object.Name == ownerRef.Name {
return true
// Match by full owner identity where possible to avoid false positives on name alone.
if owner.Object.Name == ownerRef.Name &&
owner.Object.Kind == ownerRef.Kind &&
owner.Object.APIVersion == ownerRef.APIVersion {
// If either UID is empty, fall back to name/kind/apiversion match for compatibility.
if string(owner.Object.UID) == "" || string(ownerRef.UID) == "" || owner.Object.UID == ownerRef.UID {
return true
}

Copilot uses AI. Check for mistakes.
Comment on lines +1578 to +1602
// Set some default value for matching the pod using reservation
if pod.Spec.ServiceAccountName == "" {
pod.Spec.ServiceAccountName = "default"
}

if pod.Spec.AutomountServiceAccountToken == nil {
trueVal := true
pod.Spec.AutomountServiceAccountToken = &trueVal
}

defaultTolerations := []v1.Toleration{
{
Key: "node.kubernetes.io/not-ready",
Operator: v1.TolerationOpExists,
Effect: v1.TaintEffectNoExecute,
TolerationSeconds: intPtr(300),
},
{
Key: "node.kubernetes.io/unreachable",
Operator: v1.TolerationOpExists,
Effect: v1.TaintEffectNoExecute,
TolerationSeconds: intPtr(300),
},
}
pod.Spec.Tolerations = mergeTolerations(pod.Spec.Tolerations, defaultTolerations)
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardcoded default values may cause PodSpec matching issues: The function sets hardcoded defaults for ServiceAccountName (line 1580) and AutomountServiceAccountToken (line 1585), and adds default tolerations (lines 1588-1602). These defaults are applied to reservation pods but may not match the actual pod specifications that will use these reservations. If a real pod has different values (e.g., a different ServiceAccountName or doesn't have these default tolerations), the PodSpec matching in IsPodSpecMatch will fail. Consider making these defaults optional or ensuring they're applied consistently to both reservation and actual pods.

Copilot uses AI. Check for mistakes.
)

// replace volcano.sh/apis => ../volcano-apis
replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependency replaced with personal GitHub fork: The volcano.sh/apis dependency has been replaced with a personal GitHub fork (github.com/LegGasai/apis). This is inappropriate for production code and should use the official volcano.sh/apis repository. Personal forks may not be maintained, could introduce security issues, and make it difficult for other contributors to build the project. Revert to using the official volcano.sh/apis dependency.

Suggested change
replace volcano.sh/apis => github.com/LegGasai/apis v0.0.0-20251228060403-bbce8ddfec23

Copilot uses AI. Check for mistakes.
Comment on lines +1 to +193
/*
Copyright 2019 The Volcano Authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package reservation

import (
"context"
"fmt"

"k8s.io/klog/v2"
"volcano.sh/apis/pkg/apis/scheduling/v1beta1"

"volcano.sh/volcano/pkg/scheduler/api"
"volcano.sh/volcano/pkg/scheduler/cache"
"volcano.sh/volcano/pkg/scheduler/framework"
)

const (
// PluginName indicates name of volcano scheduler plugin.
PluginName = "reservation"
)

type reservationPlugin struct {
// Arguments given for the plugin
session *framework.Session
}

// bind context extension information of reservation
type bindContextExtension struct {
SkipBind bool
}

// New function returns prioritizePlugin object
func New(arguments framework.Arguments) framework.Plugin {
return &reservationPlugin{}
}

func (rp *reservationPlugin) Name() string {
return PluginName
}

func (rp *reservationPlugin) OnSessionOpen(ssn *framework.Session) {
klog.V(5).Infof("Enter reservation plugin ...")

rp.session = ssn
defer func() {
klog.V(5).Infof("Leaving reservation plugin...")
}()
validJobFn := func(obj interface{}) *api.ValidateResult {
job, ok := obj.(*api.JobInfo)
if !ok {
return &api.ValidateResult{
Pass: false,
Message: fmt.Sprintf("Failed to convert <%v> to *JobInfo", obj),
}
}
// only check job which uses reservation
if !job.IsUseReservation() {
return nil
}

ssn.MatchReservationForPod(job)

if valid := ssn.CheckReservationAvailable(job); !valid {
return &api.ValidateResult{
Pass: false,
Reason: v1beta1.InvalidReservationReason,
Message: fmt.Sprintf("Reservation specified by job <%s/%s> is not Available", job.Namespace, job.Name),
}
}

if ownerMatched := ssn.CheckReservationOwners(job); !ownerMatched {
return &api.ValidateResult{
Pass: false,
Reason: v1beta1.ReservationOwnerNotMatchReason,
Message: fmt.Sprintf(
"Reservation specified by job <%s/%s> is not owned by the job (ownership mismatch by ownerObject or label selectors)",
job.Namespace, job.Name,
),
}
}

if specMatched := ssn.CheckReservationMatch(job); !specMatched {
return &api.ValidateResult{
Pass: false,
Reason: v1beta1.ReservationSpecNotMatchReason,
Message: fmt.Sprintf(
"Reservation specified by job <%s/%s> does not match job task spec: task count or PodSpec does not match",
job.Namespace, job.Name,
),
}
}
return nil
}

ssn.AddJobValidFn(rp.Name(), validJobFn)

bestNodeFn := func(task *api.TaskInfo, scores map[float64][]*api.NodeInfo) *api.NodeInfo {
if !task.IsReservationTask() {
return nil
}

reservationNodeNames := task.ReservationNodeNames
if len(reservationNodeNames) == 0 {
return nil
}

nodeSet := make(map[string]struct{})
for _, nodeList := range scores {
for _, node := range nodeList {
nodeSet[node.Name] = struct{}{}
}
}

// match reservation node names specified in given order with available nodes
for _, reserved := range reservationNodeNames {
if _, ok := nodeSet[reserved]; ok {
klog.V(4).Infof("Found reserved node %s for task %s/%s", reserved, task.Namespace, task.Name)
return ssn.Nodes[reserved]
}
}

return nil
}
ssn.AddBestNodeFn(rp.Name(), bestNodeFn)

ssn.RegisterBinder(rp.Name(), rp)
}

func (rp *reservationPlugin) PostBind(ctx context.Context, bindCtx *cache.BindContext) error {
task := bindCtx.TaskInfo

if !task.IsUseReservationTask() {
return nil
}

if err := rp.session.Cache().SyncBindToReservationTask(task); err != nil {
klog.Errorf("Failed to sync task %s to reservation task, err: %v", task.Name, err)
return err
}

return nil
}

func (rp *reservationPlugin) PreBind(ctx context.Context, bindCtx *cache.BindContext) error {
taskInfo := bindCtx.TaskInfo
if !taskInfo.IsReservationTask() {
return nil
}

job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
if err != nil {
return err
}
// reservation task need skipping bind action.
if err := job.UpdateTaskStatus(task, api.Bound); err != nil {
return err
}

bindCtx.SkipBind = true
rp.session.Cache().GetReservationCache().SyncReservation(task, job)
return nil
}

func (rp *reservationPlugin) PreBindRollBack(ctx context.Context, bindCtx *cache.BindContext) {
taskInfo := bindCtx.TaskInfo

if !taskInfo.IsReservationTask() {
return
}

job, task, err := rp.session.Cache().FindJobAndTask(taskInfo)
if err != nil {
return
}
rp.session.Cache().GetReservationCache().SyncReservation(task, job)
return
}

func (rp *reservationPlugin) OnSessionClose(ssn *framework.Session) {}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing test coverage for reservation plugin: The reservation plugin in pkg/scheduler/plugins/reservation/reservation.go lacks test coverage. Other plugins in the scheduler have corresponding test files, but there's no reservation_test.go file. Given the complexity of the reservation logic (PreBind, PostBind, PreBindRollBack), test coverage is essential to ensure correctness and prevent regressions.

Copilot uses AI. Check for mistakes.
Comment on lines +279 to +285
defer rc.RUnlock()

for _, reservation := range rc.reservations {
if isReservationNeedExpiration(reservation, now) {
onExpired(reservation)
}
}
Copy link

Copilot AI Dec 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential race condition: ScanExpiredReservations holds a read lock while invoking the callback onExpired, which may attempt to modify the reservation cache (e.g., by calling DeleteReservation or enqueueing items). If the callback attempts to acquire a write lock, this could lead to deadlock or other concurrency issues. Consider collecting expired reservations first and then processing them after releasing the lock, or document the constraints on the callback function.

Suggested change
defer rc.RUnlock()
for _, reservation := range rc.reservations {
if isReservationNeedExpiration(reservation, now) {
onExpired(reservation)
}
}
var expiredReservations []*schedulerapi.ReservationInfo
for _, reservation := range rc.reservations {
if isReservationNeedExpiration(reservation, now) {
expiredReservations = append(expiredReservations, reservation)
}
}
rc.RUnlock()
for _, reservation := range expiredReservations {
onExpired(reservation)
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/contains-merge-commits kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants