KEP-5598: Extend opportunistic batching with rescoring by romanbaron · Pull Request #6039 · kubernetes/enhancements

romanbaron · 2026-04-29T08:45:06Z

One-line PR description:
Add rescoring to handle multi-pod-per-node workloads: when the last chosen node remains feasible, rescore it in-place and continue batching rather than flushing the cache.
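
As a rough illustration of the flow described above, here is a minimal sketch written as a toy Python model rather than scheduler code. The names `Node`, `feasible`, `score`, and `schedule_batch` are hypothetical, feasibility is reduced to a CPU check, and scoring to "most free CPU wins". Discarding an infeasible node and continuing with the rest of the cached list is an assumption of this sketch.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    free_cpu: int


def feasible(node, pod_cpu):
    # Stand-in for running Filter plugins against a single node.
    return node.free_cpu >= pod_cpu


def score(node):
    # Stand-in for Score plugins: more free CPU means a higher score.
    return node.free_cpu


def schedule_batch(nodes, pod_cpu, num_pods):
    """Place num_pods identical pods while reusing one sorted list of nodes."""
    # First cycle: filter and score every node once, best node first.
    cached = sorted((n for n in nodes if feasible(n, pod_cpu)),
                    key=score, reverse=True)
    placements = []
    for _ in range(num_pods):
        if not cached:
            break  # nothing left in the cache; a full cycle would be needed
        chosen = cached.pop(0)
        chosen.free_cpu -= pod_cpu  # assume the pod on the chosen node
        placements.append(chosen.name)
        if feasible(chosen, pod_cpu):
            # Rescore only the chosen node and re-insert it in sorted position
            # instead of flushing the whole cache.
            keys = [-score(n) for n in cached]
            cached.insert(bisect.bisect_right(keys, -score(chosen)), chosen)
        # Assumption: an infeasible node is simply dropped from the cache.
    return placements
```

With nodes `a` (4 free CPUs) and `b` (3 free CPUs), three 2-CPU pods land on `a`, `b`, `a`: node `a` is rescored after the first pod and stays in the list, while `b` is dropped once it can no longer fit another pod.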

Issue link: [Scheduling] OpportunisticBatching: redundant synchronous RunFilterPlugins call in batchStateCompatible for low-resource pods kubernetes#137707

Other comments:
AI tooling was used to assist in preparing this PR. All changes have been reviewed and verified by the author.

k8s-ci-robot · 2026-04-29T08:45:17Z

Hi @romanbaron. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

singh1203

Hey @romanbaron, thanks for including me. I went through the full diff carefully. Overall, the rescoring design is clean and well written. I just had one question, and thanks for answering it. 🙇
Overall, it looks good to me.

romanbaron · 2026-05-04T16:47:50Z

The PR is now rebased on top of the refactored KEP and only includes Rescore-related changes.

macsko · 2026-05-05T08:58:50Z

/ok-to-test

macsko

The KEP looks good overall, thanks for doing this!

utam0k · 2026-05-08T21:32:08Z

I've read through it, and it looks good. This is a great update! Thanks!

sanposhiho · 2026-05-25T12:38:58Z

/assign

dom4ha

The approach with rescoring LGTM. I think we could optimize it further by caching cycleState as well, but I'd treat that as an optional implementation detail, not as the core element of the solution, since it assumes all plugins implement it.

Leaving the final approval on @sanposhiho

dom4ha · 2026-06-16T13:03:43Z

+
+1. Node A is re-inserted into the cached sorted list. Since it is still feasible, it can host
+   additional pods.
+2. `Score` is called for each scoring plugin against node A using the current `CycleState`. The


I think we could cache the cycleState as well to avoid rerunning PreFilter. All we'd need to do is to run PreFilterExtension.AddPod on a pod assumed in the previous cycle.

I think not all plugins that have a PreFilter implement PreFilterExtension.AddPod. AddPod is expected to be used in preemption, so plugins that know preemption doesn't matter for them can skip defining AddPod/RemovePod. At some point we could try to cache the cycleState where possible, but I wouldn't consider it for now.

Yeah, I already mentioned it as optional, so it's not a blocking comment for this KEP.

Anyway, we could either make the PreFilterExtension implementation required in Signature, or at least detect whether it's implemented before we use it.

It should give Opportunistic Batching a significant boost, as PreFilter scales with cluster size, but we treat it as immutable anyway within the opportunistic batching cache. So I think it's worth considering, not in this KEP update but in future ones.
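
The "detect whether it's implemented" idea can be sketched as follows. This is a toy Python model, not the scheduler framework API: `pre_filter`, `add_pod`, `can_reuse_cycle_state`, and `next_cycle_state` are hypothetical names, and the cached state is a plain dict keyed by plugin name.

```python
class PluginWithAddPod:
    name = "WithAddPod"

    def pre_filter(self, pod, cluster):
        # Expensive in a real scheduler: scales with cluster size.
        return {"assumed": []}

    def add_pod(self, state, assumed_pod):
        # Incrementally account for the pod assumed in the previous cycle.
        state["assumed"].append(assumed_pod)


class PluginWithoutAddPod:
    name = "WithoutAddPod"

    def pre_filter(self, pod, cluster):
        return {}


def can_reuse_cycle_state(plugins):
    # Reuse is only safe if every plugin can update its state incrementally.
    return all(callable(getattr(p, "add_pod", None)) for p in plugins)


def next_cycle_state(plugins, cached_state, pod, cluster, assumed_pod):
    """Return (state, how), where how is 'reused' or 'recomputed'."""
    if cached_state is not None and can_reuse_cycle_state(plugins):
        for p in plugins:
            p.add_pod(cached_state[p.name], assumed_pod)
        return cached_state, "reused"
    # Fallback: rerun pre_filter for every plugin, as in a fresh cycle.
    return {p.name: p.pre_filter(pod, cluster) for p in plugins}, "recomputed"
```

In this sketch a single plugin without `add_pod` forces the fallback for the whole cycle, which is the conservative reading of the detection option.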

wojtek-t

The PRR part (unsurprisingly) still looks fine.

wojtek-t · 2026-06-16T14:00:11Z

-   rescoring.
+4. **Last chosen node feasibility check:** Filter plugins are run against the node chosen in the
+   previous cycle. Two outcomes are possible:
+   - **Infeasible:** The node is full (one-pod-per-node case). The node is discarded and batch


nit: it doesn't necessarily mean the node is full or that this is really the one-pod-per-node case. It may, e.g., be the result of using node-port, where scheduling more pods from our batch is impossible.
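
A small sketch of the point in this nit, assuming it refers to a host-port style conflict: a node with plenty of free CPU can still become infeasible for the next identical pod. The dict layout and the names `still_feasible` and `assume` are hypothetical.

```python
def still_feasible(node, pod):
    """Return (feasible, reason) for placing one more identical pod on node."""
    if node["free_cpu"] < pod["cpu"]:
        return False, "insufficient cpu"
    port = pod.get("host_port")
    if port is not None and port in node["used_host_ports"]:
        return False, "host port conflict"
    return True, ""


def assume(node, pod):
    # Record the effects of placing the pod on the node.
    node["free_cpu"] -= pod["cpu"]
    if pod.get("host_port") is not None:
        node["used_host_ports"].add(pod["host_port"])
```

After one pod that claims a host port is assumed on a 16-CPU node, the second identical pod is rejected with "host port conflict" even though 15 CPUs remain free.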

sanposhiho

~~perf is the biggest concern - we already get a report from user, but this change will increase the overhead of this feature more, without mentioning any solution about the overhead problem~~

dom4ha · 2026-06-17T09:21:35Z

+
+1. Node A is re-inserted into the cached sorted list. Since it is still feasible, it can host
+   additional pods.
+2. `Score` is called for each scoring plugin against node A using the current `CycleState`. The


The way I read it, we recompute the PreFilter state, as in each new cycle. I had a proposal to reuse (cache) the cycleState of the previous cycle in the comment #6039 (comment), but only as a future extension.

sanposhiho · 2026-06-17T08:45:31Z

+
 ## Design Details

 ### Pod signature


We must make sure unsignable is returned for pods that use cross-node scoring (preferred pod affinity, preferred topology spread, etc.).
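
A minimal sketch of such a check, operating on a pod spec represented as a plain dict. The field names (`preferredDuringSchedulingIgnoredDuringExecution`, `topologySpreadConstraints`, `whenUnsatisfiable: ScheduleAnyway`) are the Kubernetes Pod API ones. The functions `uses_cross_node_scoring` and `pod_signature`, the use of `None` for "unsignable", and the hash used as the signature are assumptions for illustration, not how the KEP computes signatures.

```python
import hashlib
import json


def uses_cross_node_scoring(pod_spec):
    """True if the pod has preferences scored relative to other nodes/pods."""
    affinity = pod_spec.get("affinity") or {}
    for kind in ("podAffinity", "podAntiAffinity"):
        terms = affinity.get(kind) or {}
        if terms.get("preferredDuringSchedulingIgnoredDuringExecution"):
            return True
    for constraint in pod_spec.get("topologySpreadConstraints") or []:
        # ScheduleAnyway constraints only affect scoring, not filtering.
        if constraint.get("whenUnsatisfiable") == "ScheduleAnyway":
            return True
    return False


def pod_signature(pod_spec):
    """Return a signature string, or None when the pod is unsignable."""
    if uses_cross_node_scoring(pod_spec):
        return None
    # Placeholder signature: a hash of the whole spec (illustrative only).
    canonical = json.dumps(pod_spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Pods with only required (hard) constraints remain signable in this sketch, because those constraints are enforced at filter time rather than through scores.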

sanposhiho · 2026-06-17T09:23:20Z

+
+1. Node A is re-inserted into the cached sorted list. Since it is still feasible, it can host
+   additional pods.
+2. `Score` is called for each scoring plugin against node A using the current `CycleState`. The


Score is called for each scoring plugin against node A using the current CycleState.

But, CycleState doesn't have any state calculated at PreScore at this point?

We discussed internally, and we agreed that we will run PreScore of all plugins before this. This is not the optimized way, but we can consider a further optimization in the next release cycle
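
To make the concern concrete, here is a toy sketch of why `Score` needs the state written by `PreScore`, and of the agreed (unoptimized) order of running `PreScore` for all plugins before scoring the chosen node. The plugin, the exception, and `rescore_node` are hypothetical names, and the cycle state is modeled as a plain dict.

```python
class MissingPreScoreState(Exception):
    """Raised when score() runs without the state pre_score() should write."""


class FreeCPUPlugin:
    name = "FreeCPU"

    def pre_score(self, state, pod, nodes):
        # Precompute data that score() relies on for every node.
        state[self.name] = {"max_free": max(n["free_cpu"] for n in nodes) or 1}

    def score(self, state, pod, node):
        if self.name not in state:
            raise MissingPreScoreState(self.name)
        return 100 * node["free_cpu"] // state[self.name]["max_free"]


def rescore_node(plugins, pod, nodes, node):
    """Run pre_score for all plugins, then score only the given node."""
    state = {}  # fresh cycle state for the new pod
    for plugin in plugins:
        plugin.pre_score(state, pod, nodes)
    return sum(plugin.score(state, pod, node) for plugin in plugins)
```

Calling `score` against an empty state fails in this model, while `rescore_node` succeeds because the pre-score pass has populated the state first.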

/cc @dom4ha @macsko

PreFilter

PreScore?

A bit more of a summary of our internal discussion, for the record (not only about the PreScore point):

Running PreScore of all plugins is not very cheap, and we definitely need to discuss how we can make it efficient in the next release cycle. Very likely, we need to reuse the cyclestate from the last pod somehow.

Also, another discussion was whether we want to rerun Score() for all cached nodes. The current KEP says we run Score() only for the node that the last pod selected. But that means we don't support cross-node scoring. The current opportunistic batching doesn't support the default scheduling config because the default scheduling config has a preferred topology spread, which is cross-node scoring. So supporting such cross-node scoring is important to make this feature more widely usable.

We have two options to discuss in the next release cycle:

1. If we drop support for cross-node scoring for now, we can implement PreScoreExtension.AddPod to modify the cyclestate of the last pod. Then the flow will be PreScoreExtension.AddPod(last pod) -> Score(selected node) -> NormalizeScore(all cached nodes).
2. If we want to support cross-node scoring, there are two further options:
   a. PreScoreExtension.AddPod(last pod) -> Score(all cached nodes) -> NormalizeScore(all cached nodes)
   b. Implement a new extension point Rescore(), which essentially does (a) but more efficiently.

We are not deciding this now, but will discuss it in the next release's KEP cycle.
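
The difference between the first option and option (a) can be seen in a toy model with a single cross-node scorer that prefers zones with fewer pods. Everything here is hypothetical (`NODE_ZONE`, `add_pod`, `score`, `normalize`, and the two `rescore_*` functions), and `PreScoreExtension.AddPod` is the proposed extension from the discussion, modeled as a plain function.

```python
from collections import Counter

# Hypothetical topology: n1 and n2 share zone "a", n3 is alone in zone "b".
NODE_ZONE = {"n1": "a", "n2": "a", "n3": "b"}


def add_pod(state, node):
    # Models the proposed PreScoreExtension.AddPod(last pod).
    state[NODE_ZONE[node]] += 1


def score(state, node):
    # Cross-node scoring: fewer pods in the node's zone is better.
    return -state[NODE_ZONE[node]]


def normalize(raw):
    # Min-max normalization of raw scores to the 0..100 range.
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        return {n: 100 for n in raw}
    return {n: (v - lo) * 100 // (hi - lo) for n, v in raw.items()}


def rescore_selected(state, cached_raw, last_node):
    """Option 1: AddPod -> Score(selected node) -> NormalizeScore(all)."""
    add_pod(state, last_node)
    raw = dict(cached_raw)
    raw[last_node] = score(state, last_node)
    return normalize(raw)


def rescore_all(state, cached_raw, last_node):
    """Option 2a: AddPod -> Score(all cached nodes) -> NormalizeScore(all)."""
    add_pod(state, last_node)
    return normalize({n: score(state, n) for n in cached_raw})
```

After a pod lands on `n1`, rescoring only `n1` leaves `n2` with its stale top score even though it shares `n1`'s zone, whereas rescoring all cached nodes lowers `n2` as well. That stale score is the cross-node effect the first option gives up.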

sanposhiho · 2026-06-17T11:03:59Z

/approve
/lgtm

@romanbaron is on vacation and cannot modify the KEP to address this comment: #6039 (comment)

Approving the PR because all the leads are on the same page.
We will ask him to edit it based on the agreement once he's back.

k8s-ci-robot · 2026-06-17T11:04:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: romanbaron, sanposhiho, singh1203

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~keps/sig-scheduling/OWNERS~~ [sanposhiho]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Apr 29, 2026

k8s-ci-robot requested a review from dom4ha April 29, 2026 08:45

k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Apr 29, 2026

k8s-ci-robot requested a review from macsko April 29, 2026 08:45

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Apr 29, 2026

github-project-automation Bot added this to SIG Scheduling Apr 29, 2026

k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Apr 29, 2026

github-project-automation Bot moved this to Needs Triage in SIG Scheduling Apr 29, 2026

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 29, 2026

singh1203 approved these changes Apr 29, 2026

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md

macsko reviewed Apr 30, 2026

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md Outdated

romanbaron force-pushed the opportunistic-batching-rescore branch from afbce01 to 38281e1 Compare May 4, 2026 08:23

k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 4, 2026

romanbaron force-pushed the opportunistic-batching-rescore branch from eda4b71 to 5734340 Compare May 4, 2026 13:59

k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 4, 2026

romanbaron force-pushed the opportunistic-batching-rescore branch 2 times, most recently from e9c449d to 703870e Compare May 4, 2026 16:17

KEP-5598: Rescoring

8119e15

romanbaron force-pushed the opportunistic-batching-rescore branch from 703870e to 8119e15 Compare May 4, 2026 16:43

romanbaron mentioned this pull request May 4, 2026

REQUEST: New membership for romanbaron kubernetes/org#6347

Closed

11 tasks

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 5, 2026

romanbaron mentioned this pull request May 5, 2026

Added RawScores and refactored RunScorePlugins kubernetes/kubernetes#138788

Open

utam0k reviewed May 5, 2026

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md Outdated

romanbaron and others added 2 commits May 6, 2026 10:59

Update keps/sig-scheduling/5598-opportunistic-batching/README.md

3b26e64

Co-authored-by: Toru Komatsu <k0ma@utam0k.jp>

Track scheduler memory in rollback signals and perf tests

f4cd3a0

romanbaron force-pushed the opportunistic-batching-rescore branch from 7d19290 to f4cd3a0 Compare May 6, 2026 12:15

This was referenced May 7, 2026

Moved SortedScoredNodes to framework package kubernetes/kubernetes#138848

Closed

Moved SortedScoredNodes to framework package kubernetes/kubernetes#138849

Open

macsko reviewed May 7, 2026

Comment thread keps/sig-scheduling/5598-opportunistic-batching/README.md Outdated

romanbaron added 2 commits May 10, 2026 11:27

Added a risk about scores being mixed from different cluster states

57f1bf6

Removed batching specific e2e tests from the KEP

7733dc2

k8s-ci-robot assigned sanposhiho May 25, 2026

pacoxu mentioned this pull request Jun 4, 2026

Opportunistic batching #5598

Open

18 tasks

helayoty moved this from Needs Triage to Needs Review in SIG Scheduling Jun 16, 2026

dom4ha reviewed Jun 16, 2026

wojtek-t reviewed Jun 16, 2026

sanposhiho reviewed Jun 17, 2026

dom4ha reviewed Jun 17, 2026

sanposhiho reviewed Jun 17, 2026

k8s-ci-robot requested review from dom4ha and macsko June 17, 2026 11:02

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 17, 2026

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2026

k8s-ci-robot merged commit e2f7ae8 into kubernetes:master Jun 17, 2026
4 checks passed

github-project-automation Bot moved this from Needs Review to Done in SIG Scheduling Jun 17, 2026

k8s-ci-robot added this to the v1.37 milestone Jun 17, 2026

This was referenced Jun 20, 2026

[Draft] Extend Opportunistic Batching with Rescoring kubernetes/website#56187

Closed

[Draft] Extend Opportunistic Batching with Rescoring kubernetes/website#56188

Draft

romanbaron mentioned this pull request Jun 22, 2026

KEP-5598: explained PreScore requirement in Rescoring flow #6212

Open
