update pg status with applyStatus #3865

jxs1211 · 2024-12-09T13:05:47Z

Improve the performance of updating the PodGroup (pg) status by using the applyStatus API.
#3852

volcano-sh-bot · 2024-12-09T13:05:50Z

Welcome @jxs1211!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

jxs1211 · 2024-12-09T13:09:07Z

hwdef · 2024-12-09T13:15:09Z

pkg/scheduler/cache/cache.go

@@ -300,7 +301,22 @@ func (su *defaultStatusUpdater) UpdatePodGroup(pg *schedulingapi.PodGroup) (*sch
 		return nil, err
 	}

-	updated, err := su.vcclient.SchedulingV1beta1().PodGroups(podgroup.Namespace).Update(context.TODO(), podgroup, metav1.UpdateOptions{})


I'm not sure this modification is correct. As I remember, this function also updates the annotations and labels, so updating only the status would be wrong.

We may need to rewrite the logic so that these count fields are no longer persisted in podgroups, as in this PR: #3751

I think the proposal is to stop updating the status fields in podgroups. Do you mean the annotations and labels still need to be updated?

Yes, there may be other fields in spec or status that need to be updated. We can't update only the condition. What I want is to stop refreshing the running/failed/succeeded fields in every session, because if the session interval is short, there may be a lot of podgroup update requests, which may burden kube-apiserver.
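As a rough illustration of the idea in this thread (not code from this PR), the sketch below skips the update request when only the running/failed/succeeded counters changed. The PodGroup and PodGroupStatus types and the needsUpdate helper are simplified, hypothetical stand-ins, not the real Volcano API types.

```go
package main

import (
	"fmt"
	"reflect"
)

// PodGroupStatus is a simplified, hypothetical stand-in for the real status type.
type PodGroupStatus struct {
	Phase      string
	Conditions []string
	Running    int32
	Failed     int32
	Succeeded  int32
}

// PodGroup is a simplified, hypothetical stand-in for the real PodGroup object.
type PodGroup struct {
	Labels      map[string]string
	Annotations map[string]string
	MinMember   int32
	Status      PodGroupStatus
}

// withoutCounts returns a copy of the status with the three pod counters zeroed.
func withoutCounts(s PodGroupStatus) PodGroupStatus {
	s.Running, s.Failed, s.Succeeded = 0, 0, 0
	return s
}

// needsUpdate reports whether anything other than the pod counters differs.
// Both arguments are passed by value, so the callers' objects are not modified.
func needsUpdate(old, updated PodGroup) bool {
	old.Status = withoutCounts(old.Status)
	updated.Status = withoutCounts(updated.Status)
	return !reflect.DeepEqual(old, updated)
}

func main() {
	old := PodGroup{
		Labels: map[string]string{"app": "demo"},
		Status: PodGroupStatus{Phase: "Running", Running: 2},
	}

	// Only a counter changed: no request to kube-apiserver would be needed.
	countsOnly := old
	countsOnly.Status.Running = 3
	fmt.Println(needsUpdate(old, countsOnly)) // false

	// A label changed: the object still has to be updated.
	relabeled := old
	relabeled.Labels = map[string]string{"app": "demo", "tier": "batch"}
	fmt.Println(needsUpdate(old, relabeled)) // true
}
```

Note that reflect.DeepEqual treats a nil map or slice as different from an empty one, so a real implementation would need to normalize those before comparing.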

JesseStutler · 2024-12-11T15:11:56Z

Could you help research whether there are other tools that let kubectl calculate the number of running/failed/succeeded pods in a podgroup and display them, rather than reading these fields directly from the status? If that cannot be achieved, we can use vcctl to display them, but I think more users would prefer kubectl.

jxs1211 · 2024-12-12T12:59:14Z

Yes. Do you have any suggestions for a specific tool or method to achieve that? I don't have any ideas so far.

JesseStutler · 2024-12-17T02:21:45Z

You may refer to:

A kubectl plugin may let us keep showing the running/failed/succeeded columns without persisting these fields in vc-controller; the plugin would calculate them dynamically by counting the running/failed/succeeded pods.

JesseStutler · 2024-12-17T12:14:41Z

https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/

https://medium.com/@platform.engineers/developing-and-using-kubectl-plugins-a-hands-on-guide-1ee7aa4aea20

I think we can implement it in vcctl first; later we can decide whether to implement a kubectl plugin.
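A minimal sketch of the dynamic counting discussed above, whether it ends up in vcctl or in a kubectl plugin: filter the pods that belong to a podgroup and count them by phase at display time. The Pod and PodCounts types and the countPods and formatRow helpers are hypothetical, and the annotation key used to link a pod to its podgroup is an assumption that should be checked against the Volcano API constants.

```go
package main

import "fmt"

// groupNameAnnotation is ASSUMED to be the annotation linking a pod to its podgroup.
const groupNameAnnotation = "scheduling.k8s.io/group-name"

// Pod is a hypothetical, reduced view of a pod: only what the counting needs.
type Pod struct {
	Namespace   string
	Annotations map[string]string
	Phase       string // "Pending", "Running", "Succeeded", "Failed" or "Unknown"
}

// PodCounts holds the values that would otherwise be read from the podgroup status.
type PodCounts struct {
	Running, Failed, Succeeded int32
}

// countPods counts the pods of one podgroup by phase. Pods in other namespaces,
// in other podgroups, or without the annotation are ignored.
func countPods(pods []Pod, namespace, podGroup string) PodCounts {
	var c PodCounts
	for _, p := range pods {
		if p.Namespace != namespace || p.Annotations[groupNameAnnotation] != podGroup {
			continue
		}
		switch p.Phase {
		case "Running":
			c.Running++
		case "Failed":
			c.Failed++
		case "Succeeded":
			c.Succeeded++
		}
	}
	return c
}

// formatRow renders one table row in the style of a CLI listing.
func formatRow(name string, c PodCounts) string {
	return fmt.Sprintf("%-12s %-8d %-8d %-8d", name, c.Running, c.Failed, c.Succeeded)
}

func main() {
	pods := []Pod{
		{Namespace: "default", Annotations: map[string]string{groupNameAnnotation: "pg-a"}, Phase: "Running"},
		{Namespace: "default", Annotations: map[string]string{groupNameAnnotation: "pg-a"}, Phase: "Failed"},
		{Namespace: "default", Annotations: map[string]string{groupNameAnnotation: "pg-b"}, Phase: "Succeeded"},
	}
	fmt.Printf("%-12s %-8s %-8s %-8s\n", "NAME", "RUNNING", "FAILED", "SUCCEEDED")
	fmt.Println(formatRow("pg-a", countPods(pods, "default", "pg-a")))
}
```

In a real tool the pod list would come from the API server or an informer cache; the point here is only that the three numbers can be derived on demand instead of being written to the podgroup every session.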

jxs1211 · 2024-12-18T06:54:39Z

I found the related doc, volcano/docs/design/podgroup-statistics.md, and the feature it describes appears to have been implemented already. Correct me if I'm wrong.

volcano-sh-bot · 2024-12-19T07:47:28Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hwdef
You can assign the PR to them by writing /assign @hwdef in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

JesseStutler · 2024-12-24T03:58:38Z

No. The running/pending/unknown... fields in the queue status are no longer persisted, but the Running/Failed/Succeeded fields in podgroups are still refreshed every session. I think they don't need to be persisted either. You can check here:

volcano/pkg/scheduler/framework/session.go

Lines 353 to 355 in 0966fd5

    status.Running = int32(len(jobInfo.TaskStatusIndex[api.Running]))
    status.Failed = int32(len(jobInfo.TaskStatusIndex[api.Failed]))
    status.Succeeded = int32(len(jobInfo.TaskStatusIndex[api.Succeeded]))

Signed-off-by: Jay Shane <327411586@qq.com>

jxs1211 · 2024-12-24T07:11:22Z

volcano/pkg/scheduler/framework/session.go

I removed it.

JesseStutler · 2024-12-24T07:22:17Z

That's not enough. Some users may need to check these fields with vcctl, so we need to query the pods under each podgroup, similar to PR #3751, and then calculate the number of pods for display.

We also need to check carefully whether any plugins or actions depend on these fields.

jxs1211 · 2024-12-24T07:35:07Z

Do you mean we need to do at the removed code what PR #3751 did? Aren't we trying to reduce the pressure on the apiserver during a session, or am I misunderstanding? Another thing I want to confirm: are we discussing removing the fields (Running/Failed/Succeeded) from PodGroupStatus, or just removing the references to them (like the code that was removed)?

Monokaix · 2024-12-24T08:19:35Z

/hold

JesseStutler · 2025-01-02T15:49:07Z

Yes, we need to do the same thing as #3751 and remove the three fields of PodGroupStatus. If other code depends on these three fields, we need to adapt that code by other means (for example, by getting the Running/Succeeded/Failed pods under the podgroup).

JesseStutler · 2025-01-02T15:53:03Z

So the right steps are:

1. Remove the code that updates the three fields (we keep the API unchanged; removing the code that updates these fields is enough).
2. Adapt the code that depends on these three fields, including any code in the scheduler, the controller, or the e2e tests. If its logic relies on these three fields in PodGroupStatus, it needs to be adapted.
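To illustrate the second step, here is a hedged sketch of adapting a dependent check so that it derives the numbers from an in-memory task index instead of reading them from PodGroupStatus. JobInfo is a reduced, hypothetical stand-in for the scheduler's cache entry, and reachedMinSuccess is an invented consumer used only for illustration; neither is taken from the Volcano code base.

```go
package main

import "fmt"

// TaskStatus names the task states that matter for this sketch.
type TaskStatus string

const (
	Running   TaskStatus = "Running"
	Failed    TaskStatus = "Failed"
	Succeeded TaskStatus = "Succeeded"
)

// JobInfo is a hypothetical, reduced stand-in for the scheduler's per-job cache entry.
type JobInfo struct {
	// TaskStatusIndex maps a status to the set of task IDs currently in that status.
	TaskStatusIndex map[TaskStatus]map[string]struct{}
}

// count returns how many tasks are in the given status. A missing status,
// or a nil index, yields 0.
func (j *JobInfo) count(s TaskStatus) int32 {
	return int32(len(j.TaskStatusIndex[s]))
}

// reachedMinSuccess is a hypothetical consumer. Before the change it might have
// compared a persisted status.Succeeded field; after it, the value is derived
// from the in-memory index, so nothing has to be written to the podgroup.
func reachedMinSuccess(job *JobInfo, minSuccess int32) bool {
	return job.count(Succeeded) >= minSuccess
}

func main() {
	job := &JobInfo{TaskStatusIndex: map[TaskStatus]map[string]struct{}{
		Running:   {"task-0": {}, "task-1": {}},
		Succeeded: {"task-2": {}},
	}}
	fmt.Println(job.count(Running), job.count(Failed), job.count(Succeeded)) // 2 0 1
	fmt.Println(reachedMinSuccess(job, 1))                                   // true
}
```

Code outside the scheduler, such as the controller or the e2e tests, has no such in-memory index and would instead need to list and count the pods under the podgroup.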

JesseStutler · 2025-01-02T15:53:10Z

/area performance

volcano-sh-bot requested review from k82cn and merryzhou December 9, 2024 13:05

volcano-sh-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 9, 2024

volcano-sh-bot assigned hwdef Dec 9, 2024

hwdef reviewed Dec 9, 2024

jxs1211 force-pushed the perf/apply-pg-status branch from ba44179 to 24f673d Compare December 18, 2024 08:33

volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 18, 2024

jxs1211 force-pushed the perf/apply-pg-status branch from 24f673d to 72dce44 Compare December 19, 2024 07:47

jxs1211 force-pushed the perf/apply-pg-status branch from 72dce44 to f26b621 Compare December 19, 2024 07:53

volcano-sh-bot added retest-not-required-docs-only size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 19, 2024

update pg status with applyStatus

71e05ad

Signed-off-by: Jay Shane <327411586@qq.com>

jxs1211 force-pushed the perf/apply-pg-status branch from f26b621 to 71e05ad Compare December 24, 2024 07:05

volcano-sh-bot removed the retest-not-required-docs-only label Dec 24, 2024

volcano-sh-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 24, 2024

volcano-sh-bot added the area/performance Issues or PRs related to performance label Jan 2, 2025

JesseStutler mentioned this pull request Jan 2, 2025

[Enhancement]Optimize volcano end-to-end scheduling large-scale pod performance #3852

Open

6 tasks
