Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update pg status with applyStatus #3865

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jxs1211
Copy link

@jxs1211 jxs1211 commented Dec 9, 2024

Improve the performance of update pg status with applyStatus api.
#3852

@volcano-sh-bot
Copy link
Contributor

Welcome @jxs1211!

It looks like this is your first PR to volcano-sh/volcano.

Thank you, and welcome to Volcano. 😃

@volcano-sh-bot volcano-sh-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Dec 9, 2024
@jxs1211
Copy link
Author

jxs1211 commented Dec 9, 2024

/assign @hwdef

@@ -300,7 +301,22 @@ func (su *defaultStatusUpdater) UpdatePodGroup(pg *schedulingapi.PodGroup) (*sch
return nil, err
}

updated, err := su.vcclient.SchedulingV1beta1().PodGroups(podgroup.Namespace).Update(context.TODO(), podgroup, metav1.UpdateOptions{})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this modification is correct. I remember that this function will also update the annotation and label, so it is wrong to only update the status.

Copy link
Author

@jxs1211 jxs1211 Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may rewrite the logic , not to persist these nums fields in podgroups, like this pr: #3751

I think the proposal is want to not update the status fields in podgroups, do you mean the annotation and labels still need to be updated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there may be some other fields in spec or status need to be updated. We can't only just update the condition. What I want to do is not to refresh running/failed/succeeded fields in every session, because if session interval is short, there may be a lot of podgroup update requests, which may burden kube-apiserver.

@JesseStutler
Copy link
Member

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

@jxs1211
Copy link
Author

jxs1211 commented Dec 12, 2024

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

Yes, do you have any suggestion on a specific tool or method to achieve that, I don't have any idea for that by now.

@JesseStutler
Copy link
Member

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

Yes, do you have any suggestion on a specific tool or method to achieve that, I don't have any idea for that by now.

You may refer to:

Use kubectl plugin may help us still show the column running/failed/succeeded, but we don't need to persist these fields in vc-controller, but calculate dynamically by counting running/failed/succeeded pods in kubectl plugin.

@JesseStutler
Copy link
Member

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

Yes, do you have any suggestion on a specific tool or method to achieve that, I don't have any idea for that by now.

You may refer to:

Use kubectl plugin may help us still show the column running/failed/succeeded, but we don't need to persist these fields in vc-controller, but calculate dynamically by counting running/failed/succeeded pods in kubectl plugin.

I think we can implement it in vcctl first, later we can decide whether we should implement a kubectl plugin.

@jxs1211
Copy link
Author

jxs1211 commented Dec 18, 2024

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

Yes, do you have any suggestion on a specific tool or method to achieve that, I don't have any idea for that by now.

You may refer to:

Use kubectl plugin may help us still show the column running/failed/succeeded, but we don't need to persist these fields in vc-controller, but calculate dynamically by counting running/failed/succeeded pods in kubectl plugin.

I think we can implement it in vcctl first, later we can decide whether we should implement a kubectl plugin.

I found the doc related for this volcano/docs/design/podgroup-statistics.md, and the feature mentioned in the doc has been implemented, correct me if I'm wrong.

@jxs1211 jxs1211 force-pushed the perf/apply-pg-status branch from ba44179 to 24f673d Compare December 18, 2024 08:33
@volcano-sh-bot volcano-sh-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Dec 18, 2024
@jxs1211 jxs1211 force-pushed the perf/apply-pg-status branch from 24f673d to 72dce44 Compare December 19, 2024 07:47
@volcano-sh-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign hwdef
You can assign the PR to them by writing /assign @hwdef in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jxs1211 jxs1211 force-pushed the perf/apply-pg-status branch from 72dce44 to f26b621 Compare December 19, 2024 07:53
@volcano-sh-bot volcano-sh-bot added retest-not-required-docs-only size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Dec 19, 2024
@JesseStutler
Copy link
Member

Could you help to research that are there some other tools that can help kubectl calculate the number of running/failed/succeeded pods in the podgroup and then display these fields, but not by directly reading these fields in the status for display? Or if this cannot be achieved, we can use vcctl to display them, but I think more users might like to use kubectl.

Yes, do you have any suggestion on a specific tool or method to achieve that, I don't have any idea for that by now.

You may refer to:

Use kubectl plugin may help us still show the column running/failed/succeeded, but we don't need to persist these fields in vc-controller, but calculate dynamically by counting running/failed/succeeded pods in kubectl plugin.

I think we can implement it in vcctl first, later we can decide whether we should implement a kubectl plugin.

I found the doc related for this volcano/docs/design/podgroup-statistics.md, and the feature mentioned in the doc has been implemented, correct me if I'm wrong.

No, running/pending/unknown....fields in queue status are not persisted now, but there are still fields in podgroups Running/Failed/Succeeded that need to be refreshed every session, I think they does not need to be persisted also, you can check here:

status.Running = int32(len(jobInfo.TaskStatusIndex[api.Running]))
status.Failed = int32(len(jobInfo.TaskStatusIndex[api.Failed]))
status.Succeeded = int32(len(jobInfo.TaskStatusIndex[api.Succeeded]))

Signed-off-by: Jay Shane <327411586@qq.com>
@jxs1211
Copy link
Author

jxs1211 commented Dec 24, 2024

volcano/pkg/scheduler/framework/session.go

I removed it.

@JesseStutler
Copy link
Member

volcano/pkg/scheduler/framework/session.go

I removed it.

It's not enough, there may some users need to use vcctl to check these fields, we need to query the pods under podgroups, similar to this PR: #3751 then calculate the number of pods for display.

Also we need to check carefully whether if these fields is depended by some plugins or actions

@jxs1211
Copy link
Author

jxs1211 commented Dec 24, 2024

volcano/pkg/scheduler/framework/session.go

I removed it.

It's not enough, there may some users need to use vcctl to check these fields, we need to query the pods under podgroups, similar to this PR: #3751 then calculate the number of pods for display.

Also we need to check carefully whether if these fields is depended by some plugins or actions

Do you mean we need to do the same thing at the removed code like PR: #3751 did, don't we try to offload the apiserver's pressure in session, do I misunderstand? Another things I want to make sure is we're discussing removed the field(Running/Failed/Succeeded) in PodGroupStatus or just remove any reference of them(like the code was removed)?

@Monokaix
Copy link
Member

/hold

@volcano-sh-bot volcano-sh-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 24, 2024
@JesseStutler
Copy link
Member

volcano/pkg/scheduler/framework/session.go

I removed it.

It's not enough, there may some users need to use vcctl to check these fields, we need to query the pods under podgroups, similar to this PR: #3751 then calculate the number of pods for display.
Also we need to check carefully whether if these fields is depended by some plugins or actions

Do you mean we need to do the same thing at the removed code like PR: #3751 did, don't we try to offload the apiserver's pressure in session, do I misunderstand? Another things I want to make sure is we're discussing removed the field(Running/Failed/Succeeded) in PodGroupStatus or just remove any reference of them(like the code was removed)?

Yes, we need to do the same thing as #3751, remove the three fields of PodGroupStatus. If these three fields are dependent on other code, we need to use other methods to transform the dependent code (for example, get the Running/Succeeded/Failed pods under the podgroup)

@JesseStutler
Copy link
Member

So the right things are:

  1. Remove the code that updates the three fields (we keep the API unchanged, just remove the code that updates these three fields is fine)
  2. Adapt the code that depends on these three fields (including whether the code in the scheduler/controller/e2e test depends on these three fields). If the logic written depends on these three fields in PodGroupStatus, we need to adapt them.

@JesseStutler
Copy link
Member

/area performance

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance Issues or PRs related to performance do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants