Managed and scheduled multiple single pods with podgroup #2358

zbbkeepgoing · 2022-07-18T09:17:15Z

This PR addresses the following situations:

When we submit a Spark job in client mode on Kubernetes with the Volcano scheduler, we want all executors to be managed and scheduled through a PodGroup.
More generally, we want a set of single pods to be managed and scheduled through a PodGroup with Volcano.

zbbkeepgoing · 2022-07-18T09:18:16Z

Link to the API pull request: volcano-sh/apis#83. I will update go.mod after the API PR is merged.

Thor-wl

Hey. A few questions about the background:

Why do you want to split the whole Spark job into separate, independent pods instead of using a Spark job or a Volcano job?
I haven't tried this myself, but as I understand it, Volcano already supports generating a PodGroup for separate pods. Did something go wrong in your test?

zbbkeepgoing · 2022-08-01T02:00:44Z

> Hey. A few questions about the background:
>
> Why do you want to split the whole Spark job into separate, independent pods instead of using a Spark job or a Volcano job?
>
> I haven't tried this myself, but as I understand it, Volcano already supports generating a PodGroup for separate pods. Did something go wrong in your test?

1. In Spark's client submit mode, all the executor pods are independent. To manage them with Volcano, they need to be grouped into one pg. A large share of current Spark users also submit jobs in client mode.
2. Yes, Volcano supports generating a pg for separate pods, but it does not support the pg's minResources; this PR adds that. In addition, the generated PodGroup refers only to the first pod, so if the first pod terminates, the pg is deleted automatically.

So this PR supports minResources for independent pods and makes the pg refer to all of the independent pods.
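To illustrate the second point, here is a minimal sketch of why a PodGroup that lists every member pod as an owner outlives the first pod, given that Kubernetes garbage-collects a dependent only once all of its owners are gone. The types `OwnerRef`, `Pod`, and `PodGroup` and the helpers `addOwner` and `removeOwner` are hypothetical stand-ins for the real Kubernetes and Volcano API types; this is not the code in this PR.

```go
package main

import "fmt"

// OwnerRef is a hypothetical, simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	Kind string
	Name string
	UID  string
}

// Pod is a simplified stand-in for v1.Pod.
type Pod struct {
	Name string
	UID  string
}

// PodGroup is a simplified stand-in for the Volcano PodGroup object.
type PodGroup struct {
	Name   string
	Owners []OwnerRef
}

// addOwner records pod as an owner of pg unless it is already listed.
// It reports whether the PodGroup was changed.
func addOwner(pg *PodGroup, pod Pod) bool {
	for _, o := range pg.Owners {
		if o.UID == pod.UID {
			return false
		}
	}
	pg.Owners = append(pg.Owners, OwnerRef{Kind: "Pod", Name: pod.Name, UID: pod.UID})
	return true
}

// removeOwner drops the owner with the given UID and returns how many
// owners remain. With zero owners left, the PodGroup would be collected.
func removeOwner(pg *PodGroup, uid string) int {
	kept := pg.Owners[:0]
	for _, o := range pg.Owners {
		if o.UID != uid {
			kept = append(kept, o)
		}
	}
	pg.Owners = kept
	return len(kept)
}

func main() {
	pg := &PodGroup{Name: "spark-executors"} // hypothetical group name
	pods := []Pod{{"exec-1", "uid-1"}, {"exec-2", "uid-2"}, {"exec-3", "uid-3"}}
	for _, p := range pods {
		addOwner(pg, p)
	}
	// The first pod terminates, but two owners still keep the PodGroup alive.
	fmt.Println("owners remaining:", removeOwner(pg, "uid-1"))
}
```

If only the first pod had been registered, the same `removeOwner` call would leave zero owners, which matches the automatic deletion described above.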

Thor-wl · 2022-08-16T07:33:50Z

pkg/webhooks/admission/pods/validate/admit_pod.go

@@ -145,6 +146,15 @@ func validatePod(pod *v1.Pod, reviewResponse *admissionv1.AdmissionResponse) str
 	return msg
 }

+func isVcJob(pod *v1.Pod) bool {


Suggest renaming this to BelongToVcJob.

zbbkeepgoing · 2022-08-19T03:02:59Z

/retest

volcano-sh-bot · 2022-08-19T03:03:15Z

@zbbkeepgoing: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

hwdef · 2022-08-19T03:20:51Z

pkg/webhooks/admission/pods/validate/admit_pod.go

@@ -145,6 +146,15 @@ func validatePod(pod *v1.Pod, reviewResponse *admissionv1.AdmissionResponse) str
 	return msg
 }

+func BelongToVcJob(pod *v1.Pod) bool {


Why make this public?

I moved it to utils.go; might it be used elsewhere in the future?
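For readers without the diff at hand, the following is a rough sketch of what an exported helper of this shape could look like. It assumes the check is based on the pod's owner references and that a Volcano Job is identified by kind `Job` in the `batch.volcano.sh` API group; the body of the function in the PR is not shown in this thread and may differ. `OwnerRef` and `Pod` are simplified stand-ins for `metav1.OwnerReference` and `v1.Pod`.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerRef is a simplified stand-in for metav1.OwnerReference.
type OwnerRef struct {
	APIVersion string
	Kind       string
}

// Pod is a simplified stand-in for v1.Pod that keeps only owner references.
type Pod struct {
	OwnerReferences []OwnerRef
}

// BelongToVcJob reports whether any owner of the pod is a Volcano Job.
// Assumption: Volcano Jobs use Kind "Job" in the "batch.volcano.sh" group.
func BelongToVcJob(pod *Pod) bool {
	for _, ref := range pod.OwnerReferences {
		if ref.Kind == "Job" && strings.HasPrefix(ref.APIVersion, "batch.volcano.sh/") {
			return true
		}
	}
	return false
}

func main() {
	vcPod := &Pod{OwnerReferences: []OwnerRef{{APIVersion: "batch.volcano.sh/v1alpha1", Kind: "Job"}}}
	k8sJobPod := &Pod{OwnerReferences: []OwnerRef{{APIVersion: "batch/v1", Kind: "Job"}}}
	barePod := &Pod{}
	fmt.Println(BelongToVcJob(vcPod), BelongToVcJob(k8sJobPod), BelongToVcJob(barePod))
}
```

Matching on the API group as well as the kind keeps pods owned by a plain Kubernetes `batch/v1` Job from being treated as Volcano Job pods.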

hwdef · 2022-08-19T06:16:03Z

Please squash the multiple commits into a single commit.

zbbkeepgoing · 2022-08-20T12:40:39Z

> please compress multiple commits into a single commit.

done

zbbkeepgoing · 2022-08-20T13:48:48Z

/retest

volcano-sh-bot · 2022-08-20T13:49:04Z

@zbbkeepgoing: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

zbbkeepgoing · 2023-04-03T09:10:28Z

/assign @wangyang0616

wangyang0616 · 2023-04-03T11:32:55Z

docs/user-guide/how_to_schedule_multipe_single_pods_with_podgroup.md

+But it can not manage and schedule multiple single pods in one pg now. And this missing ability is very useful in spark client mode and so on.
+It will also play a role in some scenarios that require multiple single pods to work together (vj cannot be used).
+
+## Key Points


In the current PR implementation, users can set MinResource through an annotation, but a PodGroup has many other attributes, such as MinMember and Queue, and the PR does not support setting them. Can you state in the document what is in scope and what is out of scope? That would make it easier for users to understand.

I would add some limitation to the current
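As a rough illustration of the scope question in this thread, the sketch below builds a PodGroup spec from pod annotations where only the minimum resources are configurable and the other attributes keep fixed defaults. Everything here is an assumption made for illustration: the annotation key `example.com/podgroup-min-resources`, the JSON value format, the `PodGroupSpec` stand-in type, and the defaults of `MinMember: 1` and queue `default`. The PR's actual key and format are not shown in this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Hypothetical annotation key, used here only for illustration.
const minResourcesAnnotation = "example.com/podgroup-min-resources"

// PodGroupSpec is a simplified stand-in for the Volcano PodGroup spec.
type PodGroupSpec struct {
	MinMember    int32
	Queue        string
	MinResources map[string]string
}

// specFromAnnotations fills MinResources from the annotation when present.
// MinMember and Queue are not read from annotations and keep assumed defaults.
func specFromAnnotations(ann map[string]string) (PodGroupSpec, error) {
	spec := PodGroupSpec{MinMember: 1, Queue: "default"}
	raw, ok := ann[minResourcesAnnotation]
	if !ok {
		return spec, nil
	}
	if err := json.Unmarshal([]byte(raw), &spec.MinResources); err != nil {
		return spec, fmt.Errorf("invalid %s annotation: %w", minResourcesAnnotation, err)
	}
	return spec, nil
}

func main() {
	ann := map[string]string{minResourcesAnnotation: `{"cpu":"2","memory":"4Gi"}`}
	spec, err := specFromAnnotations(ann)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", spec)
}
```

Under these assumptions, MinMember and Queue would fall under the "out of scope" side of the documentation being requested.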

wangyang0616 · 2023-04-03T11:39:59Z

For the scenario you describe, I agree with enhancing the PodGroup's ability to manage a batch of independent pods, but I still have some concerns. MinResources can be configured through annotations. Will other attributes of the PodGroup support this method in the future? As @Yikun also raised, if they are all introduced this way, does the PodGroup CRD still need to exist?

cc @william-wang @jinzhejz

zbbkeepgoing · 2023-04-04T03:17:21Z

> In the scenario you describe, I agree to enhance the podgroup’s ability to manage a batch of independent pods, but I still have some concerns. MinResources can be configured through annotations. Will other attributes of the podgroup support this method in the future? Same as @Yikun considered, if they are all introduced, then does the crd of podgroup still need to exist?
>
> cc @william-wang @jinzhejz

For the former, I think we can add support for other attributes as suitable scenarios come up. On the latter, I hold a different view. The pg needs to exist as the group that manages pods; it is not just a simple CRD but the link between the scheduling units and the scheduling logic in the Volcano scheduler. I see this enhancement as just one more entry point for creating PodGroups.

For example, VJ is currently the most native entry point for managing PodGroups, and it covers some batch scenarios. "spark.kubernetes.scheduler.volcano.podGroupTemplateFile" is the entry point for Spark to manage PodGroups natively, and it covers some big data scenarios. The current enhancement manages PodGroups through the PodGroup controller, which covers some special scenarios.

These entry points are independent of the PodGroup CRD itself; they are just different ways of creating and managing PodGroups. The PodGroup still plays a vital role in the scheduler. If we dropped the PodGroup as the link between the controller and the scheduler and relied on annotations alone, that would be a challenge for the code complexity, scalability, and soundness of the scheduler layer.

Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

fix e2e
Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

Update UT & Update podgroup's min resource
Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

Add user-guide doc & slove concurrent security
Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

fix doc
Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

Add Limitations
Signed-off-by: Binbin Zou <binbin.zou@kyligence.io>

stale · 2023-06-10T01:20:48Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

hwdef · 2023-06-11T14:41:05Z

still needs to be reviewed

lowang-bh · 2023-07-14T14:42:57Z

In my opinion, we can create a Volcano PodGroup manually to handle those 3 pods. Those pods will then be controlled via the PodGroup.

lowang-bh · 2023-06-15T07:51:48Z

docs/user-guide/how_to_schedule_multipe_single_pods_with_podgroup.md

+      image: busybox:1.28
+      command: ['sh', '-c', 'echo "Hello, busybox3!" && sleep 3600']
+```
+2. Kubectl describe pg.


In my opinion, you can create a Volcano PodGroup manually to handle those 3 pods. Those pods will then be controlled via the PodGroup.
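A small sketch of how the manual approach suggested here associates pods with a PodGroup: each pod names the group in an annotation, and pods sharing that value belong to the same group. The key `scheduling.k8s.io/group-name` is, as far as I know, the group-name annotation Volcano reads, but treat it as an assumption and check it against your Volcano version. The `Pod` type, the `groupPods` helper, and the group name `pg-busybox` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Assumed annotation key for naming a pod's PodGroup; verify before relying on it.
const groupNameAnnotation = "scheduling.k8s.io/group-name"

// Pod is a simplified stand-in for v1.Pod.
type Pod struct {
	Name        string
	Annotations map[string]string
}

// groupPods returns sorted pod names keyed by the PodGroup named in their
// annotation. Pods without the annotation are skipped.
func groupPods(pods []Pod) map[string][]string {
	groups := make(map[string][]string)
	for _, p := range pods {
		name, ok := p.Annotations[groupNameAnnotation]
		if !ok || name == "" {
			continue
		}
		groups[name] = append(groups[name], p.Name)
	}
	for _, members := range groups {
		sort.Strings(members)
	}
	return groups
}

func main() {
	ann := map[string]string{groupNameAnnotation: "pg-busybox"} // hypothetical group name
	pods := []Pod{
		{Name: "busybox1", Annotations: ann},
		{Name: "busybox2", Annotations: ann},
		{Name: "busybox3", Annotations: ann},
		{Name: "unrelated"}, // no annotation, so not part of any manual group
	}
	fmt.Println(groupPods(pods)["pg-busybox"])
}
```

With this approach the user creates the PodGroup object up front and sets its attributes directly on it, rather than having the controller derive them from pod annotations.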

stale · 2023-10-15T09:14:43Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

hwdef · 2023-12-15T08:53:32Z

/reopen

volcano-sh-bot · 2023-12-15T08:53:36Z

@hwdef: Reopened this PR.

In response to this:

/reopen

volcano-sh-bot · 2023-12-15T08:53:44Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign william-wang
You can assign the PR to them by writing /assign @william-wang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

lowang-bh · 2023-12-16T06:56:51Z

/hold

stale · 2024-03-17T09:30:37Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

volcano-sh-bot · 2024-04-09T02:04:12Z

@zbbkeepgoing: PR needs rebase.

stale · 2025-04-26T00:44:50Z

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

volcano-sh-bot requested review from huone1 and hzxuzhonghu July 18, 2022 09:17

volcano-sh-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 18, 2022

zbbkeepgoing mentioned this pull request Jul 18, 2022

Support managed and scheduled single pods with podgroup, specify podgroup name and minresource manually #2359

Closed

Thor-wl requested review from Thor-wl, shinytang6, hwdef, william-wang and wpeng102 and removed request for hzxuzhonghu and huone1 July 19, 2022 01:16

Thor-wl reviewed Aug 1, 2022

Thor-wl reviewed Aug 16, 2022

hwdef reviewed Aug 19, 2022

zbbkeepgoing force-pushed the pg branch from f8306ad to 6878910 Compare August 20, 2022 12:39

zbbkeepgoing requested review from Thor-wl and hwdef and removed request for shinytang6, william-wang, wpeng102, Thor-wl and hwdef August 22, 2022 05:49

volcano-sh-bot assigned wangyang0616 Apr 3, 2023

wangyang0616 reviewed Apr 3, 2023

zbbkeepgoing force-pushed the pg branch from f8955fd to 64efe5d Compare April 4, 2023 03:36

stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2023

stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 11, 2023

lowang-bh reviewed Aug 8, 2023

stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 15, 2023

stale bot closed this Dec 15, 2023

volcano-sh-bot reopened this Dec 15, 2023

stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 15, 2023

stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 17, 2024

volcano-sh-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 9, 2024

stale bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 9, 2024

stale bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 26, 2025

stale bot closed this May 6, 2025
