
[KEP] Maximum Execution Time #3133

Merged (8 commits) Oct 23, 2024

Conversation


@trasc trasc commented Sep 25, 2024

What type of PR is this?

/kind documentation

What this PR does / why we need it:

Proposed KEP for the Maximum Execution Time feature.

Which issue(s) this PR fixes:

Relates to #3125

Special notes for your reviewer:

Does this PR introduce a user-facing change?

none

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/documentation Categorizes issue or PR as related to documentation. labels Sep 25, 2024
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 25, 2024

netlify bot commented Sep 25, 2024

Deploy Preview for kubernetes-sigs-kueue canceled.

Latest commit: 3fb7435
Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/67189d6abc2ded0008dcd6ed


trasc commented Sep 25, 2024

/cc @tenzen-y


@mimowo mimowo left a comment


Very initial pass to discuss the main points.


mimowo commented Oct 1, 2024

/assign @mwielgus


@kannon92 kannon92 left a comment


There are a few questions I have.

  1. Do we want some kind of declarative API that users who don't use kueuectl can still set this?
    a) Is it okay to set the label on the Workload objects and this would be respected?

  2. I am a bit unfamiliar with workloads in Kueue. Will setting the active field to false cause a suspension or a deletion of all workloads?

  3. What would be the suggestions for existing activeDeadlines with this feature? If a JobSet user wants an active deadline outside of Kueue, what should we implement? Same for Kubeflow, Ray, etc.


trasc commented Oct 3, 2024

  1. Do we want some kind of declarative API that users who don't use kueuectl can still set this?
    a) Is it okay to set the label on the Workload objects and this would be respected?

Yes, a label kueue.x-k8s.io/max-exec-time-seconds; regardless of the job type, the value will be passed into the created Workload object.
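For illustration, a Job carrying that label might look like the following minimal sketch (the job name, queue name, and timeout value are placeholders; `kueue.x-k8s.io/max-exec-time-seconds` is the label discussed in this thread):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job                                 # hypothetical name
  labels:
    kueue.x-k8s.io/queue-name: user-queue          # placeholder local queue
    kueue.x-k8s.io/max-exec-time-seconds: "3600"   # deactivate after 1h of admitted time
spec:
  suspend: true          # Kueue admits the job by unsuspending it
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox
        command: ["sleep", "600"]
```

The value is propagated to the Workload object Kueue creates for the job, so it works the same whether the job was created by kjobctl or any other tool.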

  2. I am a bit unfamiliar with workloads in Kueue. Will setting the active field to false cause a suspension or a deletion of all workloads?

The job is suspended.

  3. What would be the suggestions for existing activeDeadlines with this feature? If a JobSet user wants an active deadline outside of Kueue, what should we implement? Same for Kubeflow, Ray, etc.

The proposal is only relevant while working with Kueue.

Other activeDeadlines (that may mark the job as Failed) could be used, but Kueue will not be aware of them.


kannon92 commented Oct 3, 2024

Yes, a label kueue.x-k8s.io/max-exec-time-seconds; regardless of the job type, the value will be passed into the created Workload object.

I read that a user needs to set kueuectl -t and that would set a label in the workload. But if I do not use kueuectl at all, I would like to know whether I can add a label to a job and have it propagated to the workload object.

I read that this is a bit of a hack to get a standard for representing workload time. I would like to see a future where we can add some kind of general API that works with Kueue or without Kueue to enforce a deadline for any workload.

At the moment, I worry that users would not know to use activeDeadlineSeconds (at Job or Pod) or as a person working on JobSet, I do not know what kind of API to implement in JobSet to set up a future with Kueue.

Plus we have a few feature requests for the Job API to have a smarter deadline ie kubernetes/kubernetes#111948.

It seems like you are basically saying that if the Kueue label is specified, then we should not be using any of the workload time limits, as they will race/conflict with each other.


trasc commented Oct 3, 2024

Yes, a label kueue.x-k8s.io/max-exec-time-seconds; regardless of the job type, the value will be passed into the created Workload object.

I read that a user needs to set kueuectl -t and that would set a label in the workload. But if I do not use kueuectl at all, I would like to know can I add a label to a job and this would be propagated to the workload object?

I think you are referring to kjobctl -t, which is one intended use case for this; however, if the label is set on a job, Kueue will behave the same regardless of the tool used to create the job.

I read that this is a bit of a hack to get a standard for representing workload time. I would like to see a future where we can add some kind of general API that works with Kueue or without Kueue to enforce a deadline for any workload.

At the moment, I worry that users would not know to use activeDeadlineSeconds (at Job or Pod) or as a person working on JobSet, I do not know what kind of API to implement in JobSet to set up a future with Kueue.

Nothing needs to be done for the specific jobs (JobSet, RayJob, ...); the only changes needed are in Kueue, mostly in the generic implementation (jobframework).

Plus we have a few feature requests for the Job API to have a smarter deadline ie kubernetes/kubernetes#111948.

It seems like you are basically saying that if the Kueue label is specified, then we should not be using any of the workload time limits, as they will race/conflict with each other.

Unfortunately, for specific job types this could happen; however, as a follow-up, Kueue can try to detect conflicting configuration at the job-specific webhook level and alert the user when necessary.

@tenzen-y
Member

I will come back here after the next release.

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2024
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 173883caae4f20496e46c6c5ad12eee4d9cd09df

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 22, 2024
@alculquicondor

@tenzen-y @mimowo can this still be considered for v0.9?

@tenzen-y

@tenzen-y @mimowo can this still be considered for v0.9?

The ProvReqRequeue KEP and the TAS basic implementation have been merged.
So, if @trasc or someone else can lead this implementation, I'm happy to review this feature.

@tenzen-y

/cc


trasc commented Oct 22, 2024

I have a draft implementation, and only need to realign it with the KEP.

> - If `remainingTime` > 0, a reconcile request should be queued with a `remainingTime` delay.
> - If `remainingTime` <= 0, the workload should be deactivated (`wl.spec.active` set to `false`), `wl.status.accumulatedPastExecutionTimeSeconds` set to 0, and a relevant event recorded.

Member


What if the Workload is preempted / evicted even if the remainingTime is greater than 0?
Will we reset the .status.accumulatedPastExecutionTimeSeconds?

Contributor Author


The time spent as admitted in the current cycle is added to accumulatedPastExecutionTimeSeconds. L120

Member


I see. So, when we re-admit the Workload after preemption, we will add the .spec.maximumExecutionTimeSeconds to the .status.accumulatedPastExecutionTimeSeconds, right?

Contributor Author


No, we are accumulating the time the workload was in the admitted state, regardless of how many admit/evict cycles the wl has; if the sum reaches maximumExecutionTimeSeconds, we deactivate it.

(The accumulated time resets on deactivation.)

Member


Oh, I see. In that case, specifying both the Job activeDeadline and this maximumExecutionTimeSeconds could cooperate, since the activeDeadline can be specified per run, while the maximumExecutionTime can be specified based on a cluster-level limit on Job running time.

@tenzen-y

I have a draft implementation, and only need to realign it with the KEP.

Thank you for leading this feature.

@kannon92

I still have concerns on the impact of this with existing deadline features with other workloads.

It seems that Kueue will have its own deadline syntax for all workloads. Does this mean that workloads using their own API should be discouraged from using those APIs while using Kueue?

If a pod has an active deadline will we do any validation to make sure that the Kueue timeout and this are meaningful together or would we ignore the pod one?

Same for Job with ActiveDeadlines.

And yes, any future workload can implement its own deadline, but I think Kueue integration with these would be important, so I am still not sure of the path forward for JobSet or the training operator.


mimowo commented Oct 23, 2024

As mentioned in the issue, we would like to have a Slurm-like -t option, supported across all Jobs, to use for time-based scheduling reliably. We also want to be able to set the new timeout from kjobctl for any Job CRD.

Actually, take for example the activeDeadlineSeconds timeout from core k8s: we have it at two different levels, Pod and Job, because the use cases are different. I think that is also the case here: the use case of Kueue is different, and the semantics of the timeout differ.


mimowo commented Oct 23, 2024

@tenzen-y @mimowo can this still be considered for v0.9?

It is still on the list of nice-to-haves, and I'm happy to move it forward, especially if @alculquicondor and @tenzen-y can carry the reviews, as I'm currently focused primarily on completing TAS.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 23, 2024
@tenzen-y

@tenzen-y @mimowo can this still be considered for v0.9?

It is still on the list of nice-to-haves, and I'm happy to move it forward, especially if @alculquicondor and @tenzen-y can carry the reviews, as I'm currently focused primarily on completing TAS.

Sure, I can lead the review for this.

@tenzen-y

If a pod has an active deadline will we do any validation to make sure that the Kueue timeout and this are meaningful together or would we ignore the pod one?

Indeed, I asked similar questions at #3133 (comment).
But we do not verify the relationship between the Pod or Job activeDeadlineSeconds and the Kueue maximum execution time since, as @mimowo mentioned above, those have different use cases.


@tenzen-y tenzen-y left a comment


Thank you!
/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 23, 2024
@k8s-ci-robot

LGTM label has been added.

Git tree hash: 776213cfe52cb0d7a5901cd6ba599a27a8a6b455

@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, tenzen-y, trasc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [alculquicondor,tenzen-y]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mimowo

mimowo commented Oct 23, 2024

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 23, 2024
@k8s-ci-robot k8s-ci-robot merged commit ffc4382 into kubernetes-sigs:main Oct 23, 2024
7 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.9 milestone Oct 23, 2024