[FeatureRequest]KFP execution cache #2904
Comments
/assign @rui5i |
Not sure which is the first-phase target: SDK-to-API, right? |
Sorry for the confusion. I've updated the issue content. We changed the project to "KFP execution cache", and retrying from a certain step will become an underlying CUJ (critical user journey). There will not be any new APIs. The design doc will be sent out really soon. |
Hello @rui5i, That's a really nice feature! Thanks! |
Hi @elikatsis, thanks for your feedback! The design doc is currently under KFP team review. I am happy to present it at the upcoming KFP community meeting! |
Ping! |
Also interested in any docs on this, or updates on the release. I saw some meeting notes here: https://docs.google.com/document/d/1KB5KD8TvcrnxQX0xluHRnUdRYkM2-5-vB2V1wlxS4GY/edit#heading=h.gt0qfhljl8xo but it's not clear to me whether a decision has been made. |
Hi, thanks for asking! The link you provided is our caching design doc. We are trying to make it available in 0.3.1. I'll let you know after it's released. |
Awesome. Feel free to ask if there is anything the community can help with. |
Hi again, I wanted to ask: are there any plans for what you will show as a step's logs in the KFP UI? Or will cached steps appear with empty logs? |
Hi @elikatsis, thanks for checking in! Currently, if a step's result is taken from the cache, the step log will show "This step output is taken from cache." See https://github.com/kubeflow/pipelines/blob/master/backend/src/cache/server/mutation.go#L119. In the future we may explore whether it's possible to show a link to the previous run/step. |
Kubeflow Pipelines step caching is now released in 0.4.0 and later. Closing this issue. |
/reopen TODO:
I will finish the UI integration |
@Bobgy: Reopened this issue. |
Looks like we did not add any extra labels, so the labels/annotations of reused pods are the same. However, given a pod, it's pretty easy to detect that it was skipped (it does not have any Argo containers):
There is also a way to detect the skipped pods based on the WorkflowStatus alone: All output artifacts have the pod name in the URI. But for skipped pods, the pod name does not match the URIs. This genius idea belongs to @rui5i. (And now there are always some output artifacts since we've enabled log archiving). |
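The two heuristics described above could be sketched roughly as follows. This is only an illustration, not the code used by KFP or its UI. It assumes pods and workflow nodes are available as plain dicts parsed from their JSON, that Argo's injected containers are named `wait` and `init`, that a node's ID equals its pod name, and that artifact locations live under `s3.key`. The helper names `pod_looks_skipped` and `node_looks_skipped` are hypothetical.

```python
from typing import Any, Dict, FrozenSet

# Assumption: names of the containers Argo injects into the pods it runs.
ARGO_CONTAINER_NAMES: FrozenSet[str] = frozenset({"wait", "init"})


def pod_looks_skipped(
    pod: Dict[str, Any],
    argo_names: FrozenSet[str] = ARGO_CONTAINER_NAMES,
) -> bool:
    """Heuristic 1: a skipped (cached) pod has no Argo containers."""
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    names = {c.get("name") for c in containers}
    return not (names & argo_names)


def node_looks_skipped(node_id: str, node: Dict[str, Any]) -> bool:
    """Heuristic 2: output artifact URIs of a skipped node do not contain its pod name.

    Assumption: node_id is the pod name and the URI is stored under `s3.key`.
    Returns False when there are no artifact URIs to inspect.
    """
    artifacts = (node.get("outputs") or {}).get("artifacts") or []
    keys = [(a.get("s3") or {}).get("key") for a in artifacts]
    keys = [k for k in keys if k]
    if not keys:
        return False
    return all(node_id not in k for k in keys)
```

The second check only needs the WorkflowStatus, so it keeps working after the pod itself has been garbage-collected. Both checks depend on Argo implementation details and may break if those change.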
@Ark-kun I'm a little worried that we are relying too much on Argo details:
The last two points are probably also dangerous for the cache server... It really sounds to me like we should contribute the caching solution to Argo Workflows natively and add it to the workflow status. (Just a personal gut feeling; I guess that's not practical now.) In the meantime, I'll let the UI use the hack first to get it working. |
Digging through some related Argo issues, I understand Argo isn't really taking the feature request. |
@Ark-kun Can I assume we only cache successful steps? |
I'm positive about that. Half a year ago I even started the project to add caching to Argo, but did not finish it. The caching requires some persistence and Argo did not have any DB at that time.
We only reuse successful steps. In the future we'll start reusing still-running steps. |
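To illustrate the "only successful steps are reused" rule, here is a minimal in-memory sketch. It is not the KFP cache server, which needs real persistence as noted above. The `StepCache` class, the SHA-256 key over a canonical JSON dump of the step spec, and the `"Succeeded"` phase string are all assumptions made for this example.

```python
import hashlib
import json
from typing import Any, Dict, Optional


class StepCache:
    """Hypothetical in-memory cache that stores outputs of successful steps only."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, Any]] = {}

    @staticmethod
    def key_for(step_spec: Dict[str, Any]) -> str:
        # Canonical JSON so that key order in the spec does not change the hash.
        canonical = json.dumps(step_spec, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def record(self, step_spec: Dict[str, Any], phase: str, outputs: Dict[str, Any]) -> bool:
        # Failed or still-running steps are never stored, so they are never reused.
        if phase != "Succeeded":
            return False
        self._entries[self.key_for(step_spec)] = outputs
        return True

    def lookup(self, step_spec: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        # Returns cached outputs on a hit, or None so the step runs normally.
        return self._entries.get(self.key_for(step_spec))
```

A caller would run the step whenever `lookup` returns `None` and call `record` once the step finishes.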