[feature] Ability to disable caching for a particular pipeline run via the UI #6578

Open
jackwhelpton opened this issue Sep 16, 2021 · 11 comments

Comments

@jackwhelpton
Contributor

Feature Area

/area frontend

What feature would you like to see?

The ability to enable or disable (v2) caching for a pipeline run via the UI.

What is the use case or pain point?

When developing pipelines, it is possible to end up in a position where the component "succeeds" (does not error), but returns erroneous results. Once this has happened, correcting the component code and re-executing does not suffice, as the previous (incorrect) results are cached.

Is there a workaround currently?

We separate authoring pipelines from executing them, so typically our executions are via the Vertex Pipelines UI, by uploading the compiled JSON.

One workaround would be to extend the pipeline code to support executing the pipeline from code, which does allow the caching to be disabled. This change would have to be made to all our pipelines, and is onerous.

So the options are either to move the input files and change the components to bust the cache, or to execute the pipeline using Python.

Neither of these approaches is ideal, and neither would be easily available to non-coders when they execute the pipeline (people who can upload a given JSON file and set parameters, but not make code changes).
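
For reference, the code-based workaround might look roughly like the sketch below. It assumes the `google-cloud-aiplatform` package and its `PipelineJob` class with the `enable_caching` argument; the function name, the display name, and the injectable `job_cls` parameter are illustrative and not part of any pipeline discussed here.

```python
from typing import Any, Dict, Optional


def submit_without_cache(template_path: str,
                         parameter_values: Dict[str, Any],
                         job_cls: Optional[Any] = None) -> Any:
    """Submit a compiled pipeline with caching disabled for this run only."""
    if job_cls is None:
        # Imported lazily so the sketch can be read without the package installed.
        from google.cloud import aiplatform
        job_cls = aiplatform.PipelineJob

    job = job_cls(
        display_name='run-without-cache',  # illustrative name (assumption)
        template_path=template_path,       # path or URI of the compiled JSON
        parameter_values=parameter_values,
        enable_caching=False,              # run-level override of task caching
    )
    job.submit()
    return job
```

This is exactly the step that is out of reach for someone who can only upload a JSON file and fill in parameters.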

Love this idea? Give it a 👍. We prioritize fulfilling features with the most 👍.

@capri-xiyue
Contributor

@zijianjoy
I talked with the SDK and Vertex teams, and the agreement is to implement enabling/disabling caching on the client side first.
Later, once we have a clear goal for the caching CUJ, we can discuss implementing this on the server side, which means changing the pipeline job proto.
The SDK already implements this logic in:

```python
from typing import Any, Dict


def _set_enable_caching_value(pipeline_spec: Dict[str, Any],
                              enable_caching: bool) -> None:
    """Sets pipeline tasks caching options.

    Args:
        pipeline_spec: The dictionary of pipeline spec.
        enable_caching: Whether to enable caching.
    """
    # Visit the root DAG and every component that is itself a DAG.
    for component in [pipeline_spec['root']] + list(
            pipeline_spec['components'].values()):
        if 'dag' in component:
            for task in component['dag']['tasks'].values():
                task['cachingOptions'] = {'enableCache': enable_caching}
```

You can follow the same logic in the UI.
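
For users who only upload compiled JSON, the same logic could in principle be applied to the file before upload. The following is a minimal sketch, not SDK code: `disable_caching_in_file` is a hypothetical helper, and it assumes the compiled file is either the pipeline spec itself or wraps it under a `pipelineSpec` key (check your own compiled output before relying on this).

```python
import json
from pathlib import Path
from typing import Any, Dict


def set_enable_caching(pipeline_spec: Dict[str, Any],
                       enable_caching: bool) -> None:
    """Mirror of the SDK logic quoted above: set cachingOptions on every DAG task."""
    components = [pipeline_spec['root']] + list(
        pipeline_spec.get('components', {}).values())
    for component in components:
        if 'dag' in component:
            for task in component['dag']['tasks'].values():
                task['cachingOptions'] = {'enableCache': enable_caching}


def disable_caching_in_file(src: str, dst: str) -> None:
    """Hypothetical helper: write a copy of a compiled pipeline with caching off."""
    doc = json.loads(Path(src).read_text())
    # Assumption: some compiled files nest the spec under 'pipelineSpec'.
    spec = doc.get('pipelineSpec', doc)
    set_enable_caching(spec, enable_caching=False)
    Path(dst).write_text(json.dumps(doc, indent=2))
```

A UI implementation would need the equivalent of `set_enable_caching` applied to the spec it submits, which only helps if the UI actually sends the spec.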

@zijianjoy
Collaborator

zijianjoy commented Oct 11, 2021

@capri-xiyue Sounds good on providing the ability to disable pipeline-level caching on the UI side once we have updated the pipeline job proto to support this field. One question, though: currently, when you create a run from a pipeline template, the UI doesn't use the PipelineJob payload itself; instead, it creates a run using pipeline_version_id for the PIPELINE_VERSION resource_reference. So I guess the backend should expose the caching configuration first?
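
To make the concern concrete, here is a rough sketch of what such a run-creation request body could look like. The field names are modeled on the v1beta1 run API and should be treated as assumptions, as should the helper name `build_create_run_payload`.

```python
from typing import Any, Dict, List


def build_create_run_payload(run_name: str,
                             pipeline_version_id: str,
                             parameters: Dict[str, str]) -> Dict[str, Any]:
    """Build a run-creation body that points at a stored pipeline version."""
    params: List[Dict[str, str]] = [
        {'name': name, 'value': value} for name, value in parameters.items()
    ]
    return {
        'name': run_name,
        'pipeline_spec': {'parameters': params},
        # The run references the version by ID; the spec itself is not sent.
        'resource_references': [{
            'key': {'type': 'PIPELINE_VERSION', 'id': pipeline_version_id},
            'relationship': 'CREATOR',
        }],
    }
```

Because the body carries only an ID and parameters, the task-level `cachingOptions` live in the spec stored on the backend, and the client has nothing to patch.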

@zijianjoy zijianjoy assigned capri-xiyue and unassigned zijianjoy Oct 11, 2021
@capri-xiyue
Contributor

> @capri-xiyue Sounds good on providing the ability to disable pipeline-level caching on the UI side once we have updated the pipeline job proto to support this field. One question, though: currently, when you create a run from a pipeline template, the UI doesn't use the PipelineJob payload itself; instead, it creates a run using pipeline_version_id for the PIPELINE_VERSION resource_reference. So I guess the backend should expose the caching configuration first?

Can the UI manipulate the template before it calls the backend?

@capri-xiyue
Contributor

Discussed offline: the backend needs to expose the cache configuration first, and then the frontend can disable caching for a particular pipeline run via the UI.

@capri-xiyue capri-xiyue assigned james-jwu and unassigned capri-xiyue Dec 24, 2021
@capri-xiyue
Contributor

Reassigned to James Wu to further discuss the priority and the assignee.

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the lifecycle/stale label Apr 17, 2022
@juliusvonkohout
Member

We will cover this and more in #8177. Hopefully we can present it tomorrow in the KFP meeting.

@stale stale bot removed the lifecycle/stale label Sep 13, 2022
@thesuperzapper
Member

@zijianjoy it would be great to disable caching for a specific run via the UI. What do you think?

Many users don't understand what the cache does, and having that option in the UI would help them understand that sometimes their runs will pull from a cache.

Note, we already have a feature to do this when submitting runs from the SDK:

https://www.kubeflow.org/docs/components/pipelines/v2/caching/#how-to-use-caching
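
As a rough illustration of that SDK path: the sketch below assumes `client` is a `kfp.Client` and relies on the `enable_caching` argument of `create_run_from_pipeline_package`; the wrapper name `create_uncached_run` is hypothetical.

```python
from typing import Any, Dict


def create_uncached_run(client: Any, package_path: str,
                        arguments: Dict[str, Any]) -> Any:
    """Create a run from a compiled package with caching turned off."""
    return client.create_run_from_pipeline_package(
        package_path,
        arguments=arguments,
        enable_caching=False,  # applies to this run only
    )
```

With a real deployment this would be called as `create_uncached_run(kfp.Client(host=...), 'pipeline.yaml', {...})`, where the host and file name are placeholders.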

@juliusvonkohout
Member

juliusvonkohout commented Apr 22, 2024

There are also some older issues and PRs for exactly that, for example #8177.


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the lifecycle/stale label Jun 22, 2024
@juliusvonkohout
Member

/lifecycle frozen

@google-oss-prow google-oss-prow bot added lifecycle/frozen and removed lifecycle/stale labels Jun 24, 2024
Projects
Status: Backlog
Development

No branches or pull requests

7 participants