Description
Problem
I have deployed the taxi_simple example pipeline in KFP running on K8s outside of GCP. The artefacts are shared via a PV because of Beam's current limitation with S3 (a minimal sketch of how the PVC is mounted into each step is below).
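For context, this is roughly how the volume is shared between steps. It is not the actual taxi_simple pipeline, just a minimal kfp sketch of the idea; the PVC name, mount path, and images are placeholders for my setup:

```python
from kfp import dsl, onprem


@dsl.pipeline(name="pv-sharing-sketch", description="Sharing artefacts via a PVC instead of S3")
def pipeline():
    # One step writes an artefact to the shared volume (placeholder image/command).
    producer = dsl.ContainerOp(
        name="produce",
        image="busybox",
        command=["sh", "-c", "echo artefact > /mnt/pipeline/artefact.txt"],
    )
    # A downstream step reads it back from the same volume.
    consumer = dsl.ContainerOp(
        name="consume",
        image="busybox",
        command=["sh", "-c", "cat /mnt/pipeline/artefact.txt"],
    ).after(producer)

    # Mount the same (pre-created) PVC into every step so artefacts can be
    # shared without any object storage.
    for op in (producer, consumer):
        op.apply(onprem.mount_pvc("taxi-artefacts-claim", "pipeline", "/mnt/pipeline"))
```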
I have noticed that the metadata features are not working. For example, every time I rerun the same pipeline (no code or data changes), every component is executed again and recomputes everything. I assumed that metadata would recognize that the pipeline had already run and skip the components, reusing the outputs of the "old" run.
Metadata is already integrated into Kubeflow Pipelines (kubeflow/pipelines#884).
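To check whether anything is being recorded at all, I intend to query the metadata store directly with the ml_metadata client, roughly like this. The MySQL host, port, database, and user below are guesses for my deployment (reachable e.g. via a port-forward) and will differ elsewhere:

```python
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Connection details are assumptions for my cluster; point this at wherever
# the KFP metadata MySQL instance is reachable.
config = metadata_store_pb2.ConnectionConfig()
config.mysql.host = "localhost"
config.mysql.port = 3306
config.mysql.database = "metadb"
config.mysql.user = "root"

store = metadata_store.MetadataStore(config)

# If metadata is being written, each rerun of the pipeline should add new
# executions here, referencing artefact URIs on the PV.
print("executions:", len(store.get_executions()))
print("artifacts:", len(store.get_artifacts()))
for artifact in store.get_artifacts():
    print(artifact.type_id, artifact.uri)
```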
Ideas
I have the following ideas:
- I have to configure something that I am not aware of
- For some reason, metadata only works when artefacts are stored in object storage.
Can someone explain to me how metadata works at a high level? Who checks whether there was a previous run? I assume each component does this for itself (my rough mental model is sketched below), or is there a separate "metadata" component that runs these checks?
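To make my assumption concrete, this is the purely illustrative model I have in mind: each component fingerprints its inputs and execution properties, looks up a previous execution with the same fingerprint in the metadata store, and reuses the recorded outputs on a hit. I am not claiming this is what TFX/KFP actually does; that is exactly what I would like to have confirmed or corrected:

```python
import hashlib
import json

# Toy, in-memory stand-in for a metadata store: maps an execution
# "fingerprint" to the outputs recorded for that execution.
_metadata_store = {}


def _fingerprint(component_id, input_uris, exec_properties):
    """Hash everything that should determine a component's outputs."""
    payload = json.dumps(
        {"component": component_id, "inputs": input_uris, "props": exec_properties},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_component(component_id, input_uris, exec_properties, execute_fn):
    """Run a component, reusing cached outputs when an identical run exists."""
    key = _fingerprint(component_id, input_uris, exec_properties)
    cached = _metadata_store.get(key)
    if cached is not None:
        print(f"[{component_id}] cache hit, reusing outputs: {cached}")
        return cached
    outputs = execute_fn(input_uris, exec_properties)  # the expensive work
    _metadata_store[key] = outputs                     # publish to metadata
    print(f"[{component_id}] executed, recorded outputs: {outputs}")
    return outputs


# Rerunning with identical inputs and properties should hit the cache.
run_component("CsvExampleGen", {"data": "/mnt/pipeline/data"}, {"split": "0.8"},
              lambda i, p: {"examples": "/mnt/pipeline/examples/1"})
run_component("CsvExampleGen", {"data": "/mnt/pipeline/data"}, {"split": "0.8"},
              lambda i, p: {"examples": "/mnt/pipeline/examples/2"})
```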
Thank you very much for your support!