Metadata not working in Kubeflow Pipelines? #216

Closed
@rummens

Description

Problem

I have deployed the taxi_simple example pipeline in KFP running on K8s outside of GCP. The artefacts are shared using a PV because of the current Beam limitation for S3.

I have noticed that the metadata features are not working. For example, every time I rerun the same pipeline (no code or data changes), every component is executed again and recomputes everything. I assumed that metadata would recognize that the pipeline had already run and skip the components, reusing the outputs of the "old" run.

Metadata is integrated into Kubeflow Pipelines (kubeflow/pipelines#884).

Ideas

I have the following ideas:

  1. I have to configure something that I am not aware of.
  2. For some reason, metadata only works when artefacts are stored in object storage.

Can someone explain to me how metadata works at a high level? Who checks whether there was a previous run? I assume each component checks for itself, or is there a separate "metadata" component that runs these checks?
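For context, here is a minimal sketch of the caching behavior I would expect. It assumes each component queries the metadata store for itself before executing; the `MetadataStore` class, `fingerprint` function, and component names below are hypothetical stand-ins, not the actual MLMD or TFX API:

```python
import hashlib

class MetadataStore:
    """Toy stand-in for a metadata store: maps (component, input
    fingerprint) pairs to the outputs of a previous execution."""
    def __init__(self):
        self._executions = {}

    def lookup(self, component, fp):
        return self._executions.get((component, fp))

    def record(self, component, fp, outputs):
        self._executions[(component, fp)] = outputs

def fingerprint(inputs):
    """Hash a component's inputs (artefact URIs and properties)."""
    return hashlib.sha256(repr(sorted(inputs.items())).encode()).hexdigest()

def run_component(store, name, inputs, execute_fn):
    """Run a component, skipping execution on a metadata cache hit."""
    fp = fingerprint(inputs)
    cached = store.lookup(name, fp)
    if cached is not None:
        return cached  # cache hit: reuse the "old" run's outputs
    outputs = execute_fn(inputs)
    store.record(name, fp, outputs)
    return outputs
```

Under this model, rerunning the same pipeline with unchanged code and data would hit the cache on every component and skip recomputation, which is exactly what I am not seeing.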

Thank you very much for your support!
