Implement HookLineageCollector for collection of Hook-generated datasets #38766

@mobuchowski

Description

Implement a HookLineageCollector that can receive AIP-60-compliant datasets from hooks, as part of the AIP-62 implementation.
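The intended flow can be illustrated with a minimal Python sketch in which a hook reports the datasets it reads and writes to a collector. Only the `HookLineageCollector` name comes from this issue. `Dataset`, `add_input_dataset`, `add_output_dataset` and the toy `S3CopyHook` are hypothetical stand-ins, not Airflow's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    """Stand-in for an AIP-60 dataset, identified by its URI (hypothetical)."""
    uri: str


@dataclass
class HookLineageCollector:
    """Accumulates datasets that hooks report while a task runs."""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def add_input_dataset(self, dataset: Dataset) -> None:
        self.inputs.append(dataset)

    def add_output_dataset(self, dataset: Dataset) -> None:
        self.outputs.append(dataset)


class S3CopyHook:
    """Toy hook (hypothetical) that reports what it reads and writes."""

    def __init__(self, collector: HookLineageCollector) -> None:
        self.collector = collector

    def copy_object(self, src: str, dst: str) -> None:
        # A real hook would perform the copy here; this sketch only reports lineage.
        self.collector.add_input_dataset(Dataset(uri=src))
        self.collector.add_output_dataset(Dataset(uri=dst))


collector = HookLineageCollector()
S3CopyHook(collector).copy_object("s3://raw/events.json", "s3://clean/events.json")
print([d.uri for d in collector.inputs], [d.uri for d in collector.outputs])
```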

Conversion between AIP-60 and OpenLineage dataset naming is not part of this issue, but it needs to be considered: one possible solution might require accepting pairs of data in the form (Dataset, Hook) or (Dataset, Object Storage implementation).
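One reading of the pairing idea is that the collector stores the originating hook alongside each dataset, so a later conversion step can ask the hook for naming details the URI alone does not carry. The sketch below assumes a hypothetical `openlineage_name` method on hooks and a made-up `warehouse://` scheme; the S3 fallback follows the OpenLineage naming convention of namespace `s3://{bucket}` and name `{object key}`.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Dataset:
    uri: str  # AIP-60 style URI (stand-in class)


@dataclass(frozen=True)
class LineageEntry:
    """A (Dataset, Hook) pair as the collector might store it."""
    dataset: Dataset
    hook: object


def to_openlineage_name(entry: LineageEntry) -> tuple:
    """Return an OpenLineage (namespace, name) pair for a collected entry."""
    # Prefer a conversion supplied by the originating hook, if it offers one
    # (`openlineage_name` is a hypothetical method, not an Airflow API).
    convert = getattr(entry.hook, "openlineage_name", None)
    if callable(convert):
        return convert(entry.dataset)
    # Fallback for S3 URIs: namespace "s3://<bucket>", name "<object key>".
    parts = urlsplit(entry.dataset.uri)
    if parts.scheme == "s3":
        return f"s3://{parts.netloc}", parts.path.lstrip("/")
    raise NotImplementedError(f"no naming rule for {entry.dataset.uri!r}")


class PlainHook:
    """Hook with no naming knowledge of its own."""


class WarehouseHook:
    """Toy hook that knows connection details the URI does not carry."""
    host = "db.internal:5432"

    def openlineage_name(self, dataset: Dataset) -> tuple:
        # "/analytics/public/orders" -> "analytics.public.orders"
        table = urlsplit(dataset.uri).path.strip("/").replace("/", ".")
        return f"postgres://{self.host}", table


print(to_openlineage_name(LineageEntry(Dataset("s3://raw/events/2024.json"), PlainHook())))
print(to_openlineage_name(LineageEntry(Dataset("warehouse:///analytics/public/orders"), WarehouseHook())))
```

Keeping the hook in the pair means the collector itself stays scheme-agnostic; naming rules live with whichever hook or storage backend knows them.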

HookLineageCollector should expose the collected datasets to listeners. This means making datasets available to the worker or to listeners that have registered interest in them, whether by implementing a specific method or by setting some option.
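A minimal sketch of the "implement a method" option, assuming listeners opt in by registering an object with a known callback. `HookLineageListener`, `on_hook_lineage`, `register` and `flush` are hypothetical names chosen for illustration, and calling `flush` at the end of a task is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Dataset:
    uri: str  # stand-in for an AIP-60 dataset


class HookLineageListener(Protocol):
    """Hypothetical interface: listeners opt in by implementing this method."""

    def on_hook_lineage(self, inputs: list, outputs: list) -> None: ...


@dataclass
class HookLineageCollector:
    listeners: list = field(default_factory=list)
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def register(self, listener: HookLineageListener) -> None:
        self.listeners.append(listener)

    def add_input_dataset(self, dataset: Dataset) -> None:
        self.inputs.append(dataset)

    def add_output_dataset(self, dataset: Dataset) -> None:
        self.outputs.append(dataset)

    def flush(self) -> None:
        """Hand everything collected so far to each registered listener."""
        for listener in self.listeners:
            # Pass copies so a listener cannot mutate the collector's state.
            listener.on_hook_lineage(list(self.inputs), list(self.outputs))


class RecordingListener:
    def __init__(self) -> None:
        self.received = []

    def on_hook_lineage(self, inputs: list, outputs: list) -> None:
        self.received.append(([d.uri for d in inputs], [d.uri for d in outputs]))


collector = HookLineageCollector()
listener = RecordingListener()
collector.register(listener)
collector.add_input_dataset(Dataset("s3://raw/a.csv"))
collector.add_output_dataset(Dataset("s3://clean/a.parquet"))
collector.flush()
print(listener.received)
```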

Collection should be a no-operation (no-op) when no listeners are registered to use the data, so that resources are not wasted on collecting and exposing datasets that have no downstream consumer.
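The no-op behaviour can be sketched with a factory that hands out a do-nothing collector when no listener is registered. Hooks call the same methods either way, so they never need to check whether anyone is listening. `get_collector`, `NoOpCollector` and the method names are hypothetical, not taken from Airflow.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Dataset:
    uri: str  # stand-in for an AIP-60 dataset


class NoOpCollector:
    """Same interface as the real collector, but discards everything."""

    def add_input_dataset(self, dataset: Dataset) -> None:
        pass

    def add_output_dataset(self, dataset: Dataset) -> None:
        pass

    def collected(self) -> tuple:
        return (), ()


@dataclass
class HookLineageCollector:
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

    def add_input_dataset(self, dataset: Dataset) -> None:
        self.inputs.append(dataset)

    def add_output_dataset(self, dataset: Dataset) -> None:
        self.outputs.append(dataset)

    def collected(self) -> tuple:
        return tuple(self.inputs), tuple(self.outputs)


def get_collector(listeners: list):
    """Return a real collector only if some listener will consume its output."""
    return HookLineageCollector() if listeners else NoOpCollector()


idle = get_collector([])
idle.add_input_dataset(Dataset("s3://raw/a.csv"))  # silently ignored
active = get_collector([object()])  # any registered listener enables collection
active.add_input_dataset(Dataset("s3://raw/a.csv"))
print(idle.collected(), active.collected())
```

In this sketch hooks still build `Dataset` objects that the no-op collector then drops; a fuller design could let hooks skip that work too.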

Committer

  • I acknowledge that I am a maintainer/committer of the Apache Airflow project.

Metadata

Assignees

No one assigned

    Labels

    AIP-62 (Tasks tracking implementation of AIP-62 Getting Lineage from Hook Instrumentation), area:lineage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
