
@kacpermuda (Contributor) commented Nov 7, 2025

TL;DR

Follow-up to #57620 - this PR extends the common.compat provider to support the new Hook Lineage add_extra method, ensuring full backward compatibility and consistent hook-level lineage behavior across all Airflow 2.10+ versions.

Problem

The add_extra method for hook lineage collection was introduced in Airflow 3.2 (#57620). However, providers currently support Airflow 2.10+ and may call the new add_extra method when running on Airflow versions earlier than 3.2, where it does not exist. We previously encountered a similar issue during the transition from Dataset to Asset, where the hook lineage collector methods changed from add_input_dataset to add_input_asset. To handle that, we added a compatibility layer in the common.compat provider that manages these naming differences. All other providers were then updated to import the hook lineage collector from common.compat instead of core Airflow, their calls were changed to the new _asset methods, and the newer compat provider version was made a dependency of those providers.

Without such a compatibility layer, hooks using incompatible code (for example, calling a method that does not exist) would fail on older Airflow versions. This would cause inconsistent behavior and prevent provider hooks from emitting rich lineage metadata. Therefore, I propose applying the same solution here, extending the current compatibility approach.
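
The Dataset-to-Asset precedent described above can be pictured as a thin adapter. The following is a minimal sketch for illustration only, not the actual common.compat code: `DatasetEraCollector` and `AssetNamingAdapter` are hypothetical stand-ins, and `add_output_dataset` is assumed by analogy with the `add_input_dataset` name mentioned above.

```python
from typing import Any


class DatasetEraCollector:
    """Hypothetical stand-in for a collector that only knows the old *_dataset names."""

    def __init__(self) -> None:
        self.inputs: list[str] = []
        self.outputs: list[str] = []

    def add_input_dataset(self, context: Any, uri: str) -> None:
        self.inputs.append(uri)

    def add_output_dataset(self, context: Any, uri: str) -> None:
        self.outputs.append(uri)


class AssetNamingAdapter:
    """Hypothetical adapter exposing the new *_asset names on top of an old collector."""

    def __init__(self, wrapped: DatasetEraCollector) -> None:
        self._wrapped = wrapped

    def add_input_asset(self, context: Any, uri: str) -> None:
        # Forward the new-style call to the old-style method.
        self._wrapped.add_input_dataset(context, uri=uri)

    def add_output_asset(self, context: Any, uri: str) -> None:
        self._wrapped.add_output_dataset(context, uri=uri)


legacy = DatasetEraCollector()
collector = AssetNamingAdapter(legacy)
collector.add_input_asset(context=None, uri="s3://bucket/input")
collector.add_output_asset(context=None, uri="s3://bucket/output")
```

Callers only ever see the `_asset` names, so provider code stays the same regardless of which naming the underlying collector uses.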

Solution

This PR enhances the existing compatibility layer in the common.compat provider to ensure that the hook lineage collector returned from it consistently supports the add_extra method across all supported Airflow versions. For Airflow versions that do not natively include add_extra, the compatibility layer provides a full implementation of the method, maintaining consistent lineage quality. This is feasible because the hook lineage collector is already operational and collecting lineage data in Airflow 2.10+, so only a small extension was needed to add this capability.
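
To make the idea of "providing a full implementation where add_extra is missing" concrete, here is a minimal sketch of a polyfill. It is an illustration under stated assumptions, not the code in this PR: `OldCollector`, `ExtraRecord`, and `with_add_extra` are hypothetical names, and the record fields are a guess at what an extra entry might carry.

```python
from dataclasses import dataclass
from types import MethodType
from typing import Any


@dataclass
class ExtraRecord:
    """Hypothetical stand-in for the ExtraLineageInfo entries mentioned in this PR."""

    key: str
    value: Any
    context: Any


class OldCollector:
    """Hypothetical collector from a release that predates add_extra."""

    def __init__(self) -> None:
        self.inputs: list[str] = []

    def add_input_asset(self, context: Any, uri: str) -> None:
        self.inputs.append(uri)


def with_add_extra(collector: Any) -> Any:
    """Attach add_extra to the collector instance unless it already has one."""
    if callable(getattr(collector, "add_extra", None)):
        return collector  # native (or already patched) support: leave untouched

    collector.extra = []

    def add_extra(self: Any, context: Any, key: str, value: Any) -> None:
        self.extra.append(ExtraRecord(key=key, value=value, context=context))

    # Bind to this one instance so the class itself is not modified.
    collector.add_extra = MethodType(add_extra, collector)
    return collector


collector = with_add_extra(OldCollector())
collector.add_extra(None, "query", "SELECT * FROM table")
```

Patching the instance rather than the class keeps the change local to the collector handed out by the compatibility layer.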

With this update, provider hooks can safely use:

from airflow.providers.common.compat.lineage.hook import get_hook_lineage_collector

get_hook_lineage_collector().add_extra(context, key, value)

on any Airflow 2.10+ version, and the behavior will remain identical - provided they depend on the latest common.compat provider version (which should be updated when introducing this call, as per usual practice).

Comprehensive test coverage has been added, covering both standard and edge cases across all Airflow versions included in compatibility testing, ensuring the functionality performs as intended.

This solution allows hook implementations to call add_extra without version-specific logic, delivering a consistent lineage experience for users regardless of the Airflow version in use.

Example Usage

from airflow.providers.common.compat.lineage.hook import get_hook_lineage_collector

collector = get_hook_lineage_collector()

# These already worked on all AF 2.10+ versions with previous compat implementation, even when `asset` was still named `dataset`
collector.add_input_asset(context, uri="s3://bucket/input")
collector.add_output_asset(context, uri="s3://bucket/output")

# This will also work on all AF 2.10+ versions now
collector.add_extra(context, "query", "SELECT * FROM table")
collector.add_extra(context, "query_id", "1234")

# And for hook lineage readers:
lineage = collector.collected_assets

# These already worked on all AF 2.10+ versions with previous compat implementation
lineage.inputs  # list of AssetLineageInfo
lineage.outputs  # list of AssetLineageInfo

# This will also work on all AF 2.10+ versions now
lineage.extra  # list of ExtraLineageInfo (now available on all versions!)


@kacpermuda kacpermuda force-pushed the feat-compat-extend-hll branch 3 times, most recently from 7d7c23a to 2d9d30f November 14, 2025 13:07
@kacpermuda kacpermuda marked this pull request as ready for review November 14, 2025 13:07
@kacpermuda kacpermuda changed the title from "feat: Adjust HookLevelLineage with add_extra" to "feat: Adjust common.compat HookLevelLineage collector for new add_extra method" Nov 14, 2025
@mobuchowski (Contributor) left a comment

It's definitely not beautiful... but can't think of better way of making this work on older versions.

@kacpermuda (Contributor, Author) commented

> It's definitely not beautiful... but can't think of better way of making this work on older versions.

Could not agree more, and I think it can bring a lot of value for older Airflow versions.

@ashb (Member) left a comment

I don't think the polyfill approach is one we should use - my first thought is to instead make it a no-op on older versions
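
For comparison, the no-op alternative suggested here could look roughly like the sketch below. It is only an illustration: `OldCollector` and `with_noop_add_extra` are hypothetical names, not Airflow APIs.

```python
from typing import Any


class OldCollector:
    """Hypothetical collector without native add_extra."""


def with_noop_add_extra(collector: Any) -> Any:
    if not hasattr(collector, "add_extra"):
        # Accept the call so hooks do not crash, but record nothing.
        collector.add_extra = lambda context, key, value: None
    return collector


collector = with_noop_add_extra(OldCollector())
result = collector.add_extra(None, "query_id", "1234")
```

The trade-off is that hooks stay safe on older versions, but no extra lineage is collected there.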

@kacpermuda (Contributor, Author) commented

I considered that approach as well, but ultimately decided to go with the polyfill implementation. The code isn’t very complex in the end, and since it’s only executed once for a singleton collector instance, the overhead is minimal. I understand that maintaining it could be more challenging compared to a no-op, and that ideally, users should upgrade Airflow to benefit from the latest features. However, in my view, the value this solution provides outweighs the effort required to implement and maintain it.

That said, if this is a hard no or a blocker for you, I’m happy to adjust the code and switch to a no-op. But if you think there’s room to proceed with the polyfill approach, I’d be glad to make any changes needed to align it better.

@kacpermuda (Contributor, Author) commented

@ashb How do you feel about the above? Can we proceed with the polyfill approach?

@potiuk (Member) left a comment

LGTM. Waiting for @ashb

@ashb (Member) commented Nov 24, 2025

Looking now.

@ashb (Member) left a comment

Okay with polyfill, but we need to make it duck-typed, not version based (see comment)
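
The difference between a version-based and a duck-typed check can be sketched like this. The example is illustrative only: all class and function names are hypothetical, and the 3.2 threshold is taken from the PR description.

```python
from typing import Any


class BackportedCollector:
    """Hypothetical collector on an older release that already ships add_extra."""

    def add_extra(self, context: Any, key: str, value: Any) -> None:
        pass


class OldCollector:
    """Hypothetical collector without add_extra."""


def needs_polyfill_by_version(airflow_version: tuple[int, int]) -> bool:
    # Version-based: assumes the feature appears exactly at 3.2.
    return airflow_version < (3, 2)


def needs_polyfill_by_duck_typing(collector: Any) -> bool:
    # Duck-typed: asks the object itself whether the method is there.
    return not callable(getattr(collector, "add_extra", None))
```

A version check would still patch `BackportedCollector` on, say, `(3, 1)`, while the duck-typed check leaves it alone because the method is already present.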

@kacpermuda kacpermuda force-pushed the feat-compat-extend-hll branch from 2d9d30f to bd3987f November 24, 2025 17:03
Extends the current common.compat compatibility layer with support for the new Hook Lineage `add_extra` method, ensuring full backward compatibility and consistent hook-level lineage behavior across all Airflow 2.11+ versions.
@kacpermuda kacpermuda force-pushed the feat-compat-extend-hll branch from bd3987f to a178d16 November 24, 2025 18:10
@ashb ashb merged commit 3f30adf into apache:main Nov 26, 2025
118 checks passed
@kacpermuda kacpermuda deleted the feat-compat-extend-hll branch November 26, 2025 14:07
Copilot AI pushed a commit to jason810496/airflow that referenced this pull request Dec 5, 2025
…#58057)

itayweb pushed a commit to itayweb/airflow that referenced this pull request Dec 6, 2025
…#58057)
