[OpenLineage] Fix datasets in GCSTimeSpanFileTransformOperator #39064

Merged 1 commit on Apr 17, 2024


kacpermuda
Contributor

Currently, we include every file as a separate dataset, which can inflate the size of the event and make it harder to match datasets between jobs.

With this change, we use the prefixes passed by the user as dataset names instead of full file paths. This way, the user can easily control the size of the event and also ensure proper matching when the same two prefixes are passed to different operators. I am also removing the list of files that was saved for the purpose of lineage datasets, introduced in #35838.
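
To illustrate the difference, here is a minimal sketch, not the code from this PR. The `Dataset` class is a hypothetical stand-in for an OpenLineage dataset, both helper functions are invented for the example, and the trailing-slash normalization is an assumption made only so that equal prefixes compare equal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an OpenLineage dataset (namespace + name)."""

    namespace: str
    name: str


def datasets_from_files(bucket: str, files: list[str]) -> list[Dataset]:
    """Old behaviour (illustrative): one dataset per matched file."""
    return [Dataset(namespace=f"gs://{bucket}", name=path) for path in files]


def dataset_from_prefix(bucket: str, prefix: str) -> Dataset:
    """New behaviour (illustrative): one dataset named after the user's prefix.

    Stripping slashes is an assumption of this sketch, not something the
    PR description states; an empty prefix falls back to the bucket root.
    """
    name = prefix.strip("/") or "/"
    return Dataset(namespace=f"gs://{bucket}", name=name)


files = ["data/2024/a.csv", "data/2024/b.csv", "data/2024/c.csv"]

# Per-file datasets grow with the number of matched files.
per_file = datasets_from_files("my-bucket", files)

# A prefix-based dataset stays constant in size and is identical
# whenever two operators are given the same prefix.
per_prefix = dataset_from_prefix("my-bucket", "data/2024/")
```

With per-file naming, the event grows linearly with the number of objects under the prefix. With prefix naming, two operators that receive the same prefix emit the same dataset name, so the lineage backend can match them.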



Signed-off-by: Kacper Muda <mudakacper@gmail.com>
boring-cyborg bot added the area:providers and provider:google (Google (including GCP) related issues) labels on Apr 16, 2024
mobuchowski merged commit 0667083 into apache:main on Apr 17, 2024
41 checks passed
kacpermuda deleted the ol-fix-gcs-timespan branch on April 18, 2024 06:17
utkarsharma2 pushed a commit to astronomer/airflow that referenced this pull request Apr 22, 2024
RodrigoGanancia pushed a commit to RodrigoGanancia/airflow that referenced this pull request May 10, 2024
2 participants