-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Enable OpenLineage Breeze integration via Marquez #33742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst)
|
8e25fb0 to
818063b
Compare
|
Cool! Suggestiom though: Should we add it to Breeze's cheathseet including ability to chnage is as env variables? https://github.com/apache/airflow/blob/main/dev/breeze/src/airflow_breeze/utils/visuals.py#L89 |
e289381 to
ce74532
Compare
|
Easy enough; added options to configure via I did not add it to the cheatsheet as the other optional metrics integrations aren't in that cheatsheet and I hesitate to add all of the integrations there as it will increase the cheetsheet length (and possibly add confusion since these aren't core Airflow services) significantly. If you disagree, let me know and I'll go ahead and add. |
|
Needs conflict resoiving due to changes commands in main.
We can print error and exit for 11 and 15 if you try to use openlineage integration. We already do similar checks for mysql/ARM combination. Look for:
That would be surprising. The one difference I can think of - naybe this is connected with the way openlineage provider (and any other) is used in "main" - the openlineage code is loaded directly from sources not from package. So some code might not work the same way (for example queryng package metada will not work if you run it from main (INSTALL_PROVIDERS_FROM_SOURCES="true" is set in breeze and ProvidersManager will look in sources for provider.yaml files rather than querying provider_info endpoint to get all the provider information). @mobuchowski - maybe that rings a bell?
It's fine. We do it for others tool |
|
BTW. Even if we do not print it in cheatsheet. It might be worthwile to log information that marquez is available at this and that addreess. This can be done in the same par of "enter_shell" pre-execution code that you can use to do Postgres verification. Would be nice to extract those checks (existing ARM/mysql and new Openlineage/postgres to separate methods). Also I am wondering what we should do if we run openlineage but NOT postgres (say mysql). I would not really want to stop this from happening. I guess we should error out similarly as with Postgres 11, 15 and print information to use Postgres 12-14. |
@patgarz It's probably missing some word like "minimum" or "at least" 12.1 - can you link where you found this, to fix it?
After #32692 loading plugin from provider should happen regardless if providers are installed from sources or from package - provided the plugins and providers aren't lazily loaded. |
@mobuchowski I was referencing the requirements here: https://github.com/MarquezProject/marquez#requirements PG15 produces the following warning in Marquez startup, and the app then exits (though we should likely migrate this conversation over there): WARN [2023-08-28 15:55:03,189] org.flywaydb.core.internal.database.base.Database: Flyway upgrade recommended: PostgreSQL 15.4 is newer than this version of Flyway and support has not been tested. The latest supported version of PostgreSQL is 14. |
|
@patgarz sure, created issue on Marquez GH: MarquezProject/marquez#2604 |
ce74532 to
e99c184
Compare
potiuk
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pending regeneration of breeze images.
e99c184 to
ba46a15
Compare
|
Need to solve conflicts still :( |
ba46a15 to
fc2ec3d
Compare
|
Rebased & should be all set now. |
|
Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions. |
|
Cool :) |
This adds an option to spin up
breezeusing--integration openlineage, which will spin up a simple implementation of Marquez and send metrics to namespaceairflow.Notes before merge -
2.7.0. It did not work in themain. I'm not sure whether that indicates a bug in my configuration or a bug that has been introduced in themain.3000, so the standard mapping pattern of2<port>does not work. I opted for23100so that this PR did not break--integration all, but if there's another preferred pattern, it's an easy fix.^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named
{pr_number}.significant.rstor{issue_number}.significant.rst, in newsfragments.