-
Notifications
You must be signed in to change notification settings - Fork 14.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(datasets): persist dataset extra dict to dataset event #36075
fix(datasets): persist dataset extra dict to dataset event #36075
Conversation
ee6d441
to
056893c
Compare
Do we need to do the merge at all? From what I can tell, the cc @blag for some history on this |
@uranusjr I also thought the implementation was intended this way. I removed the merge with the latest commit to this PR for the review. In case the discussion comes to the result that a merge of the dictionaries is required we can quickly roll back. |
5e6009f
to
a607a31
Compare
Please refer to the PR implementing this, and the discussion about an IIRC, the intent here was not to break database normalization and copy Given that, I would not expect IMO #35297 is invalid, because tasks should be able to query the Airflow DB directly for the dataset extra field using the keys in |
The docs are not optimally worded and definitely conflate datasets and dataset events, possibly leading to this confusion. See here.
Do datasets trigger DAGs, or do dataset events trigger DAGs? The title of this section and its text seem to disagree. The answer is: dataset events trigger DAGs. A one word change would vastly improve the precision of this section:
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
Persists Dataset extras to the according DatasetEvent to be used in Tasks invoked by Data aware scheduling. Please see linked issue for more information
closes: #35297