Fix set data_interval to logical_date when triggering DAG run #54119
base: main
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
- data_interval = dag.timetable.infer_manual_data_interval(run_after=run_after)
+ data_interval = dag.timetable.infer_manual_data_interval(run_after=coerced_logical_date)
This does not look right. The method takes a run_after, passing in something else is just wrong.
What problems does this cause? Can you be more specific? This might have been one of the purposefully breaking changes between v2 and v3, or it could be a bug. It's not clear at the moment.
Thank you for your feedback! To clarify the issue: in Airflow 2, triggering a DAG run with a specific logical date also set the data_interval from that date. In Airflow 3, however, the data_interval of the triggered DAG run is instead set based on the time when the run is queued, not the intended logical date. As a result, the downstream DAG receives an unexpected data interval that does not match the desired scheduling semantics. I believe this is a bug rather than an intentional breaking change, since triggering workflows for specific time intervals is a core use case, especially when chaining DAGs for partitioned data processing. This PR is intended to restore the correct behavior for triggered DAG runs. I am attaching screenshots of the same DAG run's details in both Airflow 2 and Airflow 3 for comparison. Please let me know if you'd like more detailed examples or further clarification!
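The behavioral difference described above can be illustrated with a toy model. This is a simplified sketch, not the real Airflow API: DataInterval and infer_manual_data_interval here are minimal stand-ins for the actual timetable classes, which carry more logic.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class DataInterval:
    start: datetime
    end: datetime

def infer_manual_data_interval(run_after: datetime) -> DataInterval:
    # For a manually triggered run with no schedule, the interval
    # collapses to a single instant (simplified stand-in for the
    # timetable method of the same name).
    return DataInterval(start=run_after, end=run_after)

logical_date = datetime(2024, 1, 1, tzinfo=timezone.utc)        # the date the trigger asked for
queued_at = datetime(2024, 3, 15, 9, 30, tzinfo=timezone.utc)   # when the trigger actually ran

interval_v2 = infer_manual_data_interval(run_after=logical_date)  # Airflow 2 passed the logical date
interval_v3 = infer_manual_data_interval(run_after=queued_at)     # Airflow 3 passes run_after

print(interval_v2.end)  # 2024-01-01 00:00:00+00:00, matches the requested date
print(interval_v3.end)  # 2024-03-15 09:30:00+00:00, the queue time instead
```

With the toy model, a downstream task reading data_interval_end gets the requested date under the Airflow 2 behavior and the queue time under the Airflow 3 behavior, which is exactly the mismatch reported above.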
The use case seems fair.
This was changed in #46512, and seems deliberate because it was based on coerced_logical_date and changed to run_after. Maybe we are missing something @sunank200
I believe this is intentional. The argument to the timetable method is run_after, the moment the run is triggered. Since you just want to have a date (the data interval starts and ends at the same datetime anyway in either case), why not just use the logical date value directly? It is the same across both major versions.
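For DAG authors affected by this, the suggested workaround amounts to changing which template variable the downstream task consumes. The bucket path and variable usage below are hypothetical illustrations; logical_date is a documented template variable in recent Airflow 2 releases and in Airflow 3, but verify availability for your version.

```python
# Hypothetical templated path on a downstream operator.
# Airflow 2-era pattern; in Airflow 3 a manually triggered run renders
# data_interval_start from run_after (the queue time), so this value shifts:
path_old = "s3://my-bucket/{{ data_interval_start | ds }}/"

# logical_date is the value passed in when triggering the run, and renders
# the same way in both major versions:
path_new = "s3://my-bucket/{{ logical_date | ds }}/"

print(path_old)
print(path_new)
```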
I agree that writing DAGs to use logical_date, as you suggest, works, and I also understand your reasoning that run_after may be a more appropriate value in some cases. However, the official Airflow documentation recommends using data_interval_start rather than logical_date. More importantly, this change was introduced as a breaking change without prior notice, so many existing DAGs no longer work as expected. Given these factors, I would appreciate it if you could reconsider this behavior, perhaps by maintaining the previous approach or providing a clear way for users to choose the desired logic. Thank you!
Not deciding whether it's a valid change - I marked it to 3.0.5 so that we do not forget about it when we prepare it.
There are conflicts to resolve |
@eladkal |
pierrejeambrun left a comment:
Maybe a path forward would be to update the documentation instead, and keep the run_after param?
@pierrejeambrun @potiuk @uranusjr @ashb |
@uranusjr @pierrejeambrun What's the next step on this one? (Moving the milestone for now)
pierrejeambrun left a comment:
As TP mentioned, I think this is expected and run_after is more reasonable.
I would be for updating the documentation to reflect that, and maybe adding a significant release note entry for 3.0.0 to mention that this has changed?
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.
@nothingmin do you want to follow up and convert this PR into a documentation one, and explain that this change was intended? I think it would be a great addition and help clear confusion for others too.
@pierrejeambrun Sure, I'll follow up and convert this PR into a documentation one to clarify this intentional behavior change. I'll update it soon.
In Airflow 2, the data_interval was set to logical_date (execution_date) when triggering a DAG run.
In Airflow 3, the current data_interval is set to run_after, which causes issues when using data_interval_end.
This commit addresses this problem.
For reference, the code in Airflow 2 looked like this:
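(The referenced snippet did not survive extraction. What follows is a hedged, pseudocode-style paraphrase of the Airflow 2 trigger path, reconstructed from the diff discussed in this thread; it is not the verbatim source and variable names may differ.)

```python
# Paraphrase of the Airflow 2 behavior (not verbatim): the manual data
# interval was inferred from the coerced logical date, not from the time
# the trigger request was queued.
coerced_logical_date = timezone.coerce_datetime(logical_date)
data_interval = dag.timetable.infer_manual_data_interval(
    run_after=coerced_logical_date,
)
```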