Replies: 3 comments 7 replies
-
As far as I look at it, manual and "scheduled" runs are fundamentally different. They even have different branches in the custom timetable definition (when custom timetables are defined). I am not sure what @uranusjr, @ashb, and @malthe think about it, but while the current semantics is not 100% accurate, there is simply no semantics that is, and this one is "good enough". And maybe that is a sign we should actually split them completely and make the distinction obvious in the interface. The question about mixing "scheduled" and "manual" runs has been raised a few times in the past, and maybe we could improve it in a way that is less confusing. But unless there is a concrete proposal for how this could be improved, this is mostly an academic discussion, I am afraid.
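For context, a minimal sketch of that split, assuming Airflow 2.2+'s `Timetable` protocol (the class name and the 24-hour window are illustrative):

```python
# Sketch only: manual and scheduled runs already take different code paths
# when their data interval is computed.
from datetime import timedelta

from airflow.timetables.base import DataInterval, Timetable


class ExampleTimetable(Timetable):
    def infer_manual_data_interval(self, *, run_after):
        # Called only for manually-triggered runs: the interval is inferred
        # from the trigger time, here as the preceding 24 hours.
        return DataInterval(start=run_after - timedelta(hours=24), end=run_after)

    def next_dagrun_info(self, *, last_automated_data_interval, restriction):
        # Called only by the scheduler to plan "scheduled" runs;
        # the scheduling logic itself is omitted from this sketch.
        return None
```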
-
My perspective is that the problem is with how data_interval is just set to the most recent interval. The behavior I really want is the ability to kick off a DAG run manually and specify the start and end date. It seems like a fairly common use case with intervals: say you run daily, but then need to go back and rerun a 7-day period. That could be addressed with the backfill feature, but that isn't as flexible, e.g. if you're delivering quarterly but then need to re-deliver a one-day period.
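One workaround sketch (not a built-in feature): pass the desired window through `dag_run.conf` on a manual trigger and fall back to the scheduled data interval otherwise. The `window_start`/`window_end` keys and the `deliver.sh` script are made up for illustration:

```python
from airflow.operators.bash import BashOperator

deliver = BashOperator(
    task_id="deliver",
    bash_command=(
        # An explicit window from conf wins; scheduled runs fall back to
        # the data interval.
        "deliver.sh "
        "--start {{ (dag_run.conf or {}).get('window_start', data_interval_start.strftime('%Y-%m-%d')) }} "
        "--end {{ (dag_run.conf or {}).get('window_end', data_interval_end.strftime('%Y-%m-%d')) }}"
    ),
)
```

A manual rerun of a 7-day period would then be something like `airflow dags trigger my_dag --conf '{"window_start": "2022-01-01", "window_end": "2022-01-08"}'`.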
-
I was using `bash test.py --start {{ data_interval_start }}`, so my solution is `--date {{ (data_interval_start + macros.timedelta(hours=8)).strftime("%Y-%m-%d") }}`. But the inconsistency between "backfill" and "daily scheduling" is still not resolved. Why is it so complicated...
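In context, that template would sit in an operator's command; a sketch (the 8-hour shift matches the comment above; task and script names are illustrative):

```python
from airflow.operators.bash import BashOperator

run_test = BashOperator(
    task_id="run_test",
    bash_command=(
        # Shift data_interval_start by 8 hours (e.g. to a UTC+8 local
        # date) before formatting it for the script.
        "bash test.py --date "
        '{{ (data_interval_start + macros.timedelta(hours=8)).strftime("%Y-%m-%d") }}'
    ),
)
```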
-
Apache Airflow version
2.2.3
What happened
When triggering a DAG manually (via the web or via `airflow dags trigger`), some template params like `ds`, `ts`, and others derived from `dag_run.logical_date` will be set to the specified execution timestamp. This is inconsistent with automated runs, where those fields are set to `data_interval_start`. This behavior contradicts the documentation in a few places, and can cause tasks that depend on those template params to behave unintuitively.
What you expected to happen
I expected `ds` to always equal `data_interval_start`, quoting the docs in a few different places (emphasis mine): "DAG Runs: Data Interval" and "FAQ: What does `execution_date` mean?". However, it's worth noting that "DAGs: Running DAGs" does seem to explain this edge case.
How to reproduce
Example DAG:
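A minimal sketch of such a DAG, assuming a daily schedule (the exact DAG from the report is not preserved here, so the ids are illustrative):

```python
# Prints the template params in question so the manual-vs-scheduled
# difference shows up in the task logs.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="test_dag",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="print_dates",
        bash_command=(
            "echo ds={{ ds }} ts={{ ts }} "
            "logical_date={{ logical_date }} "
            "data_interval_start={{ data_interval_start }} "
            "data_interval_end={{ data_interval_end }}"
        ),
    )
```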
Trigger this DAG via the web or via `airflow dags trigger test_dag -e <some timestamp>`, then look at the output in the logs.
Example output for an automated run:
Example output for a manually-triggered run:
Operating System
CentOS 7.4
Versions of Apache Airflow Providers
Only the defaults.
Deployment
Other
Deployment details
Just running processes locally.
Anything else
I'm not convinced that this is just a documentation issue; the fact that `logical_date` and all derived fields can have contextually different meanings seems fundamentally broken to me. To keep my users from running into issues, I feel like I am forced to teach them either "never use `ds`/`ts`/etc." or "never trigger DAGs manually", neither of which feels great.
As far as I can tell, there is no way to manually trigger a DAG and have it behave exactly like a "normal" automated run, since `ds` will always fall outside of the data interval. Which begs the question: what does it even mean to manually trigger a DAG run when data intervals are involved? It shouldn't be able to affect the existing schedule, so the current behavior of "snapping" to the latest complete data interval makes sense to me. But for consistency, I think all `dag_run` fields (except for things like `run_id`) should follow that same behavior.
Alternatively, maybe there are two classes of DAGs: ones that operate on data intervals, and ones that operate on a single instant in time (e.g. `schedule_interval=None`). And perhaps the former should never be manually triggered and should only ever use something like `airflow dags backfill` to run specific intervals (an example invocation is sketched below). And ideally the web UI and CLI would reflect this to prevent running a DAG "the wrong way".
Admittedly I am new to Airflow, so maybe my intuitions are not correct. And I recognize that there are almost certainly some users that depend on the current behavior, so it would definitely be a pain to change. But I'm curious to hear if other people have thoughts about this or specific examples of why the current behavior is desirable.