[v3-1-test] Fix: TriggerDagRunOperator stuck in deferred state with reset_dag_run (#57756) (#57968) #58333
Merged
…eset_dag_run (#57756) (#57968)
When TriggerDagRunOperator is used with deferrable=True, wait_for_completion=True, reset_dag_run=True, and a fixed trigger_run_id, the operator becomes permanently stuck in deferred state after clearing and re-running.
Root cause:
When reset_dag_run=True is used with a fixed run_id, the database preserves the original logical_date from the first run. However, on subsequent runs after clearing, the operator calculates a NEW logical_date based on the current time. The DagStateTrigger was being created with this newly calculated logical_date, causing a mismatch when querying the database - the trigger looked for a DAG run with the new logical_date but the database contained the original logical_date, causing the query to return zero results indefinitely.
Solution:
- Modified _handle_trigger_dag_run() in task_runner.py to pass execution_dates=None to DagStateTrigger when run_ids is provided, since run_id alone is sufficient and globally unique
- Added test test_handle_trigger_dag_run_deferred_with_reset_uses_run_id_only to verify the fix and prevent regression
The fix ensures that both deferrable and non-deferrable modes use identical logic for determining DAG run completion - querying by run_id and state only, without filtering by logical_date which can become stale when resets are involved.
(cherry picked from commit 4f3d0c5)
Co-authored-by: Mykola Shyshov <mykola.shyshov@gmail.com>
ephraimbuddy pushed a commit that referenced this pull request on Nov 18, 2025
ephraimbuddy pushed a commit that referenced this pull request on Nov 19, 2025
ephraimbuddy pushed a commit that referenced this pull request on Nov 19, 2025
ephraimbuddy pushed a commit that referenced this pull request on Nov 20, 2025
ephraimbuddy pushed a commit that referenced this pull request on Dec 3, 2025
When TriggerDagRunOperator is used with deferrable=True, wait_for_completion=True,
reset_dag_run=True, and a fixed trigger_run_id, the operator becomes permanently
stuck in deferred state after clearing and re-running.
Root cause:
When reset_dag_run=True is used with a fixed run_id, the database preserves the
original logical_date from the first run. On subsequent runs after clearing, however,
the operator calculates a new logical_date based on the current time. The DagStateTrigger
was created with this newly calculated logical_date, so its query never matched: the
trigger looked for a DAG run with the new logical_date while the database still held
the original one, and the query returned zero results indefinitely.
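The mismatch can be reproduced in isolation. The sketch below is a toy model in plain Python, not Airflow code: DagRunRow, count_matching_runs, the DAG name, and the dates are all hypothetical stand-ins for the DAG run table and the trigger's query. It only illustrates why a recomputed logical_date filter never matches the stored row, while a query by run_id and state does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional, Set


@dataclass
class DagRunRow:
    """Hypothetical stand-in for one row of the DAG run table."""
    dag_id: str
    run_id: str
    logical_date: datetime
    state: str


def count_matching_runs(
    rows: Iterable[DagRunRow],
    dag_id: str,
    run_id: str,
    states: Set[str],
    logical_date: Optional[datetime] = None,
) -> int:
    """Count rows matching the filters; the date filter applies only when given."""
    return sum(
        1
        for row in rows
        if row.dag_id == dag_id
        and row.run_id == run_id
        and row.state in states
        and (logical_date is None or row.logical_date == logical_date)
    )


# The first run stored the row with its original logical_date (example value).
original_date = datetime(2025, 11, 1, tzinfo=timezone.utc)
rows = [DagRunRow("child_dag", "fixed_run", original_date, "success")]

# After clearing, a new logical_date is computed from the current time (example value).
recomputed_date = datetime(2025, 11, 2, tzinfo=timezone.utc)
terminal_states = {"success", "failed"}

# Filtering by the recomputed date never finds the row, so polling would never end.
stale_filter_matches = count_matching_runs(
    rows, "child_dag", "fixed_run", terminal_states, logical_date=recomputed_date
)

# Filtering by run_id and state only finds the finished run.
run_id_only_matches = count_matching_runs(rows, "child_dag", "fixed_run", terminal_states)
```

Here stale_filter_matches models the trigger before the fix and run_id_only_matches models the query once the date filter is dropped.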
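Solution:
- Modified _handle_trigger_dag_run() in task_runner.py to pass execution_dates=None
  to DagStateTrigger when run_ids is provided, since run_id alone is sufficient and
  globally unique
- Added test test_handle_trigger_dag_run_deferred_with_reset_uses_run_id_only to
  verify the fix and prevent regression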
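As a rough illustration of the decision described above, the sketch below picks the filters to hand to the trigger. build_trigger_filters is a hypothetical helper, not the actual code in task_runner.py; the dag_id and states keys and the fallback branch for a missing run_id are assumptions, since the description only covers the case where run_ids is provided.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_trigger_filters(
    dag_id: str,
    run_id: Optional[str],
    logical_date: Optional[datetime],
    states: list[str],
) -> dict[str, Any]:
    """Hypothetical helper: choose which filters the trigger should query with."""
    if run_id:
        # A run_id identifies the DAG run, so no date filter is passed along.
        return {
            "dag_id": dag_id,
            "states": states,
            "run_ids": [run_id],
            "execution_dates": None,
        }
    # Assumption: without a run_id, the logical_date is the only filter left.
    return {
        "dag_id": dag_id,
        "states": states,
        "run_ids": None,
        "execution_dates": [logical_date] if logical_date else None,
    }


filters = build_trigger_filters(
    dag_id="child_dag",
    run_id="fixed_run",
    # Recomputed after clearing, so it may not match the stored row.
    logical_date=datetime(2025, 11, 2, tzinfo=timezone.utc),
    states=["success", "failed"],
)
```

With a fixed run_id the recomputed logical_date is ignored, which is the behavior the regression test named above is meant to pin down.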
The fix ensures that both deferrable and non-deferrable modes use identical logic
for determining DAG run completion - querying by run_id and state only, without
filtering by logical_date which can become stale when resets are involved.
(cherry picked from commit 4f3d0c5)
Co-authored-by: Mykola Shyshov <mykola.shyshov@gmail.com>