@dheerajturaga dheerajturaga commented Aug 18, 2025

Tasks queued longer than task_queued_timeout were not being terminated by the Airflow scheduler, leading to accumulation of stale queued tasks over several days.

[Screenshot: task_queue_timeout_bad]

Root Cause:
The queued_by_job_id field was not being reset to None, which prevented the scheduler from recognizing that the task was eligible for termination due to timeout.

Fix:
Reset queued_by_job_id when rescheduling stuck tasks to ensure timeout monitoring works correctly across multiple DAG runs.
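
The reset described above can be illustrated with a small, self-contained sketch. This is not the Airflow scheduler code: `ToyTaskInstance`, `is_stuck_in_queued`, `reschedule_stuck_task`, the `queued_at` field, and the string states are hypothetical names chosen for illustration. Only `queued_by_job_id` and the idea of a `task_queued_timeout` come from the description, and the move back to a "scheduled" state is an assumption about what rescheduling means here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class ToyTaskInstance:
    """Hypothetical stand-in for a task instance row; not Airflow's model."""

    task_id: str
    state: str = "queued"
    queued_at: Optional[datetime] = None  # assumed name for the queue timestamp
    queued_by_job_id: Optional[int] = None


def is_stuck_in_queued(
    ti: ToyTaskInstance, now: datetime, task_queued_timeout: timedelta
) -> bool:
    """Return True if the task has been queued longer than the timeout."""
    return (
        ti.state == "queued"
        and ti.queued_at is not None
        and now - ti.queued_at > task_queued_timeout
    )


def reschedule_stuck_task(ti: ToyTaskInstance) -> None:
    """Send a stuck task back for scheduling (assumed behaviour).

    The key step from the fix: clear queued_by_job_id so the next
    queueing attempt starts without a stale owner.
    """
    ti.state = "scheduled"
    ti.queued_by_job_id = None


# Usage: a task queued 20 minutes ago with a 10 minute timeout.
now = datetime(2025, 8, 18, 12, 0, tzinfo=timezone.utc)
ti = ToyTaskInstance(
    task_id="example_task",
    queued_at=now - timedelta(minutes=20),
    queued_by_job_id=42,
)
if is_stuck_in_queued(ti, now, timedelta(minutes=10)):
    reschedule_stuck_task(ti)
```

After the call, the toy task is back in the "scheduled" state with no `queued_by_job_id`, so nothing from the earlier queueing attempt carries over into the next one.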

[Screenshot: task_queue_timeout_fixed]

@boring-cyborg boring-cyborg bot added the area:Scheduler including HA (high availability) scheduler label Aug 18, 2025
@ashb ashb added the backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch label Aug 18, 2025
@ashb ashb merged commit bec069c into apache:main Aug 18, 2025
59 checks passed
github-actions bot pushed a commit that referenced this pull request Aug 18, 2025
…54594)

Reset queued_by_job_id when rescheduling stuck tasks to ensure timeout
monitoring works correctly across multiple DAG runs.
(cherry picked from commit bec069c)

Co-authored-by: Dheeraj Turaga <dheerajturaga@gmail.com>
@github-actions

Backport successfully created: v3-0-test

Status Branch Result
v3-0-test PR Link

github-actions bot pushed a commit to aws-mwaa/upstream-to-airflow that referenced this pull request Aug 18, 2025
…pache#54594)

@kaxil kaxil added this to the Airflow 3.0.6 milestone Aug 18, 2025
@dheerajturaga dheerajturaga deleted the fix-task-queued-timeout-bug branch August 18, 2025 13:11
kaxil pushed a commit that referenced this pull request Aug 18, 2025
…54594)

kaxil pushed a commit that referenced this pull request Aug 18, 2025
…54594) (#54604)
