Fix scheduler heartbeat misses caused by slow reschedule dependency check #61983

Merged
ephraimbuddy merged 1 commit into apache:main from astronomer:add-composite-index-task-reschedule on Feb 17, 2026

Conversation

@ephraimbuddy
Contributor

When many task instances enter the UP_FOR_RESCHEDULE state, the query that fetches the latest reschedule date becomes slow because a composite index is missing. This causes the scheduler to miss heartbeats.

Previously only sensors used reschedule mode, but since fddf4a7, non-sensor tasks can also be rescheduled, significantly increasing the number of rows per task instance in the task_reschedule table.

Add a composite (ti_id, id DESC) index to the task_reschedule table, replacing the single-column (ti_id) index.
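
To illustrate why the composite index helps, here is a minimal sketch using Python's built-in sqlite3 module rather than Airflow's real models or migration. The simplified table layout, the index name idx_demo_ti_id_id_desc, and the helper functions are assumptions made for this example; only the (ti_id, id DESC) column order comes from the description above. The "latest row for one task instance" lookup filters on ti_id and orders by id descending, which is exactly the shape the index covers.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for task_reschedule (simplified, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE task_reschedule (
            id INTEGER PRIMARY KEY,
            ti_id TEXT NOT NULL,
            reschedule_date TEXT NOT NULL
        )
        """
    )
    # Composite index in the shape described above: (ti_id, id DESC).
    # The index name is made up for this sketch.
    conn.execute(
        "CREATE INDEX idx_demo_ti_id_id_desc ON task_reschedule (ti_id, id DESC)"
    )
    rows = [
        ("ti-a", "2026-02-16T10:00:00"),
        ("ti-a", "2026-02-16T10:05:00"),
        ("ti-b", "2026-02-16T10:01:00"),
        ("ti-a", "2026-02-16T10:10:00"),
    ]
    conn.executemany(
        "INSERT INTO task_reschedule (ti_id, reschedule_date) VALUES (?, ?)", rows
    )
    return conn


LATEST_SQL = (
    "SELECT reschedule_date FROM task_reschedule "
    "WHERE ti_id = ? ORDER BY id DESC LIMIT 1"
)


def latest_reschedule_date(conn: sqlite3.Connection, ti_id: str):
    """Return the most recent reschedule_date for one task instance, or None."""
    row = conn.execute(LATEST_SQL, (ti_id,)).fetchone()
    return row[0] if row else None


def query_plan(conn: sqlite3.Connection, ti_id: str) -> list[str]:
    """Return the human-readable plan lines SQLite reports for the lookup."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL, (ti_id,))
    return [r[3] for r in plan_rows]


if __name__ == "__main__":
    demo = build_demo_db()
    print(latest_reschedule_date(demo, "ti-a"))
    print(query_plan(demo, "ti-a"))
```

The plan text differs between database engines and versions, so it is worth checking the plan on the actual metadata database (for example with EXPLAIN in PostgreSQL or MySQL) rather than relying on this SQLite output.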

The reschedule query:

next_reschedule_date = session.scalar(

Other queries that can benefit from this index:

  1. TR.stmt_for_task_instance(ti, descending=False).with_only_columns(TR.start_date).limit(1)
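
The ascending variant listed above can be pictured the same way. The sketch below again uses sqlite3 with a simplified, hypothetical table, and it assumes that descending=False corresponds to ordering by id ascending, which the text above does not spell out. B-tree indexes can generally be walked in either direction, so an index declared as (ti_id, id DESC) can usually serve the "earliest row" lookup as well.

```python
import sqlite3

# Simplified stand-in table; column set and index name are assumptions for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE task_reschedule (
        id INTEGER PRIMARY KEY,
        ti_id TEXT NOT NULL,
        start_date TEXT NOT NULL
    );
    CREATE INDEX idx_demo_ti_id_id_desc ON task_reschedule (ti_id, id DESC);
    INSERT INTO task_reschedule (ti_id, start_date) VALUES
        ('ti-a', '2026-02-16T09:00:00'),
        ('ti-b', '2026-02-16T09:01:00'),
        ('ti-a', '2026-02-16T09:05:00');
    """
)


def first_start_date(conn: sqlite3.Connection, ti_id: str):
    """Return the start_date of the earliest reschedule row for one task instance."""
    row = conn.execute(
        "SELECT start_date FROM task_reschedule "
        "WHERE ti_id = ? ORDER BY id ASC LIMIT 1",
        (ti_id,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    print(first_start_date(conn, "ti-a"))
```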

@boring-cyborg boring-cyborg Bot added the area:db-migrations (PRs with DB migration), area:deadline-alerts (AIP-86, former AIP-57), and kind:documentation labels Feb 16, 2026
@ephraimbuddy ephraimbuddy force-pushed the add-composite-index-task-reschedule branch from 7beb11c to c8f33d8 Compare February 16, 2026 10:29
@ephraimbuddy ephraimbuddy added this to the Airflow 3.1.8 milestone Feb 16, 2026
@ephraimbuddy ephraimbuddy merged commit 9880716 into apache:main Feb 17, 2026
129 checks passed
@ephraimbuddy ephraimbuddy deleted the add-composite-index-task-reschedule branch February 17, 2026 07:22
@github-actions
Contributor

Backport failed to create: v3-1-test. View the failure log Run details

Note: As per "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting PRs that are bug fixes (generally speaking) to the maintenance branches.

If in doubt, please ask in the #release-management Slack channel.

Status Branch Result
v3-1-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 9880716 v3-1-test

This should apply the commit to the v3-1-test branch and leave it in a conflict state, marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

OscarLigthart pushed a commit to OscarLigthart/airflow that referenced this pull request Feb 17, 2026
…heck (apache#61983)

ephraimbuddy added a commit that referenced this pull request Feb 17, 2026
…heck (#61983)

(cherry picked from commit 9880716)
ephraimbuddy added a commit that referenced this pull request Feb 17, 2026
…dependency check (#61983) (#62068)

* Add index on task_reschedule ti_id (#60931)

(cherry picked from commit 14e811c)

* Fix scheduler heartbeat misses caused by slow reschedule dependency check (#61983)

(cherry picked from commit 9880716)

---------

Co-authored-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026
…heck (apache#61983)

Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Mar 4, 2026
…heck (apache#61983)

vatsrahul1001 pushed a commit that referenced this pull request Mar 4, 2026
…dependency check (#61983) (#62068)

* Add index on task_reschedule ti_id (#60931)

(cherry picked from commit 14e811c)

* Fix scheduler heartbeat misses caused by slow reschedule dependency check (#61983)

(cherry picked from commit 9880716)

---------

Co-authored-by: Guan-Ming (Wesley) Chiu <105915352+guan404ming@users.noreply.github.com>
dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026
…heck (apache#61983)

Ankurdeewan pushed a commit to Ankurdeewan/airflow that referenced this pull request Mar 15, 2026
…heck (apache#61983)

@matthieuauger

Thank you very much for this contribution. I was having headaches understanding why my scheduler was stuck for 30s and slowing down my whole pipeline. This PR solves everything.
