Calculate retry eligibility before task runs #47996

uranusjr · 2025-03-20T10:58:23Z

The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed.
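
A minimal sketch of the eager check, assuming a simplified task-instance model. `TaskInstanceInfo`, `RunContext`, `should_retry`, and `build_run_context` are hypothetical names used only for illustration. The eligibility rule is modeled loosely on Airflow's `TaskInstance.is_eligible_to_retry()` (retries configured and `try_number <= max_tries`) and is not taken from this PR's diff.

```python
from dataclasses import dataclass


@dataclass
class TaskInstanceInfo:
    """Hypothetical, minimal stand-in for the server-side TI row."""
    try_number: int
    max_tries: int
    retries: int


@dataclass
class RunContext:
    """Hypothetical payload the API server hands to the runner at start-up."""
    should_retry: bool


def is_eligible_to_retry(ti: TaskInstanceInfo) -> bool:
    # Loosely modeled on Airflow's eligibility rule: a retry is possible only
    # if retries are configured and the current try has not used up max_tries.
    return bool(ti.retries and ti.try_number <= ti.max_tries)


def build_run_context(ti: TaskInstanceInfo) -> RunContext:
    # Computed eagerly, before the task executes, so the runner already knows
    # which callback (retry or failure) to run if the task errors.
    return RunContext(should_retry=is_eligible_to_retry(ti))
```

Because `should_retry` travels with the run context, the runner can pick between the retry and failure callbacks without asking the server again after the failure.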

~~When a task fails, it uses this information to decide whether it should go directly into FAIL_WITHOUT_RETRY, so the server does not need to calculate this again.~~

This makes TITerminalState a bit wonky. Now the FAILED state actually makes the server always retry the task, and should probably be renamed to RETRY or something...? And FAIL_WITHOUT_RETRY could just be called FAILED, since there's no other kind of failure. A simple rename causes a lot of things to change, though, since the enum is reused in many places, so I decided to leave the names as they are for now.

The information is used by the task runner to decide which state to set the TI to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. The API then handles each state accordingly.
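
The runner-side decision can be sketched as follows. `TIState`, `run_task`, and the `should_retry` flag are hypothetical stand-ins, not the Task SDK's actual API; the sketch only shows that the next state is chosen locally from a flag computed before the task ran.

```python
from enum import Enum
from typing import Callable


class TIState(str, Enum):
    """Hypothetical subset of task-instance states used by the runner."""
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


def run_task(task_fn: Callable[[], None], should_retry: bool) -> TIState:
    """Run task_fn and return the state the runner would report."""
    try:
        task_fn()
    except Exception:
        # should_retry was computed before the task started, so no extra
        # round-trip to the server is needed to pick the next state.
        return TIState.UP_FOR_RETRY if should_retry else TIState.FAILED
    return TIState.SUCCESS
```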

The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in #45106 to tell the API server not to check retry eligibility in certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always sets the TI to FAILED.
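
A sketch of the simplified server-side handling, assuming a hypothetical `TIRecord` row and `update_state` handler (neither is the actual route in `airflow/api_fastapi/execution_api/routes/task_instances.py`). It illustrates one point: once the runner has decided, a FAILED request is stored as FAILED with no further eligibility check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TIState(str, Enum):
    """Hypothetical subset of task-instance states."""
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    UP_FOR_RETRY = "up_for_retry"


# States the runner may report when a task finishes.
TERMINAL_REQUESTS = {TIState.SUCCESS, TIState.FAILED, TIState.UP_FOR_RETRY}


@dataclass
class TIRecord:
    """Hypothetical persisted task-instance row."""
    state: Optional[TIState] = None


def update_state(ti: TIRecord, requested: TIState) -> TIRecord:
    """Store the state the runner reported, without re-checking retries."""
    if requested not in TERMINAL_REQUESTS:
        raise ValueError(f"unsupported state: {requested.value}")
    # Before this change a FAILED request could still be turned into
    # UP_FOR_RETRY by a server-side eligibility check (FAIL_WITHOUT_RETRY
    # existed to skip that check). Now FAILED is stored as FAILED.
    ti.state = requested
    return ti
```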

airflow/api_fastapi/execution_api/datamodels/taskinstance.py

ashb

Question about naming aside, LGTM.

I think some simple tests are worth adding, though.

airflow/api_fastapi/execution_api/datamodels/taskinstance.py

task-sdk/src/airflow/sdk/execution_time/task_runner.py

uranusjr · 2025-03-21T03:19:58Z

I made some changes to how the runner sets state and how the API handles it. The FAIL_WITHOUT_RETRY state has been removed; now FAILED always means FAILED, and the runner may send UP_FOR_RETRY for the API to handle.

Tests are to come; I want to see CI first to decide what to add.

amoghrajesh

Some comments, but generally looking good now

airflow/api_fastapi/execution_api/datamodels/taskinstance.py

airflow/api_fastapi/execution_api/routes/task_instances.py

uranusjr requested a review from amoghrajesh March 20, 2025 10:58

uranusjr requested review from ephraimbuddy and pierrejeambrun as code owners March 20, 2025 10:58

boring-cyborg bot added area:API Airflow's REST/HTTP API area:task-sdk labels Mar 20, 2025

ashb reviewed Mar 20, 2025

airflow/api_fastapi/execution_api/datamodels/taskinstance.py (outdated, resolved)

ashb approved these changes Mar 20, 2025

kaxil approved these changes Mar 20, 2025

amoghrajesh reviewed Mar 20, 2025

airflow/api_fastapi/execution_api/datamodels/taskinstance.py (outdated, resolved)

task-sdk/src/airflow/sdk/execution_time/task_runner.py (outdated, resolved)

uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 180e20a to f34454d March 21, 2025 03:18

uranusjr requested a review from amoghrajesh March 21, 2025 03:18

uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from f34454d to 042133a March 21, 2025 05:21

uranusjr requested review from XD-DENG and mobuchowski as code owners March 21, 2025 05:21

amoghrajesh reviewed Mar 21, 2025

airflow/api_fastapi/execution_api/datamodels/taskinstance.py (outdated, resolved)

airflow/api_fastapi/execution_api/datamodels/taskinstance.py (resolved)

airflow/api_fastapi/execution_api/routes/task_instances.py (outdated, resolved)

uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch 2 times, most recently from d3e40e0 to 868e7fe March 21, 2025 12:50

uranusjr requested a review from rawwar as a code owner March 21, 2025 12:50

uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 868e7fe to 10f3170 March 21, 2025 13:29

uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 10f3170 to 93155b3 March 21, 2025 14:17

uranusjr merged commit 5eca6c6 into apache:main Mar 21, 2025
88 of 89 checks passed

uranusjr deleted the check-retry-eligibility-ahead-of-time branch March 21, 2025 17:42

eladkal mentioned this pull request Mar 26, 2025

Status of testing Providers that were prepared on March 26, 2025 #48395

Closed

eladkal mentioned this pull request Apr 6, 2025

Status of testing Providers that were prepared on April 06, 2025 #48842

Closed

kaxil added a commit to astronomer/airflow that referenced this pull request Apr 8, 2025

AIP-72: Fix running task with retries

3f46988

This was broken in apache#47996 closes apache#48927

kaxil mentioned this pull request Apr 8, 2025

AIP-72: Fix running task with retries #48967

Merged

kaxil added a commit that referenced this pull request Apr 8, 2025

AIP-72: Fix running task with retries (#48967)

d5ea589

This was broken in #47996 closes #48927

kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 29, 2025

AIP-72: Fix running task with retries (#48967)

72cf923

This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44

kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 24, 2025

AIP-72: Fix running task with retries (#48967)

2ddf5b8

This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44

kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 22, 2025

AIP-72: Fix running task with retries (#48967)

c7d0816

This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44

4 participants