Skip to content

Conversation

@uranusjr
Copy link
Member

@uranusjr uranusjr commented Mar 20, 2025

The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed.

When a task fails, it uses this information to decide whether it should go directly into FAIL_WITHOUT_RETRY, so the server does not need to calculate this again.

This makes the TITerminalState a bit wonky. Now the FAILED state actually makes the server always retry the task, and should probably be renamed to RETRY or something...? And FAIL_WITHOUT_RETRY can actually just be called FAILED since there's no other kind of failed. A simple rename causes a lot of things to change though since the enum is reused in a lot of places. So I decided to leave the names be for now.

The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API.

The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in #45106 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.

@uranusjr uranusjr requested a review from amoghrajesh March 20, 2025 10:58
@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:task-sdk labels Mar 20, 2025
Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question about naming aside, LGTM.

Some simple tests worth adding though I think

@uranusjr uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 180e20a to f34454d Compare March 21, 2025 03:18
@uranusjr uranusjr requested a review from amoghrajesh March 21, 2025 03:18
@uranusjr
Copy link
Member Author

uranusjr commented Mar 21, 2025

I made some changes to how the runner sets state and how the API handles it. The FAIL_WITHOUT_RETRY state has been removed; now FAILED always means FAILED, and the runner may send UP_TO_RETRY for the API to handle.

Tests are to come; I want to see CI first to decide what to add.

@uranusjr uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from f34454d to 042133a Compare March 21, 2025 05:21
Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but generally looking good now

@uranusjr uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch 2 times, most recently from d3e40e0 to 868e7fe Compare March 21, 2025 12:50
@uranusjr uranusjr requested a review from rawwar as a code owner March 21, 2025 12:50
@uranusjr uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 868e7fe to 10f3170 Compare March 21, 2025 13:29
The task needs this information to know whether it should run the retry
or failed callback if the task fails, so we calculate it eagerly and
pass it to the runner, instead of lazily only after the task has failed.

The information is used by the task runner to decide what state to set
the ti to when an error happens. If the task is eligible for retry, the
next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then
handled accordingly in the API.

The FAIL_WITHOUT_RETRY state has been removed from the code base since
it is no longer needed. The state was introduced in 1283cc3
to tell the API server to not check retry eligibility on certain failure
cases, but now that eligibility is always checked and used in the task
runner, the API server no longer needs this information; a request to
update to FAILED always set the TI to FAILED.
@uranusjr uranusjr force-pushed the check-retry-eligibility-ahead-of-time branch from 10f3170 to 93155b3 Compare March 21, 2025 14:17
@uranusjr uranusjr merged commit 5eca6c6 into apache:main Mar 21, 2025
88 of 89 checks passed
@uranusjr uranusjr deleted the check-retry-eligibility-ahead-of-time branch March 21, 2025 17:42
agupta01 pushed a commit to agupta01/airflow that referenced this pull request Mar 21, 2025
The task needs this information to know whether it should run the retry
or failed callback if the task fails, so we calculate it eagerly and
pass it to the runner, instead of lazily only after the task has failed.

The information is used by the task runner to decide what state to set
the ti to when an error happens. If the task is eligible for retry, the
next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then
handled accordingly in the API.

The FAIL_WITHOUT_RETRY state has been removed from the code base since
it is no longer needed. The state was introduced in 1283cc3
to tell the API server to not check retry eligibility on certain failure
cases, but now that eligibility is always checked and used in the task
runner, the API server no longer needs this information; a request to
update to FAILED always set the TI to FAILED.
shubham-pyc pushed a commit to shubham-pyc/airflow that referenced this pull request Mar 22, 2025
The task needs this information to know whether it should run the retry
or failed callback if the task fails, so we calculate it eagerly and
pass it to the runner, instead of lazily only after the task has failed.

The information is used by the task runner to decide what state to set
the ti to when an error happens. If the task is eligible for retry, the
next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then
handled accordingly in the API.

The FAIL_WITHOUT_RETRY state has been removed from the code base since
it is no longer needed. The state was introduced in 1283cc3
to tell the API server to not check retry eligibility on certain failure
cases, but now that eligibility is always checked and used in the task
runner, the API server no longer needs this information; a request to
update to FAILED always set the TI to FAILED.
nailo2c pushed a commit to nailo2c/airflow that referenced this pull request Apr 4, 2025
The task needs this information to know whether it should run the retry
or failed callback if the task fails, so we calculate it eagerly and
pass it to the runner, instead of lazily only after the task has failed.

The information is used by the task runner to decide what state to set
the ti to when an error happens. If the task is eligible for retry, the
next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then
handled accordingly in the API.

The FAIL_WITHOUT_RETRY state has been removed from the code base since
it is no longer needed. The state was introduced in 1283cc3
to tell the API server to not check retry eligibility on certain failure
cases, but now that eligibility is always checked and used in the task
runner, the API server no longer needs this information; a request to
update to FAILED always set the TI to FAILED.
kaxil added a commit to astronomer/airflow that referenced this pull request Apr 8, 2025
kaxil added a commit that referenced this pull request Apr 8, 2025
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request May 29, 2025
This was broken in apache/airflow#47996

closes apache/airflow#48927

GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Sep 24, 2025
This was broken in apache/airflow#47996

closes apache/airflow#48927

GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this pull request Oct 22, 2025
This was broken in apache/airflow#47996

closes apache/airflow#48927

GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API area:task-sdk

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants