-
Notifications
You must be signed in to change notification settings - Fork 16.4k
Calculate retry eligibility before task runs #47996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Calculate retry eligibility before task runs #47996
Conversation
ashb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question about naming aside, LGTM.
Some simple tests worth adding though I think
180e20a to
f34454d
Compare
|
I made some changes to how the runner sets state and how the API handles it. The FAIL_WITHOUT_RETRY state has been removed; now FAILED always means FAILED, and the runner may send UP_TO_RETRY for the API to handle. Tests are to come; I want to see CI first to decide what to add. |
f34454d to
042133a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, but generally looking good now
d3e40e0 to
868e7fe
Compare
868e7fe to
10f3170
Compare
The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed. The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API. The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in 1283cc3 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.
10f3170 to
93155b3
Compare
The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed. The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API. The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in 1283cc3 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.
The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed. The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API. The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in 1283cc3 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.
The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed. The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API. The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in 1283cc3 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.
This was broken in apache#47996 closes apache#48927
This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
This was broken in apache/airflow#47996 closes apache/airflow#48927 GitOrigin-RevId: d5ea5890fd7a8b769935638c50a46214a553fc44
The task needs this information to know whether it should run the retry or failed callback if the task fails, so we calculate it eagerly and pass it to the runner, instead of lazily only after the task has failed.
When a task fails, it uses this information to decide whether it should go directly into FAIL_WITHOUT_RETRY, so the server does not need to calculate this again.This makes the TITerminalState a bit wonky. Now the FAILED state actually makes the server always retry the task, and should probably be renamed to RETRY or something...? And FAIL_WITHOUT_RETRY can actually just be called FAILED since there's no other kind of failed. A simple rename causes a lot of things to change though since the enum is reused in a lot of places. So I decided to leave the names be for now.The information is used by the task runner to decide what state to set the ti to when an error happens. If the task is eligible for retry, the next state is set to UP_FOR_RETRY; otherwise it's FAILED. This is then handled accordingly in the API.
The FAIL_WITHOUT_RETRY state has been removed from the code base since it is no longer needed. The state was introduced in #45106 to tell the API server to not check retry eligibility on certain failure cases, but now that eligibility is always checked and used in the task runner, the API server no longer needs this information; a request to update to FAILED always set the TI to FAILED.