Return 410 Gone for heartbeat when cleared TI exists in TIH#61631
Conversation
|
It's an intermittent failure, restarting it and taking a look |
amoghrajesh
left a comment
There was a problem hiding this comment.
Small comments, otherwise LGTM thanks!
Done! Thanks for reviewing! |
|
@andreahlert can you check the failing tests? |
|
Tests need fixing |
|
@andreahlert This PR has been converted to draft because it does not yet meet our Pull Request quality criteria. Issues found:
What to do next:
Converting a PR to draft is not a rejection — it is an invitation to bring the PR up to the project's standards so that maintainer review time is spent productively. If you have questions, feel free to ask on the Airflow Slack. Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: apache#53140
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
… message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation
0a744f6 to
0ef9c07
Compare
|
Tests are failing |
|
Rebased on main. GitHub Actions infra issue, let's see if gets green again. |
|
Rebased and ready for review. @amoghrajesh @ashb @kaxil |
…IH (#61631) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com> * Update task_instances.py Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com> * Update test_task_instances.py Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com> * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Backport successfully created: v3-2-testNote: As of Merging PRs targeted for Airflow 3.X In matter of doubt please ask in #release-management Slack channel.
|
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
…IH (#61631) (#64693) * Return 410 Gone for heartbeat when TI was cleared and moved to TIH When a running task instance is cleared, its previous try is archived to the Task Instance History table and the TI receives a new UUID. Subsequent heartbeats from the old process get a 404 because the old UUID no longer exists in the TI table. This change improves the error handling by checking the TIH table when a heartbeat TI is not found. If the UUID exists in TIH, return 410 Gone instead of 404 Not Found, giving the client a more specific signal that the task was cleared rather than never existing. - Server: check TIH on heartbeat NoResultFound, return 410 if found - Supervisor: handle 410 Gone same as 404/409 (terminate process) - Keep 404 for TIs that genuinely never existed closes: #53140 * Update task_instances.py * Update task_instances.py * Update test_task_instances.py * fix(api): use task_instance_id in heartbeat 410 path and align detail message - Replace undefined ti_id_str with task_instance_id in TIH query and log - Use task_instance_id (UUID) for TIH.task_instance_id comparison - Set 410 Gone detail message to match test expectation --------- (cherry picked from commit ce1270b) Co-authored-by: André Ahlert <andre@aex.partners> Co-authored-by: Amogh Desai <amoghrajesh1999@gmail.com>
Closes: #53140
When a running Task Instance is cleared,
prepare_db_for_next_try()archives the current try to the Task Instance History (TIH) table and assigns a new UUID. This means heartbeats from the old process will fail with a 404 "Not Found" because the UUID no longer exists in the TI table. However, a generic 404 is misleading since the TI did exist - it was just cleared.This PR adds a TIH lookup in the
NoResultFoundhandler of the heartbeat endpoint:This gives the task SDK supervisor a more specific signal and matches HTTP semantics (410 means "the target resource is no longer available at the origin server and this condition is likely to be permanent").
Changes:
execution_api/routes/task_instances.py: Added TIH existence check in the heartbeatNoResultFoundhandler; added 410 to OpenAPI response schemaexecution_time/supervisor.py: AddedHTTPStatus.GONEto the set of status codes that the supervisor treats as "stop heartbeating" (alongside NOT_FOUND and CONFLICT)test_task_instances.py: Added test usingprepare_db_for_next_try()to properly simulate a cleared TI and verify the 410 responsePrevious attempt: #56443 (closed as stale). This PR addresses the review feedback from that PR by actually checking the TIH table rather than blindly changing 404 to 410.