Skip to content

Conversation

@prdai
Copy link
Contributor

@prdai prdai commented Jul 2, 2025

This change addresses a race condition in the task runner where downstream failure callbacks might observe an uninitialized or stale end_date and duration on the TaskInstance. By explicitly setting ti.end_date before constructing the TaskState in all exception handlers, we guarantee that any on_failure_callback receives a fully populated context, preventing confusing debug sessions and user‐defined callback failures.

Key Changes:

  • Set ti.end_date early in both AirflowFailException and generic exception handlers, before creating the TaskState.
  • Populate duration automatically by virtue of the populated end_date, ensuring callbacks can compute and report accurate runtime metrics.
  • Update existing tests and add a new test (test_task_runner_on_failure_callback_context) that verifies both end_date and non‐negative duration are present in the failure callback context.
  • Fix race condition where callbacks previously saw incomplete TaskInstance data, leading to misleading or missing timing information.

This PR closes #52630

Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, turns out ti.end_date is never populated on task runner side

- Set ti.end_date before creating TaskState in exception handlers
- Ensures on_failure_callback context has proper end_date and duration
- Fixes race condition where callbacks received incomplete TaskInstance data
- Add test to verify callback context contains required timing information
@prdai prdai force-pushed the bugfix/52630-task-instance-end-date-duration-on-failure branch from 387adee to 896d0c1 Compare July 13, 2025 06:22
@eladkal eladkal requested a review from amoghrajesh July 27, 2025 10:32
@eladkal eladkal added this to the Airflow 3.0.4 milestone Jul 27, 2025
@eladkal eladkal added type:bug-fix Changelog: Bug Fixes backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch labels Jul 27, 2025
@ashb
Copy link
Member

ashb commented Jul 31, 2025

Can you check/test that success callbacks also have an end date, and then we can get this merged

Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to test success call backs too. Looks good otherwise

@ashb ashb modified the milestones: Airflow 3.0.4, Airflow 3.0.5 Aug 5, 2025
@prdai
Copy link
Contributor Author

prdai commented Aug 8, 2025

hi @ashb @amoghrajesh, i tested both success and failure callbacks to make sure that they have the end date and duration info, the following are the log results:

$ python -m pytest task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks -v

============================= test session starts ==============================
collecting ... collected 11 items

task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_calls_callback[success] PASSED [  9%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_calls_callback[skipped] PASSED [ 18%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_calls_callback[failure] PASSED [ 27%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_calls_callback[retry] PASSED [ 36%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_on_failure_callback_context PASSED [ 45%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_on_success_callback_context PASSED [ 54%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_both_callbacks_have_timing_info PASSED [ 63%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_not_fail_on_failed_callback[success] PASSED [ 72%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_not_fail_on_failed_callback[skipped] PASSED [ 81%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_not_fail_on_failed_callback[failure] PASSED [ 90%]
task-sdk/tests/task_sdk/execution_time/test_task_runner.py::TestTaskRunnerCallsCallbacks::test_task_runner_not_fail_on_failed_callback[retry] PASSED [100%]

========================= 11 passed, 1 warning in 1.93s =========================

I had to update the _handle_current_task_failed to make sure that the TaskInstance's end_date (i had missed this in my before), and also added a few tests to the `test_task_runner.py

Copy link
Contributor

@amoghrajesh amoghrajesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM +1 because it has been tested.

@ashb wdyt?

@amoghrajesh
Copy link
Contributor

Unrelated failure, retriggering it

@amoghrajesh amoghrajesh merged commit eda89ae into apache:main Aug 13, 2025
141 of 142 checks passed
@boring-cyborg
Copy link

boring-cyborg bot commented Aug 13, 2025

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

github-actions bot pushed a commit that referenced this pull request Aug 13, 2025
…efore invoking failure callbacks (#52729)

* Fix: Set TaskInstance end_date before failure callbacks (#52630)

- Set ti.end_date before creating TaskState in exception handlers
- Ensures on_failure_callback context has proper end_date and duration
- Fixes race condition where callbacks received incomplete TaskInstance data
- Add test to verify callback context contains required timing information

* Enhance task callbacks to include end_date and duration metrics
(cherry picked from commit eda89ae)

Co-authored-by: Ranuga Disansa <79456372+Programmer-RD-AI@users.noreply.github.com>
@github-actions
Copy link

Backport successfully created: v3-0-test

Status Branch Result
v3-0-test PR Link

kaxil pushed a commit that referenced this pull request Aug 13, 2025
…efore invoking failure callbacks (#52729) (#54458)

* Fix: Set TaskInstance end_date before failure callbacks (#52630)

- Set ti.end_date before creating TaskState in exception handlers
- Ensures on_failure_callback context has proper end_date and duration
- Fixes race condition where callbacks received incomplete TaskInstance data
- Add test to verify callback context contains required timing information

* Enhance task callbacks to include end_date and duration metrics
(cherry picked from commit eda89ae)

Co-authored-by: Ranuga Disansa <79456372+Programmer-RD-AI@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:task-sdk backport-to-v3-1-test Mark PR with this label to backport to v3-1-test branch type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

end_date and duration are not available in task_instance variable if task's state is failed

4 participants