fix: always defer once more after log fetching to ensure pod completion is handled #40891
Fix based on a real-world issue seen in production, where a failing pod does not make the associated task fail.
Running theory: when a pod fails while in `self.pod_manager.fetch_container_logs`, the `running` property of the returned `pod_log_status` object is False, so we skip the deferrable call and jump directly to `self._clean`.

But the issue is that the `event` object is never refreshed and still carries the `running` status, so we hit this code path:

airflow/airflow/providers/cncf/kubernetes/operators/pod.py
Lines 793 to 794 in 4cbfcd7

and the task instance returns without error.
Proposed fix: always call defer after fetching logs, whatever the pod status is, so that the failed status is picked up during the next trigger run.
It adds a bit of delay to pod completion detection but is simple/stupid :)
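
To make the theory easier to follow, here is a toy model of the control flow. It is not the Airflow code: `Deferred`, `LogStatus`, `fetch_logs_pod_already_failed`, `handle_event_before`, `handle_event_after` and `run` are all hypothetical names, and the event dictionaries are simplified stand-ins for what the trigger sends. It only illustrates why a stale `running` event can end the task without error, and why always deferring lets the next trigger run report the failure.

```python
from dataclasses import dataclass


class Deferred(Exception):
    """Hypothetical stand-in for handing control back to the trigger."""


@dataclass
class LogStatus:
    running: bool  # mirrors the `running` property described above


def fetch_logs_pod_already_failed() -> LogStatus:
    # Assumption: the pod fails while its logs are being fetched,
    # so the returned status says it is no longer running.
    return LogStatus(running=False)


def handle_event_before(event: dict) -> str:
    """Toy version of the behaviour described in the running theory."""
    if event["status"] == "running":
        log_status = fetch_logs_pod_already_failed()
        if log_status.running:
            raise Deferred()
        # Deferral skipped: we fall through with a stale event.
    if event["status"] == "running":
        # The event was never refreshed, so the failure is not seen.
        return "returned without error"
    if event["status"] == "failed":
        raise RuntimeError("pod failed")
    return "success"


def handle_event_after(event: dict) -> str:
    """Toy version of the proposed fix: always defer after fetching logs."""
    if event["status"] == "running":
        fetch_logs_pod_already_failed()
        raise Deferred()  # the next trigger run delivers a fresh event
    if event["status"] == "failed":
        raise RuntimeError("pod failed")
    return "success"


def run(handler, events):
    """Feed successive trigger events to the handler until it stops deferring."""
    for event in events:
        try:
            return handler(event)
        except Deferred:
            continue
    return "still deferred"


if __name__ == "__main__":
    events = [{"status": "running"}, {"status": "failed"}]
    print(run(handle_event_before, events))
    try:
        run(handle_event_after, events)
    except RuntimeError as exc:
        print(f"task fails as expected: {exc}")
```

In this model the extra deferral costs one more trigger round trip, which matches the small delay mentioned above.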