improve deferrable KPO handling of deleted pods in between polls #56976

m1racoli · 2025-10-21T18:28:17Z

While KubernetesPodTrigger is polling the KPO pod, large values of poll_interval (e.g. 900 seconds) can mean that the completed pod has already been cleaned up between polls. This causes the following chain of events:

1. 404 pod not found error inside KubernetesPodTrigger
2. KubernetesPodOperator.trigger_reentry ending up in another 404 pod not found in self.hook.get_pod
3. KubernetesPodOperator._clean being called as part of the finally block
4. KubernetesPodOperator.pod_manager.await_pod_completion failing to handle self.pod == None, ending up in AttributeError: 'NoneType' object has no attribute 'metadata'
5. the stack trace of the original error in 1. is never properly printed
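The final `AttributeError` in this chain can be reproduced in isolation. The following is a minimal sketch, not the provider's code: `FakeOperator` and `FakePodManager` are hypothetical stand-ins, and only the shape of the failure (reading `metadata` from a pod that is `None`) is taken from the description above.

```python
class FakePodManager:
    """Hypothetical stand-in for the pod manager; not the provider's class."""

    def await_pod_completion(self, pod):
        # Reads the pod's metadata without checking that a pod exists.
        return pod.metadata.name


class FakeOperator:
    """Hypothetical stand-in for the operator after a failed pod lookup."""

    def __init__(self):
        # The pod lookup raised a 404, so no pod object was ever assigned.
        self.pod = None
        self.pod_manager = FakePodManager()

    def cleanup_unguarded(self):
        # Cleanup runs anyway and passes None to the pod manager.
        return self.pod_manager.await_pod_completion(self.pod)


def reproduce():
    """Return the message of the error raised by the unguarded cleanup."""
    try:
        FakeOperator().cleanup_unguarded()
    except AttributeError as exc:
        return str(exc)
    return None
```

Calling `reproduce()` returns the secondary error message, which is what ends up in the task log in place of the original 404.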

We improve this situation with the following adjustments:

log the original exception with stack trace in the trigger for better visibility of the original error
log the actual poll interval being used when starting the trigger
return from KubernetesPodOperator._clean early if self.pod is None

Logging the exception in the trigger makes the error's stack trace easier to review in the logs and does not depend on `trigger_reentry` to do so.
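As an illustration of logging the exception where it occurs, here is a minimal sketch using the standard library's `Logger.exception`, which records the message together with the current traceback. The names `poll_pod` and `run_trigger`, the logger name, and the event dictionary shape are assumptions for this example, not the trigger's actual code.

```python
import asyncio
import logging

log = logging.getLogger("sketch.trigger")


async def poll_pod():
    # Hypothetical poll that fails because the pod was deleted between polls.
    raise RuntimeError("404: pod not found")


async def run_trigger():
    """Poll once and convert any failure into an error event."""
    try:
        await poll_pod()
    except Exception as exc:
        # Logger.exception logs at ERROR level and appends the stack trace,
        # so the original failure is visible even if later handling fails.
        log.exception("Unexpected error while waiting for pod")
        return {"status": "error", "message": str(exc)}
    return {"status": "success"}
```

Running `asyncio.run(run_trigger())` returns the error event and writes the traceback of the original failure to the log at the point where it happened.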

Logging the poll interval when the trigger starts makes it easier to know which value for poll_interval is actually being used.

It can happen that the pod doesn't exist anymore when running `trigger_reentry`. In that case `self.hook.get_pod` raises a 404 API exception, which then ends up calling `self._clean`, which assumed `self.pod` was not None.
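The effect of an early return can be sketched as follows. This is an assumption-laden illustration rather than the operator's implementation: `SketchOperator` and `PodNotFound` are hypothetical, and the method bodies only mimic the control flow described above (a lookup that raises, followed by cleanup in a `finally` block).

```python
class PodNotFound(Exception):
    """Hypothetical stand-in for the 404 API exception."""


class SketchOperator:
    """Hypothetical operator that mimics the reentry and cleanup flow."""

    def __init__(self):
        self.pod = None

    def get_pod(self):
        # Simulates the pod having been deleted between polls.
        raise PodNotFound("404: pod not found")

    def _clean(self):
        # Guard: there is nothing to clean up if the pod was never resolved.
        if self.pod is None:
            return
        _ = self.pod.metadata  # would raise AttributeError without the guard

    def trigger_reentry(self):
        try:
            self.pod = self.get_pod()
        finally:
            # Cleanup always runs, including when the lookup above failed.
            self._clean()


def run():
    """Report which error, if any, escapes the reentry."""
    try:
        SketchOperator().trigger_reentry()
    except PodNotFound as exc:
        return f"original error surfaced: {exc}"
    return "no error"
```

With the guard in place, the exception that escapes is the original not-found error rather than an `AttributeError` raised during cleanup.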

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/triggers/pod.py

Structured logging is not used when logging the poll interval, because this doesn't work in Airflow 2.
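One plausible reading, stated here as an assumption, is that the log call must also work with a standard-library `logging.Logger`, which rejects arbitrary keyword arguments. The sketch below contrasts %-style arguments with a structured-style keyword argument on a plain stdlib logger. The message text, logger name, and `ListHandler` helper are hypothetical.

```python
import logging


class ListHandler(logging.Handler):
    """Collects formatted messages so the result can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def demo(poll_interval):
    logger = logging.getLogger("sketch.poll_interval")
    logger.setLevel(logging.INFO)
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        # %-style arguments are supported by the standard library logger.
        logger.info("Checking pod status every %s seconds", poll_interval)
        try:
            # Structured-style keyword arguments are not accepted by stdlib.
            logger.info("Checking pod status", poll_interval=poll_interval)
            kwargs_ok = True
        except TypeError:
            kwargs_ok = False
    finally:
        logger.removeHandler(handler)
    return handler.messages, kwargs_ok
```

In this sketch, `demo(900)` records the %-style message, while the keyword-argument call raises `TypeError` on the stdlib logger.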

…che#56976)

* refactor: log exception in KubernetesPodTrigger

  This makes the error's stacktrace easier to review in the logs and does not depend on `trigger_reentry` to do so.

* refactor: log poll_interval on trigger start

  This makes it easier to know which value for poll_interval is actually being used.

* fix: handle missing pod when trying to cleanup after trigger reentry

  It can happen that the pod doesn't exist anymore when running `trigger_reentry`. In that case `self.hook.get_pod` will cause a 404 API exception, which then will end up calling `self._clean` which assumed `self.pod` not to be None.

* fix: don't use structured logging when logging poll interval

  This doesn't work in Airflow 2.

m1racoli added 3 commits October 21, 2025 12:16

refactor: log exception in KubernetesPodTrigger

a45e522

This makes the error's stacktrace easier to review in the logs and does not depend on `trigger_reentry` to do so.

refactor: log poll_interval on trigger start

f2ea59b

This makes it easier to know which value for poll_interval is actually being used.

m1racoli requested review from hussein-awala and jedcunningham as code owners October 21, 2025 18:28

boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Oct 21, 2025

ashb approved these changes Oct 21, 2025


providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/triggers/pod.py (resolved)

providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/triggers/pod.py (outdated, resolved)

fix: don't use structured logging when logging poll interval

e6f4795

This doesn't work in Airflow 2.

ashb merged commit 54fa258 into apache:main Oct 28, 2025
91 checks passed

potiuk mentioned this pull request Nov 14, 2025

Status of testing Providers that were prepared on November 14, 2025 #58315

Closed
