
Conversation

@SameerMesiah97 (Contributor) commented Jan 14, 2026

Description

This change refactors watch_pod_events so that it continues watching events for the full lifecycle of the target pod, rather than stopping after a single watch stream terminates.

The new implementation now:

  • Reconnects automatically when a watch stream terminates (e.g. server-side timeout).
  • Resumes watching from the last observed resourceVersion.
  • Handles Kubernetes 410 Gone errors by restarting the watch from the current state.
  • Terminates cleanly when the pod completes or is deleted.

This ensures that watch_pod_events continues yielding events for the full lifecycle of a pod instead of silently stopping after timeout_seconds.
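For illustration, a minimal sketch of such a reconnecting watch loop using kubernetes_asyncio is shown below. This is not the code in this PR; the function name, field selector, and terminal-phase set are assumptions chosen for the example.

```python
from __future__ import annotations

from kubernetes_asyncio import client, watch
from kubernetes_asyncio.client.rest import ApiException

TERMINAL_PHASES = {"Succeeded", "Failed"}  # assumed terminal pod phases


async def watch_pod_events_sketch(
    v1: client.CoreV1Api, name: str, namespace: str, timeout_seconds: int = 30
):
    """Yield events for a pod, reconnecting until the pod completes or is deleted."""
    resource_version: str | None = None
    field_selector = f"involvedObject.kind=Pod,involvedObject.name={name}"
    while True:
        try:
            w = watch.Watch()
            kwargs = {"timeout_seconds": timeout_seconds}
            if resource_version:
                kwargs["resource_version"] = resource_version  # resume where the last stream stopped
            async for event in w.stream(
                v1.list_namespaced_event,
                namespace=namespace,
                field_selector=field_selector,
                **kwargs,
            ):
                resource_version = event["object"].metadata.resource_version
                yield event["object"]
        except ApiException as exc:
            if exc.status == 410:  # stale resourceVersion: restart the watch from current state
                resource_version = None
                continue
            raise
        # The stream ended normally (e.g. server-side timeout); decide whether to reconnect.
        try:
            pod = await v1.read_namespaced_pod(name=name, namespace=namespace)
        except ApiException as exc:
            if exc.status == 404:  # pod deleted: stop cleanly
                return
            raise
        if pod.status.phase in TERMINAL_PHASES:
            return  # pod completed: stop streaming
```

The essential points are that the last observed resourceVersion is carried across reconnects, a 410 response clears it so the watch restarts from the current state, and termination is decided by inspecting the pod rather than by the watch timeout.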

Rationale

The Kubernetes Watch API enforces server-side timeouts, meaning a single watch stream is not guaranteed to remain open indefinitely. The previous implementation treated timeout_seconds as an implicit upper bound on the total duration of event streaming, causing the generator to stop yielding events after the first watch termination — even while the pod was still running.

This behavior is surprising and contradicts what users reasonably expect from the method name (watch_pod_events), the docstring, and standard Kubernetes watch semantics. The updated implementation aligns with Kubernetes best practices by treating watch termination as a recoverable condition and transparently reconnecting until the pod reaches a terminal lifecycle state.

Backwards Compatibility

This change does not alter the public API or method signature. However, it does change runtime behavior:

  • timeout_seconds now applies only to individual watch connections, not the overall duration of event streaming.
  • Event streaming continues until pod completion or deletion instead of stopping silently after a timeout.

While it is possible that some users rely on the previous behavior, it is more likely that existing deployments have implemented workarounds (e.g. external loops or polling) to compensate for the premature termination. The new behavior is consistent with documented intent and Kubernetes conventions, and therefore adheres to the principle of least surprise.

Tests

Added unit tests to validate the following expected behaviors:

  • Reconnects and continues streaming events after a watch stream ends (e.g. timeout).
  • Restarts the watch when Kubernetes returns 410 Gone due to a stale resourceVersion.
  • Stops cleanly when the pod is deleted (404).
  • Stops immediately when the pod reaches a terminal phase (Succeeded or Failed).

Existing tests have been updated to account for the addition of pod state inspection in watch_pod_events.
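As a rough illustration of how the reconnection test can be structured, the sketch below exercises the illustrative loop shown earlier (not the real hook). It assumes pytest-asyncio is installed and that the loop lives in a hypothetical module named sketch.

```python
from types import SimpleNamespace
from unittest import mock

import pytest


def _event(rv: str):
    # Minimal stand-in for a watch event wrapping an object with a resourceVersion.
    return {"object": SimpleNamespace(metadata=SimpleNamespace(resource_version=rv))}


class _FakeStream:
    """Async iterator that yields fixed events, then ends like a server-side timeout."""

    def __init__(self, events):
        self._it = iter(events)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._it)
        except StopIteration:
            raise StopAsyncIteration


@pytest.mark.asyncio
async def test_reconnects_until_pod_succeeds():
    from sketch import watch_pod_events_sketch  # hypothetical module holding the sketch above

    v1 = mock.AsyncMock()
    # The pod is still Running after the first stream ends, Succeeded after the second.
    v1.read_namespaced_pod.side_effect = [
        SimpleNamespace(status=SimpleNamespace(phase="Running")),
        SimpleNamespace(status=SimpleNamespace(phase="Succeeded")),
    ]
    with mock.patch("sketch.watch.Watch") as watch_cls:
        watch_cls.return_value.stream.side_effect = [
            _FakeStream([_event("1")]),
            _FakeStream([_event("2")]),
        ]
        seen = [e async for e in watch_pod_events_sketch(v1, "mypod", "default")]

    # Events from both watch connections were yielded, i.e. the loop reconnected once.
    assert [e.metadata.resource_version for e in seen] == ["1", "2"]
    assert watch_cls.return_value.stream.call_count == 2
```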

Notes

  • _load_config is now guarded so that configuration is loaded once per hook instance; it no longer returns an API client. API client instantiation is now solely the responsibility of get_conn, enabling reconnection in watch_pod_events without redundant configuration reloads (see the sketch after this list).
  • The internal helper api_client_from_kubeconfig_file, which _load_config used to construct and return an API client, has been removed. Its call site now calls async_config.load_kube_config directly.
  • The exception message raised when multiple configuration sources are supplied has been clarified to more accurately describe the error.
  • Polling fallback behavior is preserved and now continues until the pod reaches a terminal lifecycle state, matching the updated watch semantics.
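To illustrate the guard described in the first note, here is a minimal sketch of loading the kube config at most once per instance while keeping API client construction in get_conn. The class and attribute names are hypothetical, not the hook's actual code.

```python
from __future__ import annotations

from kubernetes_asyncio import client, config as async_config


class AsyncHookSketch:
    """Illustrative only: loads kube config at most once per hook instance."""

    def __init__(self, config_file: str | None = None):
        self._config_file = config_file
        self._config_loaded = False  # hypothetical guard flag

    async def _load_config(self) -> None:
        # Guard: skip reloading (and re-logging) when the watch reconnects.
        if self._config_loaded:
            return
        await async_config.load_kube_config(config_file=self._config_file)
        self._config_loaded = True

    async def get_conn(self) -> client.ApiClient:
        # API client construction stays here; _load_config only loads configuration.
        await self._load_config()
        return client.ApiClient()
```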

Closes: #60495

…nts when

the Kubernetes watch stream ended (e.g. due to timeout_seconds), even while
the pod was still running.

This change reconnects on watch termination, resumes from the last observed
resourceVersion, restarts on stale resourceVersion errors (410), and stops
only when the pod completes or is deleted. Permission-denied watches still fall
back to polling.

As part of this fix, kubeconfig loading is cached and _load_config no longer
returns an API client, clarifying its responsibility and avoiding repeated
config loading during watch reconnects.

Tests cover reconnection behavior, stale resourceVersion recovery, and clean
termination on pod completion or deletion.
@boring-cyborg boring-cyborg bot added area:providers provider:cncf-kubernetes Kubernetes (k8s) provider related issues labels Jan 14, 2026
@SameerMesiah97 SameerMesiah97 changed the title Ensure AsyncKubernetesHook.watch_pod_events streams events until pod termination Ensure AsyncKubernetesHook.watch_pod_events streams events until pod completion/termination Jan 14, 2026
@SameerMesiah97 (Contributor, Author)

Requesting review for this.

@SameerMesiah97 (Contributor, Author)

@jscheffl

Perhaps you could help with review?

@jscheffl (Contributor)

@SameerMesiah97 I can, but please be aware that I do Airflow reviews and contributions as a side job, so please have some patience with reviews.

@jscheffl (Contributor) left a comment

From pure code reading, the PR looks okay to me. I am not sure, though: can you tell me why you made the changes to config loading in parallel with the changes to watching? It feels like there might be a reason I do not see, and this might be better as a separate PR.

I am not an expert on the K8s event watch; this feature was introduced by @wolfdn. Can you help me with an opinion on this? Was there a reason that you did not make a watch loop for the events?

@SameerMesiah97 (Contributor, Author) commented Jan 24, 2026

From pure code reading, the PR looks okay to me. I am not sure, though: can you tell me why you made the changes to config loading in parallel with the changes to watching? It feels like there might be a reason I do not see, and this might be better as a separate PR.

The modified implementation of watch_pod_events now calls _load_config multiple times through get_conn. It was indeed possible for the new implementation to work without modifying _load_config; in fact, this is what I initially implemented. But then I observed that the task logs were being polluted with config-loading output interleaved with the event messages from the target pod. Since this is an observability function, I added a small config-loading guard to ensure that the config is loaded once per hook instance and to avoid repeated log spam when the watch reconnects. That's why the changes ended up in the same PR.

I am not an expert on the K8s event watch; this feature was introduced by @wolfdn. Can you help me with an opinion on this? Was there a reason that you did not make a watch loop for the events?

I would love to hear feedback from @wolfdn on this as well.

@wolfdn (Contributor) commented Jan 26, 2026

@SameerMesiah97 Looks good to me! There was no specific reason to not create a watch loop for the events - such a loop is indeed necessary to pick up the watch connection again after it has timed out / terminated for some other reason.
Thanks for fixing this!

@jscheffl jscheffl merged commit 52867d9 into apache:main Jan 26, 2026
106 checks passed
shreyas-dev pushed a commit to shreyas-dev/airflow that referenced this pull request Jan 29, 2026
…nts when (apache#60532)

