Description
Watch-stream termination
pykube-ng, kubernetes, requests, and any other synchronous client libraries use the streaming responses of urllib3 and the built-in http machinery for watching the k8s events.
These streaming requests/responses can be closed when a chunk/line is yielded to the consumer, the control flow is returned to the caller, and the streaming socket itself is idling. E.g., for requests: https://2.python-requests.org/en/master/user/advanced/#streaming-requests
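For illustration, a minimal sketch of such a stream with requests (the URL and the surrounding code are made up for this example; pykube-ng and the kubernetes client stream in essentially the same way):

```python
import requests

# Hypothetical watch URL; real clients build it from the cluster configuration.
WATCH_URL = "https://kubernetes.default/api/v1/pods?watch=true"

with requests.get(WATCH_URL, stream=True) as response:
    for line in response.iter_lines():   # one chunk/line per k8s event
        print(line)
        break   # between the yielded chunks, the control flow is ours,
                # so the stream can simply be closed (here: by leaving the block)
```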
However, if nothing happens on the k8s-event stream (i.e. no k8s resources are added/modified/deleted), the streaming response spends most of its time in the blocking read() operation on a socket. It can remain there for a long time — minutes, hours — until some data arrives on the socket.
If the streaming response runs in a thread, while the main thread is used for an asyncio event-loop, such a stream cannot be closed/cancelled/terminated (because of the blocking read()). This, in turn, makes the application hang on exit and prevents its pod from restarting, since the thread does not finish until the read() call returns.
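A self-contained toy reproduction of the hang (no Kubernetes involved; a socketpair stands in for the idle watch-stream):

```python
import asyncio
import socket
import threading

server, client = socket.socketpair()   # an idle "watch-stream": nothing is ever sent

def blocking_watch() -> None:
    client.recv(4096)   # blocks at the OS level until data arrives (i.e. never)

async def main() -> None:
    thread = threading.Thread(target=blocking_watch)
    thread.start()
    await asyncio.sleep(1)   # the operator decides to exit here...

# ...but the process does not exit: the non-daemon thread is still stuck in recv(),
# and the interpreter waits for it forever on shutdown.
asyncio.run(main())
```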
There is no easy way to terminate a blocking read() operation on a socket. The only way is a dirty hack with OS-level process signals, which interrupt the I/O operations at the low (libc & co.) level.
This hack can be done only from the main thread (because signal handlers can be set only from the main thread — a limitation of Python), and is therefore suitable only for a runnable application, not for a library.
In kopf run, we can be sure that the event-loop and its asyncio tasks run in the main thread. In case of explicit task orchestration, we can assume that the operator developers do the same, but warn if they don't.
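A rough sketch of what this can look like (the names and the structure are illustrative, not the actual kopf code): the loop runs in the main thread, where signal handlers are allowed, and a library-style entry point can only check and warn.

```python
import asyncio
import signal
import threading
import warnings

async def operator() -> None:
    await asyncio.sleep(3600)   # stands in for the real watching/handling tasks

def run() -> None:
    loop = asyncio.new_event_loop()
    main_task = loop.create_task(operator())
    if threading.current_thread() is threading.main_thread():
        for signum in (signal.SIGINT, signal.SIGTERM):
            loop.add_signal_handler(signum, main_task.cancel)   # main thread only (Unix)
    else:
        warnings.warn("The event-loop is not in the main thread: signal handlers "
                      "cannot be installed, so blocked watch-streams will not be "
                      "interruptible on exit.")
    try:
        loop.run_until_complete(main_task)
    except asyncio.CancelledError:
        pass   # cancelled by SIGINT/SIGTERM: exit immediately
    finally:
        loop.close()

if __name__ == "__main__":
    run()
```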
This PR:
Moves the streaming watch-requests into dedicated threads (instead of the asyncio thread-pool executors used for every next() call).
These dedicated threads are tracked in each watch_objs() call, and are interrupted when the watch-stream exits for any reason (an error, a generator exit, a task cancellation, etc.), as sketched below.
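A rough sketch of that pattern. The shape of watch_objs() and its parameters are assumed here for illustration, and the actual interruption of the blocked read() is only marked with a comment; the PR does it via the OS-signal mechanism described above.

```python
import asyncio
import threading
from typing import AsyncIterator

import requests   # any synchronous client; pykube-ng/kubernetes behave the same way

_DONE = object()  # sentinel: the stream has ended or failed

async def watch_objs(url: str) -> AsyncIterator[bytes]:
    """Yield watch-event lines from a blocking stream running in a dedicated thread."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def _worker() -> None:
        try:
            with requests.get(url, stream=True) as response:
                for line in response.iter_lines():          # blocks in read() while idle
                    loop.call_soon_threadsafe(queue.put_nowait, line)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    thread = threading.Thread(target=_worker, name=f"watcher of {url}")
    thread.start()                                           # one dedicated thread per stream
    try:
        while True:
            line = await queue.get()
            if line is _DONE:
                break
            yield line
    finally:
        # Exiting for any reason (an error, GeneratorExit, task cancellation):
        # the tracked thread must be interrupted here, since it may still be
        # blocked in read(); the PR breaks that read() with an OS-level signal.
        thread.join(timeout=1.0)
```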
Task cancellation
Besides the stream termination, this PR improves how the tasks are orchestrated in the event-loop, so that they are cancelled properly and get some minimal time for a graceful exit (e.g. cleanups). This is used, for example, by the peering disappear() call to remove the operator from the peering object (previously, this call was cancelled right at its start, and therefore never actually executed).
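A minimal sketch of such a cancellation, assuming a set of root tasks (the helper name and the timeout value are illustrative):

```python
import asyncio
from typing import Collection

async def cancel_with_grace(tasks: Collection[asyncio.Task], timeout: float = 5.0) -> None:
    for task in tasks:
        task.cancel()
    # The cancelled tasks get up to `timeout` seconds to run their cleanup code
    # (e.g. the peering disappear() call in a finally-block) before we stop waiting.
    if tasks:
        await asyncio.wait(tasks, timeout=timeout)
```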
The sub-tasks of the root tasks are tracked and cancelled too: for example, the scheduled workers, the handlers, all shielded tasks, and generally anything that runs in the loop but is not produced by create_tasks(). This guarantees that when run() exits, it leaves nothing behind.
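A sketch of that final sweep, assuming it is awaited inside the loop just before run() returns:

```python
import asyncio

async def cleanup_remaining_tasks() -> None:
    current = asyncio.current_task()
    leftovers = {task for task in asyncio.all_tasks() if task is not current}
    for task in leftovers:
        task.cancel()   # workers, handlers, shielded tasks, anything still alive
    if leftovers:
        # return_exceptions=True: collect the CancelledErrors instead of re-raising them
        await asyncio.gather(*leftovers, return_exceptions=True)
```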
Types of Changes
Bug fix (non-breaking change which fixes an issue)
Refactor/improvements
Review
List of tasks the reviewer must do to review the PR:
When an operator gets SIGINT (Ctrl+C) or SIGTERM (in pods), it exits immediately, with no extra waits for anything.
This also improves its behaviour in PyCharm, where the operator can now be stopped and restarted in one click (previously, it had to be stopped twice).
Closed in favour of separate #147 #148 #149 #152 — one per fix.