[PR] Die properly: fast and with no remnants #143

Closed
2 tasks
kopf-archiver bot opened this issue Aug 18, 2020 · 0 comments
Labels
archive bug Something isn't working

Comments

kopf-archiver bot commented Aug 18, 2020

A pull request by nolar at 2019-07-10 15:42:31+00:00
Original URL: zalando-incubator/kopf#143
 

When an operator gets SIGINT (Ctrl+C) or SIGTERM (in pods), it exits immediately, with no extra waits for anything.

This also improves its behaviour in PyCharm, where the operator can now be stopped and restarted in one click (previously required double-stopping).

Issue: #142
Requires: hjacobs/pykube#31

Description

Watch-stream termination

pykube-ng, kubernetes, requests, and other synchronous client libraries rely on the streaming responses of urllib3 and the built-in http package to watch the k8s events.

These streaming requests/responses can be closed only when a chunk or line has been yielded to the consumer, the control flow has returned to the caller, and the streaming socket itself is idle. E.g., for requests: https://2.python-requests.org/en/master/user/advanced/#streaming-requests

However, if nothing happens on the k8s-event stream (i.e. no k8s resources are added/modified/deleted), the streaming response spends most of its time in the blocking read() operation on a socket. It can remain there for a long time (minutes or hours) until some data arrives on the socket.

If the streaming response runs in a thread while the main thread is used for an asyncio event-loop, such a stream cannot be closed, cancelled, or terminated (because of the blocking read()). This, in turn, makes the application hang on exit and keeps its pod from restarting, since the thread does not finish until the read() call returns.
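The hang described above can be reproduced without Kubernetes. The following is a minimal sketch, assuming a local socket pair stands in for the watch-stream connection; `blocking_reader` and `demo_blocked_thread` are hypothetical names used only for this illustration:

```python
import socket
import threading

def blocking_reader(sock, received):
    # recv() blocks until data arrives or the peer closes the connection.
    received.append(sock.recv(1024))

def demo_blocked_thread():
    ours, theirs = socket.socketpair()
    received = []
    thread = threading.Thread(target=blocking_reader, args=(theirs, received))
    thread.start()

    # The stream is idle, so the thread cannot finish within the timeout.
    thread.join(timeout=0.2)
    stuck = thread.is_alive()

    # Only incoming data (or a peer shutdown) releases the blocked recv().
    ours.sendall(b"event")
    thread.join(timeout=5)
    finished = not thread.is_alive()

    ours.close()
    theirs.close()
    return stuck, finished, received
```

In this sketch the reader thread stays alive for as long as the socket is silent, which is the same situation as an idle watch-stream at exit time.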

There is no easy way to terminate the blocking read() operation on a socket. The only way is a dirty hack with OS-level process signals, which interrupt the I/O operations at a low (libc & co) level.

This hack can be done only from the main thread (because signal handlers can be set only from the main thread, a Python limitation), and is therefore suitable only for a runnable application rather than for a library.

In kopf run, we can be sure that the event-loop and its asyncio tasks run in the main thread. In the case of explicit task orchestration, we can assume that the operator developers do the same, but warn if they don't.
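The main-thread limitation is easy to observe: `signal.signal()` raises `ValueError` when called from any other thread. A minimal sketch of that behaviour, plus a check that warns when the code is not running in the main thread; the function names and the warning text are assumptions made for this illustration, not the code of this PR:

```python
import logging
import signal
import threading

logger = logging.getLogger(__name__)

def can_use_signal_hack():
    """Tell whether the current thread is allowed to install signal handlers."""
    in_main = threading.current_thread() is threading.main_thread()
    if not in_main:
        # Hypothetical wording of the warning for explicit orchestration.
        logger.warning("Not in the main thread: signals cannot be used here.")
    return in_main

def try_install_handler(results):
    # Python refuses to set signal handlers outside of the main thread.
    try:
        signal.signal(signal.SIGTERM, signal.SIG_DFL)
        results.append("installed")
    except ValueError:
        results.append("refused")
```

Running `try_install_handler()` in a `threading.Thread` records the refusal, which is why the hack belongs to the application entry point and not to library code.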

This PR:

  • Moves the streaming watch-requests into their dedicated threads (instead of the asyncio thread-pool executors for every next() call).
  • These dedicated threads are tracked in each watch_objs() call, and are interrupted when the watch-stream exits for any reason (an error, generator-exit, task cancellation, etc).
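The dedicated-thread approach from the list above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation of this PR: `stream_in_thread` and `make_stream` are hypothetical names, and a plain stop flag stands in for the signal-based interruption, so it only takes effect between items and does not break a blocking read().

```python
import asyncio
import threading

_DONE = object()  # sentinel marking the end of the stream

async def stream_in_thread(make_stream):
    """Yield the items of a blocking iterator that runs in its own thread."""
    loop = asyncio.get_running_loop()
    queue = asyncio.Queue()
    stop = threading.Event()

    def emit(item):
        try:
            loop.call_soon_threadsafe(queue.put_nowait, item)
        except RuntimeError:
            pass  # the loop is already closed; nobody is listening

    def worker():
        try:
            for item in make_stream():
                if stop.is_set():
                    break
                emit(item)
        finally:
            emit(_DONE)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    try:
        while True:
            item = await queue.get()
            if item is _DONE:
                break
            yield item
    finally:
        # Reached on exhaustion, errors, generator-exit, and task cancellation.
        stop.set()
```

One thread per stream replaces one executor call per next(), and the `finally` clause is the single place where the thread gets told to stop, whatever the reason for leaving the generator.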

Tasks cancellation

Besides the stream termination, this PR improves the way the tasks are orchestrated in an event-loop, so that they are cancelled properly and have some minimal time for a graceful exit (e.g. cleanups). This is used, for example, by the peering disappear() call to remove self from the peering object (previously, this call was cancelled on start, and therefore never actually executed).
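A minimal sketch of cancellation with a grace period, assuming plain asyncio tasks; `stop_tasks`, `peering`, and the one-second default are hypothetical and only illustrate the idea of letting a cancelled task run its cleanup:

```python
import asyncio

async def stop_tasks(tasks, grace=1.0):
    """Cancel the tasks and give them up to `grace` seconds to clean up."""
    for task in tasks:
        task.cancel()
    if not tasks:
        return set()
    _, pending = await asyncio.wait(tasks, timeout=grace)
    return pending  # tasks that did not exit within the grace period

async def peering(log):
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        # Hypothetical cleanup, standing in for something like disappear().
        await asyncio.sleep(0.01)
        log.append("cleaned up")
        raise

async def demo():
    log = []
    task = asyncio.ensure_future(peering(log))
    await asyncio.sleep(0)  # let the task start before cancelling it
    leftovers = await stop_tasks([task])
    return log, leftovers
```

The cleanup can still await inside the `except` block because the cancellation is delivered once; the grace timeout bounds how long the exit may be delayed.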

Also, the sub-tasks of the root tasks are tracked and cancelled as well. For example, the scheduled workers, handlers, all shielded tasks, and generally anything that is not produced by create_tasks() but runs in the loop. This guarantees that when run() exits, it leaves nothing behind.
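One way to leave nothing behind is to sweep the loop for whatever is still running. The sketch below assumes Python 3.7+ and uses `asyncio.all_tasks()`; `cancel_leftovers` and `demo` are hypothetical names, and the PR itself may track the sub-tasks differently:

```python
import asyncio

async def cancel_leftovers(grace=1.0):
    """Cancel every task in the running loop except the calling one."""
    current = asyncio.current_task()
    leftovers = [t for t in asyncio.all_tasks() if t is not current]
    for task in leftovers:
        task.cancel()
    if leftovers:
        await asyncio.wait(leftovers, timeout=grace)
    return [t for t in leftovers if not t.done()]

async def demo():
    async def root():
        # A sub-task spawned by a root task and not tracked by the caller.
        asyncio.create_task(asyncio.sleep(3600))
        await asyncio.sleep(3600)

    asyncio.create_task(root())
    await asyncio.sleep(0.01)  # let root() spawn its sub-task
    remaining = await cancel_leftovers()
    current = asyncio.current_task()
    others = [t for t in asyncio.all_tasks() if t is not current]
    return remaining, others
```

Because the sweep looks at the loop rather than at a list of known tasks, it also reaches tasks that were wrapped in `asyncio.shield()` or scheduled indirectly.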

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)
  • Refactor/improvements

Review

List of tasks the reviewer must do to review the PR

  • Tests
  • Documentation

Commented by nolar at 2019-07-16 10:33:52+00:00
 

Closed in favour of separate #147 #148 #149 #152 — one per fix.

@kopf-archiver kopf-archiver bot closed this as completed Aug 18, 2020
@kopf-archiver kopf-archiver bot changed the title [archival placeholder] [PR] Die properly: fast and with no remnants Aug 19, 2020
@kopf-archiver kopf-archiver bot added the bug Something isn't working label Aug 19, 2020