Scheduler plugin to dump cluster state on close #5660
Actually, a […] (see `distributed/diagnostics/plugin.py`, lines 55–59 at `2a3ee56`).
We could, however, use a preload, since […] (see `distributed/scheduler.py`, lines 4155–4159 at `2a3ee56`).
A preload will be a bit less ergonomic to install, though.
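As a sketch of the preload approach: a scheduler preload is a module exposing `dask_setup(scheduler)`, which could wrap `Scheduler.close` so state is dumped before shutdown proceeds. The `dump_state` helper below is a hypothetical placeholder (not distributed's API) standing in for whatever serializes the scheduler's state, and the stub-friendly signature is only to keep the example self-contained:

```python
import functools
import logging

logger = logging.getLogger(__name__)


def dump_state(scheduler):
    # Hypothetical helper: stands in for whatever serializes and
    # writes the scheduler's state. Placeholder for this sketch.
    pass


def dask_setup(scheduler):
    """Preload entry point: wrap the scheduler's close() coroutine so
    cluster state is dumped before shutdown proceeds.

    In a real deployment `scheduler` is the distributed Scheduler and
    this module would be installed via the scheduler's preload
    mechanism; here it only needs a `close` coroutine.
    """
    original_close = scheduler.close

    @functools.wraps(original_close)
    async def close(*args, **kwargs):
        try:
            dump_state(scheduler)
        except Exception:
            # Never let a failed dump interfere with shutdown.
            logger.exception("Cluster state dump failed")
        return await original_close(*args, **kwargs)

    scheduler.close = close
```

Compared with a plugin, this monkey-patching has to be kept in sync with the `close` signature by hand, which is part of why it is less ergonomic.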
We could also add a […]
Yeah, if adding a hook that runs before workers have been removed would be useful for other problems, then adding a new method seems sensible.
Sounds good. Looking back through blames, it seems that scheduler plugin behavior has always been this way, but without much explanation of why. Having this option would probably be useful in many cases (especially if you're doing something dangerous, have worker plugins doing some stateful thing, and want to communicate with your worker plugins before shutdown).
Hm, I don't know what I was talking about here. […] is only the case if […]. So […], though I'm not sure if […].
We've run into issues where the scheduler unexpectedly cleanly shuts itself down after running for a very long time. Having a dump of cluster state would help to debug this.
After #5659 is implemented, write a `SchedulerPlugin` with a `close` hook that dumps cluster state. The filename where the state is written can be passed into the plugin instance. If the cluster state dump fails, or writing to the destination fails, this should not affect the shutdown process: just log the problem and move on.
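A minimal sketch of such a plugin. In a real deployment it would subclass `distributed.diagnostics.plugin.SchedulerPlugin`; the base class is omitted here so the example is self-contained, and the `get_state` callable is a hypothetical stand-in for whatever actually serializes the scheduler's state:

```python
import json
import logging

logger = logging.getLogger(__name__)


class DumpStateOnClose:
    """Sketch of a scheduler plugin that dumps cluster state on close.

    `path` is the destination file, passed into the plugin instance as
    the issue proposes. `get_state` is a hypothetical callable that
    returns a JSON-serializable snapshot of the scheduler's state.
    """

    def __init__(self, path, get_state):
        self.path = path
        self.get_state = get_state

    async def close(self):
        # Called during scheduler shutdown. Per the issue, no failure
        # here may affect the shutdown process: log and move on.
        try:
            state = self.get_state()
            with open(self.path, "w") as f:
                json.dump(state, f)
        except Exception:
            logger.exception("Failed to dump cluster state to %s", self.path)
```

The broad `except Exception` is deliberate: both a failed state dump and a failed write to the destination are swallowed, matching the requirement that shutdown always proceeds.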