Description
This might need to be split into several tickets. I just tried to upgrade to a newer version of dask-kubernetes. With legacy mode switched on, everything seems to work fine. With the new mode, where the scheduler runs as a separate pod, I run into several issues, though I might be missing something that would resolve all of them:
A small issue: The scheduler takes the same name as a worker (so you can't tell which pod is the scheduler by looking at the name), and worse, it also uses the same resource requests (which it doesn't really need). Also, because the scheduler runs as a separate container, this will be a nightmare when the client pod is killed or crashes (as opposed to being terminated), as nothing gets cleaned up: instead of the old behavior (workers exiting after 60 seconds), the workers and the scheduler will just stick around forever.
The bigger issue: I can't get it working at all. There are pickle errors when both the worker and the client try to connect to the scheduler (`distributed.protocol.pickle - INFO - Failed to deserialize`), although they seem to be masked by a timeout error.
Is legacy mode going to disappear in the long run (the name suggests it), or is it safe to keep using it?