Properly support restarting BatchedSend
#5481
Comments
What would that look like, and what value would it deliver? So far, the restart by reconnect starts the BatchedComm again and the BatchedComm takes care of the rest. I'm wondering what value a "clear" interface would yield. It feels like every change to this interface would make the code using it more complex.
I personally consider the tests added in #5457 to be sufficient and if something is missing, let's discuss it over there.
What is missing?
Just for historical understanding: I believe this was sort of introduced in #3493. Before that, we were always creating a new (code reference: distributed/distributed/worker.py, lines 866 to 867 in 2a05299)
In #5480 we found that when `Worker.batched_stream` is restarted after a broken connection to the scheduler, it enters a broken state where `send` succeeds, but doesn't actually send data (it just sits in the buffer forever).

This is probably fixed by #5457, but I think it may deserve a more thorough fix. If we want `BatchedSend` to be restartable, it should have a clear interface and tests for this.

xref #4133 #4163 #5377
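
To make the "clear interface and tests" idea concrete, here is a minimal sketch of what a restart contract could look like. It is not the `BatchedSend` implementation in distributed: `ToyBatchedSender`, `FakeComm`, and `demo` are hypothetical names invented for illustration, and the choice to keep unsent messages and retry them after a restart is an assumption, not something this issue settles.

```python
import asyncio


class FakeComm:
    """Hypothetical in-memory comm; write() fails once `broken` is set."""

    def __init__(self):
        self.received = []
        self.broken = False

    async def write(self, batch):
        if self.broken:
            raise OSError("connection closed")
        self.received.extend(batch)


class ToyBatchedSender:
    """Hypothetical restartable sender (illustration only, not distributed's class)."""

    def __init__(self):
        self.buffer = []
        self.comm = None
        self._task = None
        self._wakeup = None

    def start(self, comm):
        """Attach a comm and start, or restart, the background flush loop."""
        if self._task is not None and not self._task.done():
            raise RuntimeError("flush loop is still running")
        self.comm = comm
        self._wakeup = asyncio.Event()
        if self.buffer:
            # Messages queued while disconnected must not sit in the buffer forever.
            self._wakeup.set()
        self._task = asyncio.create_task(self._flush_loop())

    def send(self, msg):
        """Queue a message without blocking; it is kept even while disconnected."""
        self.buffer.append(msg)
        if self._wakeup is not None:
            self._wakeup.set()

    async def _flush_loop(self):
        while True:
            await self._wakeup.wait()
            self._wakeup.clear()
            batch, self.buffer = self.buffer, []
            try:
                await self.comm.write(batch)
            except OSError:
                # Assumed policy: put the batch back so a later start() retries it.
                self.buffer = batch + self.buffer
                return


async def demo():
    sender = ToyBatchedSender()
    first, second = FakeComm(), FakeComm()
    sender.start(first)
    sender.send("a")
    await asyncio.sleep(0.01)  # let the flush loop run
    first.broken = True        # simulate the broken scheduler connection
    sender.send("b")
    await asyncio.sleep(0.01)  # the write fails and the flush loop exits
    sender.send("c")           # send() still succeeds while disconnected
    sender.start(second)       # restart on a fresh comm
    sender.send("d")
    await asyncio.sleep(0.01)
    return first.received, second.received, sender.buffer
```

A test for the restart path would then assert that nothing queued across the disconnect stays stuck in the buffer, which is the failure mode described above. Whether resending after a reconnect is actually desirable is a separate design question.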