Description
BatchedSend
has a few edge cases which can silently put it into a broken state. (See my review on #5457 for an enumeration of its issues.) Despite being a core piece of networking infrastructure, it has little internal validation or documentation of its methods' contracts. This means that future contributors can easily make innocuous-looking refactors to code that uses BatchedSend
, and introduce edge-case race conditions or deadlocks without receiving any error. This is exactly what happened in #3493: a small change that would have failed simple assertions within BatchedSend
introduced deadlocks (#5480) which were not resolved for two years.
We attempted to do this in #6329. However, that proved difficult to merge because BatchedSend
is actually used in different ways, with different expectations, on the worker vs scheduler vs client (in large part, because it doesn't have this validation or clear contracts).
With #6350, the primary edge-case use of BatchedSend
has been eliminated (restarting an instance with a new comm object). It should now be pretty easy to add some internal state validation and documentation to BatchedSend
, to prevent problems like this in the future. In the process, we can also modernize convert Tornado and gen_coroutine
uses to asyncio.