You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
BatchedSend has a few edge cases which can silently put it into a broken state. (See my review on #5457 for an enumeration of its issues.) Despite being a core piece of networking infrastructure, it has little internal validation or documentation of its methods' contracts. This means that future contributors can easily make innocuous-looking refactors to code that uses BatchedSend, and introduce edge-case race conditions or deadlocks without receiving any error. This is exactly what happened in #3493: a small change that would have failed simple assertions within BatchedSend introduced deadlocks (#5480) which were not resolved for two years.
We attempted to do this in #6329. However, that proved difficult to merge because BatchedSend is actually used in different ways, with different expectations, on the worker vs scheduler vs client (in large part, because it doesn't have this validation or clear contracts).
With #6350, the primary edge-case use of BatchedSend has been eliminated (restarting an instance with a new comm object). It should now be pretty easy to add some internal state validation and documentation to BatchedSend, to prevent problems like this in the future. In the process, we can also modernize convert Tornado and gen_coroutine uses to asyncio.
The text was updated successfully, but these errors were encountered:
FYI, a year ago I worked on switching BatchedSend to asyncio. It's pretty simple, but I also think it didn't quite work—there's a reason I didn't open a PR (possibly something around calling start from inside or outside the event loop?). Now that usage has changed with #6350, maybe this will be easier now.
BatchedSend
has a few edge cases which can silently put it into a broken state. (See my review on #5457 for an enumeration of its issues.) Despite being a core piece of networking infrastructure, it has little internal validation or documentation of its methods' contracts. This means that future contributors can easily make innocuous-looking refactors to code that usesBatchedSend
, and introduce edge-case race conditions or deadlocks without receiving any error. This is exactly what happened in #3493: a small change that would have failed simple assertions withinBatchedSend
introduced deadlocks (#5480) which were not resolved for two years.We attempted to do this in #6329. However, that proved difficult to merge because
BatchedSend
is actually used in different ways, with different expectations, on the worker vs scheduler vs client (in large part, because it doesn't have this validation or clear contracts).With #6350, the primary edge-case use of
BatchedSend
has been eliminated (restarting an instance with a new comm object). It should now be pretty easy to add some internal state validation and documentation toBatchedSend
, to prevent problems like this in the future. In the process, we can also modernize convert Tornado andgen_coroutine
uses to asyncio.The text was updated successfully, but these errors were encountered: