-
-
Notifications
You must be signed in to change notification settings - Fork 750
Description
In #5520 and #5976 and #6007 we've started a shuffle service. This has better memory characteristics, but is not resilient. In particular, it can break in a few ways:
- A worker holding shuffle outputs can die mid-shuffle
- New outputs of a shuffle can be requested by a client after the shuffle has started
- Output futures of a shuffle can be unrequested by a client
There are a few ways to solve this problem. One way I'd like to discuss here is opening up scheduler events to extensions, and letting them trigger transitions. In particular both scenarios 1 and 2 can be handled by letting the extension track remove_worker and update_graph events and restart all shuffle tasks if an output-holding worker dies, or if any of the existing shuffles occur in a new graph. Scenario 3 can be handled by letting the extension track transition events, and clean things up when the barrier task transitions out of memory.
So far, I think that this can solve all of the resilience issues in shuffling (at least everything I've come across). However, it introduces two possible concerns:
1 - Scheduler performance
Maybe it doesn't make sense for every transition to cycle through every extension to see if they care about transitions.
This doesn't seem to be that expensive in reality
In [1]: extensions = [object() for _ in range(10)]
In [2]: %%timeit
...: for i in range(1000):
...: for extension in extensions:
...: if hasattr(extension, "transition"):
...: extension.transiiton()
...:
383 µs ± 7.07 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)So around 40ns per extension per transition. It's well under a microsecond for all of our extensions.
2 - Complexity
Now any extension can inject transitions. Horrible horrible freedom!
My guess is that this is ok as long as we maintain some hygiene around, for example, always using the trasitions system, rather than mucking about with state directly
This is also a somewhat systemic change for what is, today, a single feature.
cc @gjoseph92 @fjetter for feedback