Description
We're already pushing some extra work onto the worker nodes (mainly argument fetching and processor selection/load balancing), and it would be beneficial for large DAGs to move more work onto each worker. The main blocker is providing a way to split the DAG into multiple domains, where each domain is handled by a given thread on a given worker. With efficient Thunk serialization, we can then send a subgraph to each worker and let them process their own DAG without conflicts. We'll need to add a mechanism by which thunks automatically wait on their input thunks to complete before they attempt to download the output data; if possible, we can also have workers broadcast and shard their Chunk
s onto dependent workers as soon as the data is made available.