Recently I've been trying to split my pipelines up into chunks so the chunks can be re-used as much as possible. One issue I foresee running into is that `scatter` may not work as intended. `scatter` will send the data to the cluster for processing, but since the downstream nodes are already defined (I'm linking them via `connect`), they won't properly operate on the data (since they aren't `DaskStream` nodes).
It seems that it may be beneficial to separate the front-end pipeline definition from the decision about which backend is being used, at least at the class level. This would make pipelines more modular, since we don't need to re-write the same pipeline for each backend. Additionally, this would allow the user executing a pipeline to decide what backend they want/are able to use.
One approach to handle this could be to specify a client for the entire pipeline, where a `client` mimics some of the `dask` client API (mostly `submit` and `loop` I think). If the client associated with any node is changed/updated then the entire pipeline shifts to use that client. This would also require nodes to know if they are downstream of a scatter but haven't been gathered yet, since then they would know to use the pipeline's client rather than standard local execution (which most likely should be a dummy `client`). Maybe this can be inspected when the client is set, since it could change during the pipeline build process?
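To make the idea concrete, here is a minimal sketch of what a dummy local client could look like. The names `DummyClient` and `Pipeline` are hypothetical, not existing API; the only assumption is that the client exposes a `submit`/`gather` surface compatible with `dask.distributed.Client`, so swapping in a real dask client would move the same pipeline onto a cluster.

```python
from concurrent.futures import Future

class DummyClient:
    """Local stand-in mimicking the dask Client API: submit runs eagerly."""

    def submit(self, fn, *args, **kwargs):
        # Run the task immediately and wrap the result in a Future so
        # downstream code can treat local and dask execution identically.
        fut = Future()
        try:
            fut.set_result(fn(*args, **kwargs))
        except Exception as exc:
            fut.set_exception(exc)
        return fut

    def gather(self, futures):
        return [f.result() for f in futures]


class Pipeline:
    """Toy pipeline whose nodes all share one client; replacing the
    client switches the backend for the whole pipeline."""

    def __init__(self, client=None):
        self.client = client or DummyClient()
        self.steps = []

    def add(self, fn):
        self.steps.append(fn)
        return self

    def run(self, x):
        # Each step goes through the client, whatever backend it is.
        for step in self.steps:
            x = self.client.submit(step, x).result()
        return x


pipe = Pipeline().add(lambda x: x + 1).add(lambda x: x * 2)
result = pipe.run(3)  # (3 + 1) * 2 == 8 with the dummy client
```

The point of the sketch is only that the pipeline definition (`add` calls) never mentions the backend; only the `client` attribute does, which is what would let the pipeline shift wholesale when the client is changed.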
I believe you had a simpler implementation of this idea than what is being described above? Having scatter/gather not use dask explicitly seems like an OK thing to want to me, but personally I also think there's a lot to be gained from the extra knowledge we have about the execution framework.