Is it possible to set IO managers at the DagsterType level as well? #3554

nancydyc · 2021-01-18T23:07:22Z

Dagster introduced IO managers in the 0.10.0 release. Is it possible to set IO managers at the DagsterType level as well?

Answered by nancydyc


nancydyc (Author) · 2021-01-18T23:12:02Z

It is important to vary behavior by both type and mode. For example, to handle a Spark data frame, we sometimes need to load from the local file system and sometimes from a remote system. Varying purely on type was overly restrictive.

The pattern we encourage people to use is to build IO managers that vary their behavior based on the Dagster type on the input or output.

When an IO manager decides how to load an input or store an output, it has access to the Dagster type on the input or output definition, and it can vary its behavior based on that.
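
As a rough illustration of this first pattern, here is a minimal sketch in plain Python. It does not import Dagster: FakeDagsterType, FakeOutputContext, FakeInputContext, and TypeAwareIOManager are hypothetical stand-ins, and the type names and storage formats are assumptions chosen for the example. The method names handle_output and load_input mirror the IO manager interface, but the real context objects carry more than is shown here.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class FakeDagsterType:
    """Stand-in for a Dagster type; only its name is used here."""
    name: str


@dataclass(frozen=True)
class FakeOutputContext:
    """Stand-in for the context an IO manager receives when storing an output."""
    step_key: str
    name: str
    dagster_type: FakeDagsterType


@dataclass(frozen=True)
class FakeInputContext:
    """Stand-in for the context an IO manager receives when loading an input."""
    upstream_output: FakeOutputContext


class TypeAwareIOManager:
    """Hypothetical IO manager that picks a storage format from the type name."""

    # Assumed type names and formats, chosen only for illustration.
    FORMAT_BY_TYPE = {"SparkDataFrame": "parquet", "MLModel": "pickle"}
    DEFAULT_FORMAT = "json"

    def __init__(self) -> None:
        # In-memory dict standing in for a file system or object store.
        self.store: Dict[Tuple[str, str, str], Any] = {}

    def _key(self, output: FakeOutputContext) -> Tuple[str, str, str]:
        # The type on the output definition decides the format.
        fmt = self.FORMAT_BY_TYPE.get(output.dagster_type.name, self.DEFAULT_FORMAT)
        return (fmt, output.step_key, output.name)

    def handle_output(self, context: FakeOutputContext, obj: Any) -> None:
        self.store[self._key(context)] = obj

    def load_input(self, context: FakeInputContext) -> Any:
        # Look up with the upstream output's type so the key matches the write.
        return self.store[self._key(context.upstream_output)]


manager = TypeAwareIOManager()
out = FakeOutputContext("make_df", "result", FakeDagsterType("SparkDataFrame"))
manager.handle_output(out, [1, 2, 3])
loaded = manager.load_input(FakeInputContext(upstream_output=out))
```

The same manager instance can be swapped per mode (local versus remote storage) while the per-type branching stays inside it.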

The other approach is to set the IO manager keys on the output definitions based on the type of the output definition. For example, if you are dealing with Spark data, you may set a Spark IO manager key so that all Spark outputs go to S3. If you are dealing with a machine learning model type, you may use an IO manager that stores all your machine learning models in an ML model store.
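
This second approach could be sketched as a small helper that picks the key from the type. The snippet below is plain Python, not the Dagster API: OutputSpec is a hypothetical stand-in for an output definition, and the key names spark_s3_io_manager and model_store_io_manager are made up for illustration.

```python
from typing import Dict, NamedTuple


class OutputSpec(NamedTuple):
    """Stand-in for an output definition: name, type name, IO manager key."""
    name: str
    type_name: str
    io_manager_key: str


# Hypothetical mapping from type name to IO manager resource key.
IO_MANAGER_KEY_BY_TYPE: Dict[str, str] = {
    "SparkDataFrame": "spark_s3_io_manager",
    "MLModel": "model_store_io_manager",
}
# Key used when a type has no entry in the mapping.
DEFAULT_IO_MANAGER_KEY = "io_manager"


def typed_output(name: str, type_name: str) -> OutputSpec:
    """Build an output spec whose IO manager key follows from its type."""
    key = IO_MANAGER_KEY_BY_TYPE.get(type_name, DEFAULT_IO_MANAGER_KEY)
    return OutputSpec(name, type_name, key)


df_out = typed_output("events", "SparkDataFrame")
model_out = typed_output("classifier", "MLModel")
plain_out = typed_output("row_count", "Int")
```

Each key would then need a matching IO manager resource in every mode that runs the pipeline.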

0 replies

amarrella · 2021-01-22T14:50:20Z

Hi @nancydyc thanks for the clarification on this!

I also asked a similar question in the community meeting and it makes sense. I have actually implemented an IOManager that works differently based on the DagsterType, as you suggest, and it works great.

Now that I'm starting to play with the new version of Dagster though, I'm noticing a large overlap between what RootInputManager/IOManager do and what type loaders and materializers do (both seem like a special case of RootInputManager/IOManager where the DagsterType is fixed).

Do you think it would make sense to replace type loaders and materializers with a "default" parameter for IOManager and RootInputManager?

Thanks again and congratulations on this amazing release,
Alessandro

3 replies

alangenfeld Jan 22, 2021
Maintainer

Very astute observation! We actually spent the last few days before the release trying to reconcile all this but there are some tricky aspects and there was too little time left to get it all figured out.

> Do you think it would make sense to replace type loaders and materializers with a "default" parameter for IOManager and RootInputManager?

Can you expand on what exactly you mean here? I'm not sure I'm interpreting it correctly.

Some comments that may be relevant:
- type_loader is effectively replaced by RootInputManager for any InputDefinition where you set the root_manager_key. You could write a function that wraps InputDefinition creation if you want to make it manageable to set the key on all your inputs.
- OutputDefinition has a default io_manager_key of "io_manager", so you can set a default IOManager by setting that key in the resources of your ModeDefinition.
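
A wrapper of the kind mentioned above might look roughly like the following. This is a plain-Python sketch rather than real Dagster code: InputSpec stands in for an input definition, and input_factory and the key warehouse_root_manager are hypothetical names.

```python
from typing import Callable, NamedTuple


class InputSpec(NamedTuple):
    """Stand-in for an input definition: name, type name, root manager key."""
    name: str
    type_name: str
    root_manager_key: str


def input_factory(root_manager_key: str) -> Callable[[str, str], InputSpec]:
    """Return a constructor that presets the root manager key on every input."""

    def make_input(name: str, type_name: str) -> InputSpec:
        return InputSpec(name, type_name, root_manager_key)

    return make_input


# Hypothetical key; it would have to match a resource defined in the mode.
warehouse_input = input_factory("warehouse_root_manager")
inputs = [
    warehouse_input("orders", "SparkDataFrame"),
    warehouse_input("users", "SparkDataFrame"),
]
```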

@sryza can likely add to the discussion

amarrella Jan 22, 2021

Hi @alangenfeld,

I mean at the DagsterType level: if we had something like default_io_manager_key and default_root_input_manager_key, these would be selected when no io_manager is specified in the output or no root_input_manager is specified in the input.

For OutputDefinition, instead of setting the default key to "io_manager" it would be set to the default for the type if it exists, otherwise "io_manager".
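
To make the proposed lookup order concrete, here is a small plain-Python sketch. It illustrates the suggestion only and is not existing Dagster behavior; resolve_io_manager_key, the per-type mapping, and the key spark_io_manager are all hypothetical.

```python
from typing import Dict, Optional

# Final fallback, matching the "io_manager" default described above.
FALLBACK_IO_MANAGER_KEY = "io_manager"

# Hypothetical per-type defaults, as a library of types might declare them.
TYPE_DEFAULT_IO_MANAGER_KEYS: Dict[str, str] = {
    "SparkDataFrame": "spark_io_manager",
}


def resolve_io_manager_key(type_name: str, explicit_key: Optional[str] = None) -> str:
    """Pick a key: explicit key on the output, then the type default, then the fallback."""
    if explicit_key is not None:
        return explicit_key
    return TYPE_DEFAULT_IO_MANAGER_KEYS.get(type_name, FALLBACK_IO_MANAGER_KEY)


explicit = resolve_io_manager_key("SparkDataFrame", explicit_key="custom_io_manager")
by_type = resolve_io_manager_key("SparkDataFrame")
fallback = resolve_io_manager_key("Int")
```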

The reason for this is that I'm writing an internal library with useful DagsterTypes and default IOManagers and RootInputManagers, so I'd like to indicate a "default" one for a type if it makes sense.

Your proposal of wrapping InputDefinition and OutputDefinition also works, and I'm thinking about that, but I wanted to see if this could be done more elegantly upstream first.

Thanks again!
Alessandro

alangenfeld Jan 22, 2021
Maintainer

Ah, interesting idea. We will take that into consideration as we look towards improving this. Thanks for taking the time to write this up!
