Dagster introduced IO managers in the 0.10.0 release. Is it possible to set IO managers at the DagsterType level as well?
Replies: 2 comments 3 replies
It is important to be able to vary behavior both by type and by mode. For example, to handle a Spark data frame, we sometimes need to load it from the local file system and sometimes from a remote system. Varying purely on type was overly restrictive.

The pattern we encourage people to try is to build IO managers that vary their behavior based on the Dagster type on the input or output. When an IO manager decides how to load an input or store an output, it has access to the Dagster type on the input or output definition, and it can vary how it loads or stores based on that.

The other approach is to set the IO manager key on each output definition based on the type of that output. For example, if you are dealing with Spark data, you can set a Spark IO manager key so that all Spark outputs are stored in S3. If you are dealing with a machine learning model type, you can use an IO manager that stores all your machine learning models in an ML model store.
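To make the first pattern concrete, here is a minimal, dependency-free sketch of an IO manager that picks a storage backend from the type on the output. It deliberately does not import Dagster: `OutputInfo`, `InMemoryStore`, and `TypeDispatchingIOManager` are hypothetical stand-ins invented for illustration, and the type names `SparkDataFrame` and `MLModel` are assumptions. In a real IO manager, the type would come from the context object Dagster passes to `handle_output` and `load_input`.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class OutputInfo:
    """Hypothetical stand-in for the context an IO manager receives."""
    type_name: str    # name of the Dagster type on the output definition
    step_key: str     # which step produced the value
    output_name: str  # which output of that step


class InMemoryStore:
    """Toy backend; a real one might write to S3 or a model registry."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.data: Dict[Tuple[str, str], Any] = {}

    def write(self, key: Tuple[str, str], obj: Any) -> None:
        self.data[key] = obj

    def read(self, key: Tuple[str, str]) -> Any:
        return self.data[key]


class TypeDispatchingIOManager:
    """Chooses a backend per type name and falls back to a default."""

    def __init__(self, stores_by_type: Dict[str, InMemoryStore],
                 default_store: InMemoryStore) -> None:
        self._stores_by_type = stores_by_type
        self._default_store = default_store

    def _store_for(self, info: OutputInfo) -> InMemoryStore:
        return self._stores_by_type.get(info.type_name, self._default_store)

    def handle_output(self, info: OutputInfo, obj: Any) -> None:
        self._store_for(info).write((info.step_key, info.output_name), obj)

    def load_input(self, upstream: OutputInfo) -> Any:
        # Simplification: the caller passes the upstream output's info directly.
        return self._store_for(upstream).read(
            (upstream.step_key, upstream.output_name))


s3_like = InMemoryStore("s3")
model_store = InMemoryStore("ml-model-store")
local = InMemoryStore("local")
manager = TypeDispatchingIOManager(
    {"SparkDataFrame": s3_like, "MLModel": model_store}, default_store=local)

spark_out = OutputInfo("SparkDataFrame", "build_features", "result")
other_out = OutputInfo("Int", "count_rows", "result")
manager.handle_output(spark_out, ["row1", "row2"])
manager.handle_output(other_out, 2)
```

The second approach moves the same decision out of the manager and into the pipeline definition: each output names the IO manager it wants through its IO manager key, so each manager can stay simple and type-agnostic.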
Hi @nancydyc, thanks for the clarification on this! I asked a similar question in the community meeting, and the answer makes sense. I have actually implemented an IOManager that works differently based on the DagsterType, as you suggest, and it works great. Now that I'm starting to play with the new version of Dagster, though, I'm noticing a large overlap between what RootInputManager/IOManager does and what type_loader and materializers do (both seem like a special case of RootInputManager/IOManager where the DagsterType is fixed). Do you think it would make sense to replace type loaders and materializers with a "default" parameter for IOManager and RootInputManager? Thanks again, and congratulations on this amazing release.