is it recommended to pass intermediate ML object between nodes? #505
Comments
Hi @miyamonz, Kedro provides a number of built-in datasets, and we (and contributors) keep adding new datasets for handling other data formats. What you can do is add a custom dataset (see how to implement one in https://kedro.readthedocs.io/en/stable/07_extend_kedro/01_custom_datasets.html#custom-datasets), similar to TensorFlowDataSet (you can find the source code for TensorFlowDataSet in https://github.com/quantumblacklabs/kedro/blob/master/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py). Or, if PyTorch objects are picklable, you could use a pickle-based dataset. And we would more than welcome your contribution of a new dataset to Kedro :) Hope this helps. Please let me know if you have any questions.
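To make the custom-dataset suggestion concrete, here is a minimal sketch of the load/save logic such a dataset would need for arbitrary picklable objects (e.g. PyTorch models or optimizers). In a real Kedro project this class would subclass `kedro.io.AbstractDataSet`; it is shown standalone here so the logic is clear, and the class name `PickleObjectDataSet` is illustrative, not part of Kedro's API.

```python
import pickle
from pathlib import Path

# Minimal sketch of a custom dataset for arbitrary picklable objects.
# In a real Kedro project this would subclass kedro.io.AbstractDataSet
# and be registered in the catalog; the class name and the bare
# _load/_save/_describe methods mirror that interface.
class PickleObjectDataSet:
    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def _save(self, data) -> None:
        # Serialize any picklable object (model, optimizer, ...) to disk.
        with self._filepath.open("wb") as f:
            pickle.dump(data, f)

    def _load(self):
        # Deserialize the object back from disk.
        with self._filepath.open("rb") as f:
            return pickle.load(f)

    def _describe(self) -> dict:
        return {"filepath": str(self._filepath)}
```

Anything that survives a `pickle.dump`/`pickle.load` round trip could be handled this way; objects holding unpicklable state (open file handles, CUDA contexts) would need format-specific handling instead.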
Kedro's pipeline does not support cyclic dependencies, so it might be tough to use for a repetitive process such as training neural network models over multiple epochs. To use PyTorch Ignite with Kedro, I developed a wrapper (a declarative high-level API) for PyTorch Ignite and open-sourced it as part of my PipelineX package. Here is an example project that uses the wrapper for PyTorch Ignite: PyTorch Lightning provides a high-level API called "Trainer", so it could also be used with Kedro.
Thanks for your answers! If so, I've found a point that is difficult for a beginner in a similar situation, and I want to let you know about it. When I pass intermediate objects such as an optimizer between nodes without any catalog config, the objects default to MemoryDataSet with deep-copy mode. There is no wrong method call or type error between nodes, because MemoryDataSet just copies the object, so the problem is silent. It took me a while to find the `copy_mode` setting:

```yaml
optimizers:
  type: MemoryDataSet
  copy_mode: assign
```

So I think this behavior should be documented. I've realized since I wrote this that it is already documented here.
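The pitfall described above comes down to copy semantics. A small sketch in plain Python, assuming Kedro's default MemoryDataSet behaviour for generic objects is equivalent to `copy.deepcopy` while `copy_mode: assign` hands every node the same object:

```python
import copy

# Why copy_mode matters for stateful objects such as optimizers:
# with a deep copy, each node gets an independent object, so in-place
# updates are silently lost; with assignment, the object is shared.
optimizer_state = {"step": 0}

deep_copied = copy.deepcopy(optimizer_state)  # analogous to the default
assigned = optimizer_state                    # analogous to copy_mode: assign

deep_copied["step"] += 1
assigned["step"] += 1

print(optimizer_state["step"])  # 1 -- only the assigned reference mutated it
```

The deep-copied object diverges without any error, which is exactly why the bug is hard to spot for a beginner: every method call still succeeds, just on the wrong copy.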
Yes, it's important. And I'll try PipelineX. Thanks!
I believe the original question has been solved. I'm closing this, but feel free to reopen it or open a new issue (or post a question on StackOverflow https://stackoverflow.com/questions/tagged/kedro) if you have any follow-up questions :)
What are you trying to do?
I'm trying to use Kedro as an ML framework.
For example, training frameworks such as pytorch-ignite and pytorch-lightning are well known, and I want to use Kedro for the same purpose.
like this:

This pipeline is for fine-tuning a pretrained model, and you can see that it passes optimizer objects as datasets. The `get optimizers` node receives the `pretrained model` object and produces the `optimizers` dataset, which contains an optimizer and a scheduler. That is, I'm passing PyTorch objects as Kedro DataSets.
"Intermediate" in the title means such objects.
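The data flow described above can be sketched with plain Python functions standing in for the nodes. The function and dataset names (`get_optimizers`, `train`, `pretrained_model`) are illustrative, and plain dicts stand in for PyTorch objects; in Kedro these functions would be wrapped with `node(...)` and wired through the catalog:

```python
# Sketch of the pattern: one node builds optimizer-like objects from a
# pretrained model, a later node consumes them. In Kedro this would be:
#
#   from kedro.pipeline import Pipeline, node
#   Pipeline([
#       node(get_optimizers, inputs="pretrained_model", outputs="optimizers"),
#       node(train, inputs=["pretrained_model", "optimizers"],
#            outputs="trained_model"),
#   ])

def get_optimizers(pretrained_model):
    # Build an optimizer and a scheduler for the model's parameters.
    optimizer = {"params": pretrained_model["params"], "lr": 0.001}
    scheduler = {"optimizer": optimizer, "gamma": 0.9}
    return {"optimizer": optimizer, "scheduler": scheduler}

def train(pretrained_model, optimizers):
    # Mutate the shared optimizer in place during fine-tuning.
    optimizers["optimizer"]["lr"] *= optimizers["scheduler"]["gamma"]
    return {"model": pretrained_model, "final_lr": optimizers["optimizer"]["lr"]}

model = {"params": [1, 2, 3]}
opts = get_optimizers(model)
result = train(model, opts)
```

Note that `train` mutates the optimizer in place, which is exactly why the default deep-copying MemoryDataSet would break this pattern unless `copy_mode: assign` is set for the `optimizers` dataset.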
But I can't find such use cases by searching this GitHub repo or the internet.
It seems that Kedro pipelines and nodes mostly handle data that can be converted to a pd.DataFrame or CSV, or the final ML model to be saved.
So I want to know whether Kedro's contributors consider this use case good or bad.
If someone already does something like this, or knows of this use case, please let me know.