[DataCatalog]: Spike - Catalog serialization and deserialization support #3932
Labels: Issue: Feature Request (New feature or improvement to existing feature)
ElenaKhaustova added the Issue: Feature Request label on Jun 5, 2024
Very similar to
This was referenced Jun 6, 2024
I like
merelcht changed the title from "[DataCatalog]: Catalog serialization and deserialization support" to "[DataCatalog]: Spike - Catalog serialization and deserialization support" on Oct 21, 2024
This was referenced Nov 13, 2024
From the user feedback, we can define three main pain points to address:
The first two pain points can be addressed by:
The third one requires 1 and 2 to be solved first, and also requires solving the data-saving part. The plan for now is to address 1 and 2 first.
Description
- Users need to convert the DataCatalog to YAML format and back.
- Users cannot reliably load pickled DataCatalog objects when the Kedro version changes, leading to compatibility issues. They require a solution to serialize and deserialize the DataCatalog object without dependency on Kedro versions.
We propose to explore the feasibility of implementing to_yaml() and from_yaml() methods for the DataCatalog object to facilitate serialization and deserialization without dependency on Kedro versions.
Context
User feedback:
- A catalog to YAML function is needed to save a modified catalog: "People have always asked for it. Could I have a catalog to YAML function so that you could actually spit out the YAML files that are needed to do this again later on?"
- "[...] YAML and it will all that compilation happens at run time and there's no way for the user to see it."
- When users pickle the DataCatalog object, they experience difficulties in loading it back if the Kedro version is different: "Serialization is an issue because I often pickle a catalog (mostly as part of a mlflow model). Pickling the catalog is really something that leads to a lot of problems because if I don't have the exact same Kedro version when I want to load the catalog, if the object has any change inside - private method or attribute it will lead to error." https://github.com/Galileo-Galilei/kedro-mlflow/blob/64b8e94e1dafa02d979e7753dab9b9dfd4d7341c/kedro_mlflow/mlflow/kedro_pipeline_model.py#L143
"It would be much more robust to be able to do this":
Extra context: #3995 (comment)
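To make the proposal concrete, here is a minimal sketch of a version-independent round trip. It is not the Kedro API: ToyCatalog, ToyDataset, to_config() and from_config() are hypothetical names invented for illustration, and the dataset type strings are just example values. JSON from the standard library stands in for YAML so the snippet runs without extra dependencies; with PyYAML installed, yaml.safe_dump and yaml.safe_load could take the place of json.dumps and json.loads.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: a type name plus constructor arguments."""

    type: str
    params: Dict[str, Any] = field(default_factory=dict)


class ToyCatalog:
    """Hypothetical stand-in for a catalog; not the Kedro DataCatalog class."""

    def __init__(self, datasets: Dict[str, ToyDataset]) -> None:
        self.datasets = datasets

    def to_config(self) -> Dict[str, Dict[str, Any]]:
        # Emit plain data only (strings, numbers, dicts), never live objects,
        # so the result does not depend on private attributes of any class.
        return {
            name: {"type": ds.type, **ds.params}
            for name, ds in self.datasets.items()
        }

    @classmethod
    def from_config(cls, config: Dict[str, Dict[str, Any]]) -> "ToyCatalog":
        datasets = {}
        for name, entry in config.items():
            entry = dict(entry)  # copy so the caller's config is not mutated
            ds_type = entry.pop("type")
            datasets[name] = ToyDataset(type=ds_type, params=entry)
        return cls(datasets)


catalog = ToyCatalog(
    {
        "cars": ToyDataset("pandas.CSVDataset", {"filepath": "data/cars.csv"}),
        "model": ToyDataset("pickle.PickleDataset", {"filepath": "data/model.pkl"}),
    }
)

# Serialize to text, then rebuild the catalog from that text alone.
text = json.dumps(catalog.to_config(), indent=2, sort_keys=True)
restored = ToyCatalog.from_config(json.loads(text))
assert restored.to_config() == catalog.to_config()
```

Unlike a pickle, the serialized text contains only names and configuration values, so loading it relies on the public constructor path rather than on the internal layout of the objects at the time they were saved.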