Error when saving TensorFlowModelDataset as partition #759

Open
anabelchuinard opened this issue Aug 21, 2023 · 4 comments
Labels
bug Something isn't working

Comments

@anabelchuinard

anabelchuinard commented Aug 21, 2023

Description

Saving TensorFlowModelDataset objects as partitions of a PartitionedDataset fails.

Context

I am working on a project where I have to train several models concurrently. I started writing my code using PartitionedDataset, where each partition corresponds to the data for one training run. When I try to save the resulting TensorFlow models as partitions, I get an error. I wonder if this has to do with the fact that TensorFlowModelDataset inherits from AbstractVersionedDataset instead of AbstractDataset. If so, I would like to know whether there is any workaround for batch-saving these models.

This is the catalog entry for the models I want to save:

tensorflow_models:
  type: PartitionedDataset
  path: data/derived/ML/models
  filename_suffix: ".hdf5"
  dataset:
    type: kedro.extras.datasets.tensorflow.TensorFlowModelDataset

Note: Saving a single model (not as a partition) works.

Steps to Reproduce

  1. Generate a bunch of trained models
  2. Try to save them in a partition as TensorFlowModelDataset objects
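
For illustration, the steps above could correspond to a node shaped roughly like the sketch below. It assumes the input PartitionedDataset is loaded as a dictionary mapping partition ids to load functions, and that the node returns a dictionary mapping partition ids to trained models. The names `train_all` and `train_one` are hypothetical, and `sum` merely stands in for a real training routine that would return a TensorFlow model.

```python
from typing import Any, Callable, Dict


def train_all(
    partitions: Dict[str, Callable[[], Any]],
    train_one: Callable[[Any], Any],
) -> Dict[str, Any]:
    """Train one model per partition and key each result by its partition id.

    `partitions` is assumed to map partition ids to zero-argument load
    functions; `train_one` is a hypothetical placeholder for the real
    training routine.
    """
    return {pid: train_one(load()) for pid, load in partitions.items()}


# Stand-in inputs: in the real project each load function would return
# training data and `train_one` would return a trained model.
partitions = {"run_a": lambda: [1, 2, 3], "run_b": lambda: [4, 5]}
models = train_all(partitions, train_one=sum)
```

Returning such a dictionary to the `tensorflow_models` catalog entry is the step at which the error is reported.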

Expected Result

Should save one .hdf5 file per partition, with the file name being the associated dictionary key.

Actual Result

Getting this error:

DatasetError: Failed while saving data to data set PartitionedDataset(dataset_config={}, dataset_type=TensorFlowModelDataset,
path=...).
The first argument to `Layer.call` must always be passed.

Your Environment

  • Kedro version used (pip show kedro or kedro -V): kedro, version 0.18.12
  • Python version used (python -V): 3.9.16
  • Operating system and version: Mac M2
@astrojuanlu
Member

Hi @anabelchuinard, thanks for opening this issue and sorry for the delay. It will take us some time but I'm labeling this issue so we don't lose track of it.

@astrojuanlu astrojuanlu added the Community Issue/PR opened by the open-source community label Sep 5, 2023
@merelcht
Member

merelcht commented Jul 8, 2024

Hi @anabelchuinard, do you still need help fixing this issue?

@anabelchuinard
Author

@merelcht I found a non-kedronic workaround for this but would love to know if there is now a kedronic way for batch-saving those models.

@merelcht
Member

merelcht commented Jul 9, 2024

Using the PartitionedDataset is definitely the recommended Kedro way for batch saving. I've done some digging and it seems that the following lines cause problems when the TensorFlowModelDataset is used with PartitionedDataset:

if callable(partition_data):
    partition_data = partition_data()  # noqa: PLW2901
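
A minimal sketch of why those lines might interfere, without depending on Kedro or TensorFlow. It assumes that a model object is itself callable and requires an input argument, so the `callable` check treats it like a lazily evaluated partition and invokes it with no arguments. `FakeModel` and `save_partitions` are hypothetical stand-ins, not the real Keras or Kedro classes, and the wrapping shown at the end is only a possible workaround that has not been confirmed in this thread.

```python
class FakeModel:
    """Hypothetical stand-in for a model: callable, but needs an input."""

    def __call__(self, inputs):
        return inputs


def save_partitions(data, save_one):
    """Simplified imitation of a save loop containing the quoted lines."""
    for partition_id, partition_data in sorted(data.items()):
        if callable(partition_data):
            # A callable is treated as a lazy partition and evaluated here.
            partition_data = partition_data()
        save_one(partition_id, partition_data)


models = {"model_a": FakeModel(), "model_b": FakeModel()}
saved = {}

# Passing the models directly: the callable check fires and the model is
# invoked with no arguments, which raises before anything is saved.
try:
    save_partitions(models, saved.__setitem__)
    direct_save_failed = False
except TypeError:
    direct_save_failed = True

# Possible workaround (an assumption): wrap each model in a zero-argument
# callable, so evaluating the wrapper yields the model itself.
wrapped = {key: (lambda m=model: m) for key, model in models.items()}
save_partitions(wrapped, saved.__setitem__)
```

In a real pipeline, the equivalent would be for the node to return the wrapped dictionary rather than the models themselves. Whether this interacts cleanly with TensorFlowModelDataset would need to be checked against the installed Kedro version.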

@merelcht merelcht removed the Community Issue/PR opened by the open-source community label Jul 9, 2024
@merelcht merelcht changed the title Saving TensorFlowModelDataset as partition Error when saving TensorFlowModelDataset as partition Jul 9, 2024
@merelcht merelcht transferred this issue from kedro-org/kedro Jul 9, 2024
@merelcht merelcht added the bug Something isn't working label Jul 9, 2024
Projects
Status: To Do
Development

No branches or pull requests

3 participants