
[KED-2140]TensorflowModelDataset save fails with hdf5 model when versioning is enabled. #518

Closed
djpetti opened this issue Sep 19, 2020 · 1 comment · Fixed by #519
Labels
Issue: Bug Report 🐞 Bug that needs to be fixed

Comments

@djpetti
Contributor

djpetti commented Sep 19, 2020

Description

Saving a TensorFlowModelDataset in HDF5 format (save_format: h5) fails when versioning is enabled.

Context

By default, TensorFlowModelDataset saves a model in TensorFlow's native SavedModel format, and this works as expected. Saving the model as an HDF5 file also works as expected, provided versioning is not enabled.

Steps to Reproduce

This failure happens with a catalog configuration like the following:

trained_model:
  type: tensorflow.TensorFlowModelDataset
  filepath: data/06_models/fully_trained.hd5
  save_args:
    save_format: h5
  versioned: True
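With versioned: True, the model is not written to filepath itself. Judging from the path in the traceback under Actual Result, each save goes to <filepath>/<version>/<filename>, with a UTC timestamp as the version. The sketch below rebuilds that path with two hypothetical helpers, make_version and versioned_save_path. They are not Kedro functions, and the millisecond timestamp format is inferred from the traceback rather than taken from Kedro's source.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_version(now: datetime) -> str:
    """Hypothetical helper: format a UTC timestamp like the version in the traceback."""
    # %f yields microseconds (6 digits); drop the last 3 to keep milliseconds.
    stamp = now.astimezone(timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%f")
    return stamp[:-3] + "Z"


def versioned_save_path(filepath: str, version: str) -> str:
    """Hypothetical helper: build <filepath>/<version>/<basename of filepath>."""
    path = PurePosixPath(filepath)
    return str(path / version / path.name)


if __name__ == "__main__":
    version = make_version(datetime(2020, 9, 19, 16, 20, 54, 312000, tzinfo=timezone.utc))
    print(versioned_save_path("data/06_models/fully_trained.hd5", version))
    # data/06_models/fully_trained.hd5/2020-09-19T16.20.54.312Z/fully_trained.hd5
```

The intermediate fully_trained.hd5/<version>/ directory in that path is the one the traceback reports as missing.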

Expected Result

The model should be saved successfully as a versioned DataSet.

Actual Result

You get an error like the following:

Traceback (most recent call last):
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/io/core.py", line 240, in save
    self._save(data)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py", line 167, in _save
    self._fs.copy(path, save_path)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/fsspec/implementations/local.py", line 90, in copy
    shutil.copyfile(path1, path2)
  File "/home/daniel/.pyenv/versions/3.7.7/lib/python3.7/shutil.py", line 121, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/home/daniel/git/cotton_counter/data/06_models/fully_trained.hd5/2020-09-19T16.20.54.312Z/fully_trained.hd5'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/daniel/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/daniel/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/__main__.py", line 38, in <module>
    main()
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/framework/cli/cli.py", line 724, in main
    cli_collection()
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/home/daniel/git/cotton_counter/kedro_cli.py", line 263, in run
    pipeline_name=pipeline,
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/framework/context/context.py", line 767, in run
    raise exc
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/framework/context/context.py", line 759, in run
    run_result = runner.run(filtered_pipeline, catalog, run_id)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/runner/runner.py", line 101, in run
    self._run(pipeline, catalog, run_id)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/runner/sequential_runner.py", line 90, in _run
    run_node(node, catalog, self._is_async, run_id)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/runner/runner.py", line 213, in run_node
    node = _run_node_sequential(node, catalog, run_id)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/runner/runner.py", line 249, in _run_node_sequential
    catalog.save(name, data)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/io/data_catalog.py", line 439, in save
    func(data)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/io/core.py", line 625, in save
    super().save(data)
  File "/home/daniel/git/cotton_counter/.venv/lib/python3.7/site-packages/kedro/io/core.py", line 247, in save
    raise DataSetError(message) from exc
kedro.io.core.DataSetError: Failed while saving data to data set TensorFlowModelDataset(filepath=/home/daniel/git/cotton_counter/data/06_models/fully_trained.hd5, load_args={'compile': False}, protocol=file, save_args={'save_format': h5}, version=Version(load=None, save='2020-09-19T16.20.54.312Z')).
[Errno 2] No such file or directory: '/home/daniel/git/cotton_counter/data/06_models/fully_trained.hd5/2020-09-19T16.20.54.312Z/fully_trained.hd5'

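Digging deeper, the cause appears to be that TensorFlowModelDataset does not create the intermediate directories of the versioned save path before writing the model. As the traceback shows, the model file is copied to that path with self._fs.copy(), which on the local filesystem ends in shutil.copyfile(); that call does not create missing parent directories. I was able to fix it by adding two lines to the _save() method: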

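    # Requires "from pathlib import Path" at the top of the module.
    def _save(self, data: tf.keras.Model) -> None:
        save_path = get_filepath_str(self._get_save_path(), self._protocol)

        # New lines: create all intermediate directories of the save path
        # (including the <version>/ folder) before the model is written there.
        save_dir = Path(save_path).parent
        save_dir.mkdir(parents=True, exist_ok=True)

        # ... remainder of the existing _save() body unchanged ...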
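The failure and the proposed fix can be reproduced with the standard library alone. The traceback shows that, on the local filesystem, fsspec's copy() ends in shutil.copyfile(), which opens the destination for writing without creating missing parent directories. In this sketch a temporary directory stands in for data/06_models/, a few placeholder bytes stand in for the saved HDF5 model, and copy_into_versioned_dir is a hypothetical helper rather than Kedro or fsspec code.

```python
import shutil
import tempfile
from pathlib import Path


def copy_into_versioned_dir(src: Path, dst: Path, create_parents: bool) -> None:
    """Hypothetical helper: copy a single file (like an .h5 model) to a versioned path."""
    if create_parents:
        # The two lines proposed in the issue: create <filepath>/<version>/ first.
        dst.parent.mkdir(parents=True, exist_ok=True)
    # Like the local copy in the traceback, this does not create directories.
    shutil.copyfile(src, dst)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as root:
        src = Path(root) / "model.h5"
        src.write_bytes(b"placeholder HDF5 payload")  # stand-in for the saved model
        dst = Path(root) / "fully_trained.hd5" / "2020-09-19T16.20.54.312Z" / "fully_trained.hd5"

        try:
            copy_into_versioned_dir(src, dst, create_parents=False)
            failed_without_mkdir = False
        except FileNotFoundError:
            failed_without_mkdir = True

        copy_into_versioned_dir(src, dst, create_parents=True)
        return failed_without_mkdir, dst.read_bytes() == src.read_bytes()


if __name__ == "__main__":
    print(demo())  # (True, True)
```

parents=True also creates fully_trained.hd5/ itself on the first save, and exist_ok=True keeps later saves from failing once the directories already exist.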

I can also submit this as a PR.

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.16.5
  • Python version used (python -V): 3.7.7
  • Operating system and version: Ubuntu 20.04
djpetti added the Issue: Bug Report 🐞 Bug that needs to be fixed label Sep 19, 2020
djpetti added a commit to djpetti/kedro that referenced this issue Sep 19, 2020
lorenabalan changed the title TensorflowModelDataset save fails with hdf5 model when versioning is enabled. [KED-2140]TensorflowModelDataset save fails with hdf5 model when versioning is enabled. Oct 6, 2020
@samirsaliba

samirsaliba commented Feb 22, 2021

Hello! I was getting the same error ([Errno 2] No such file or directory) with a custom JSON dataset I had created, and I could only fix it with the lines suggested by @djpetti (thank you!):

        # New lines are here.
        save_dir = Path(save_path).parent
        save_dir.mkdir(parents=True, exist_ok=True)
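The same two lines carry over to other file-based datasets. Below is a minimal sketch of the save step of a custom versioned JSON dataset, assuming the versioned save path has already been resolved. save_json is a hypothetical function, not Kedro's own JSON dataset, and the timestamp in the example path is made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_json(data: Any, save_path: str) -> None:
    """Hypothetical save step for a custom versioned JSON dataset."""
    # Create <filepath>/<version>/ first; open() will not create missing directories.
    save_dir = Path(save_path).parent
    save_dir.mkdir(parents=True, exist_ok=True)
    with open(save_path, "w", encoding="utf-8") as f:
        json.dump(data, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Made-up version folder, following the layout seen in the traceback.
        path = Path(root) / "metrics.json" / "2021-02-22T10.00.00.000Z" / "metrics.json"
        save_json({"accuracy": 0.9}, str(path))
        print(json.loads(path.read_text(encoding="utf-8")))  # {'accuracy': 0.9}
```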
