Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2140] Fix issue with saving versioned HDF5 models. #519

Merged
merged 3 commits into from
Oct 19, 2020
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion RELEASE.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,11 +18,12 @@
* Fixed `kedro install` for an Anaconda environment defined in `environment.yml`.
* Fixed backwards compatibility with templates generated with older Kedro versions <0.16.5. No longer need to update `.kedro.yml` to use `kedro lint` and `kedro jupyter notebook convert`.
* Improved documentation.
* Fixed issue with saving a `TensorFlowModelDataset` in the HDF5 format with versioning enabled.

## Breaking changes to the API

## Thanks for supporting contributions
[Deepyaman Datta](https://github.com/deepyaman), [Bhavya Merchant](https://github.com/bnmerchant), [Lovkush Agarwal](https://github.com/Lovkush-A)
[Deepyaman Datta](https://github.com/deepyaman), [Bhavya Merchant](https://github.com/bnmerchant), [Lovkush Agarwal](https://github.com/Lovkush-A), [Daniel Petti](https://github.com/djpetti)

# Release 0.16.5

Expand Down
6 changes: 5 additions & 1 deletion kedro/extras/datasets/tensorflow/tensorflow_model_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@
"""
import copy
import tempfile
from pathlib import PurePath, PurePosixPath
from pathlib import Path, PurePath, PurePosixPath
from typing import Any, Dict

import fsspec
Expand Down Expand Up @@ -151,6 +151,10 @@ def _load(self) -> tf.keras.Model:
def _save(self, data: tf.keras.Model) -> None:
save_path = get_filepath_str(self._get_save_path(), self._protocol)

# Make sure all intermediate directories are created.
save_dir = Path(save_path).parent
save_dir.mkdir(parents=True, exist_ok=True)
Comment on lines +154 to +156
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised line 115 isn't enough here. 🤔 Do you know more about why that is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into fsspec, it looks like this is a peculiarity with how that library works. Specifically, setting auto_mkdir=True will automatically create the directory, but not immediately. The directory will be created lazily when the _open() method gets called. However, AFAIK, TensorflowModelDataset._save() never calls _open(). For HDF5 files, it eventually calls copy(), but in fsspec, this merely delegates to shutil.copy() without calling _open(). Thus, fsspec fails to create the parent directory automatically, although, arguably, it should.

If you think that this issue would be better fixed in fsspec, or that the code here should include a TODO explaining the situation, I would be amenable to making those alterations.

Incidentally, it looks like this issue is resolved in the current master branch of fsspec. Maybe the proper resolution here is just to update fsspec?

Copy link
Contributor

@lorenabalan lorenabalan Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's interesting, thank you for the very thorough research!!
Is that a released fsspec version? We have a ticket to further relax the fsspec boundaries to include 0.8.x (0.7.x was a breaking change for us so that bump was done on develop). Do you have time to look into whether that would also fix this issue, and if so we can merge a PR with that update against develop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cursory glance at the fsspec code indicates that the issue is fixed in 0.8.0. It seems, then, that merging the existing PR would indeed fix this issue also. If you agree, I will go and test locally to make sure that the issue is indeed resolved.

Copy link
Contributor

@lorenabalan lorenabalan Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case what I'm thinking is:

  • Because we are planning on releasing 0.16.6 soon, merge this PR into master as is, so users can benefit from the bug fix without having to go through the pain of migrating to 0.17.
  • Open another PR on develop where we update fsspec>=0.5.1,<0.9 and remove these 2 lines from the TensorFlowModelDataset (but keep the unit test because it's useful).

@limdauto @djpetti how does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Should I be handling the second PR too or will someone else do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like to take that on that would be super helpful! 😊 If not no worries someone in the team can do it.

Copy link
Contributor Author

@djpetti djpetti Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely take a crack at this. I might not get to it until tomorrow though.

On a related note, do you have a link to the specific issue for relaxing the fsspec version? I would like to be able to reference it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely up for merging this into master as-is for 0.16.6. Then we can open another PR agaisnt develop. I think you can create another PR and reference the discussion in this PR. No need to create another issue for it. WDYT?

Copy link
Contributor Author

@djpetti djpetti Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I am comfortable merging as-is.


with tempfile.TemporaryDirectory(prefix=self._tmp_prefix) as path:
if self._is_h5:
path = str(PurePath(path) / TEMPORARY_H5_FILE)
Expand Down
24 changes: 24 additions & 0 deletions tests/extras/datasets/tensorflow/test_tensorflow_model_dataset.py
Original file line number Diff line number Diff line change
Expand Up @@ -317,6 +317,30 @@ def test_save_and_load(
new_predictions = reloaded.predict(dummy_x_test)
np.testing.assert_allclose(predictions, new_predictions, rtol=1e-6, atol=1e-6)

def test_hdf5_save_format(
self,
dummy_tf_base_model,
dummy_x_test,
filepath,
tensorflow_model_dataset,
load_version,
save_version,
):
"""Test versioned TensorflowModelDataset can save TF graph models in
HDF5 format"""
hdf5_dataset = tensorflow_model_dataset(
filepath=filepath,
save_args={"save_format": "h5"},
version=Version(load_version, save_version),
)

predictions = dummy_tf_base_model.predict(dummy_x_test)
hdf5_dataset.save(dummy_tf_base_model)

reloaded = hdf5_dataset.load()
new_predictions = reloaded.predict(dummy_x_test)
np.testing.assert_allclose(predictions, new_predictions, rtol=1e-6, atol=1e-6)

def test_prevent_overwrite(self, dummy_tf_base_model, versioned_tf_model_dataset):
"""Check the error when attempting to override the data set if the
corresponding file for a given save version already exists."""
Expand Down