Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KED-2140] Fix issue with saving versioned HDF5 models. #519

Merged
merged 3 commits into from
Oct 19, 2020

Conversation

djpetti
Copy link
Contributor

@djpetti djpetti commented Sep 19, 2020

Description
Fixes #518

Development notes

Changed extras.tensorflow.TensorFlowModelDataset. I added a few lines to the _save() method. I also added a unit test that tests for this issue.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change and added my name to the list of supporting contributions in the RELEASE.md file
  • Added tests to cover my changes

Notice

  • I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":

  • I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.

  • I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.

  • I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.

@921kiyo 921kiyo changed the title Fix issue with saving versioned HDF5 models. [KED-2140] Fix issue with saving versioned HDF5 models. Oct 2, 2020
Comment on lines +154 to +156
# Make sure all intermediate directories are created.
save_dir = Path(save_path).parent
save_dir.mkdir(parents=True, exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm surprised line 115 isn't enough here. 🤔 Do you know more about why that is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into fsspec, it looks like this is a peculiarity with how that library works. Specifically, setting auto_mkdir=True will automatically create the directory, but not immediately. The directory will be created lazily when the _open() method gets called. However, AFAIK, TensorflowModelDataset._save() never calls _open(). For HDF5 files, it eventually calls copy(), but in fsspec, this merely delegates to shutil.copy() without calling _open(). Thus, fsspec fails to create the parent directory automatically, although, arguably, it should.

If you think that this issue would be better fixed in fsspec, or that the code here should include a TODO explaining the situation, I would be amenable to making those alterations.

Incidentally, it looks like this issue is resolved in the current master branch of fsspec. Maybe the proper resolution here is just to update fsspec?

Copy link
Contributor

@lorenabalan lorenabalan Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that's interesting, thank you for the very thorough research!!
Is that a released fsspec version? We have a ticket to further relax the fsspec boundaries to include 0.8.x (0.7.x was a breaking change for us so that bump was done on develop). Do you have time to look into whether that would also fix this issue, and if so we can merge a PR with that update against develop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A cursory glance at the fsspec code indicates that the issue is fixed in 0.8.0. It seems, then, that merging the existing PR would indeed fix this issue also. If you agree, I will go and test locally to make sure that the issue is indeed resolved.

Copy link
Contributor

@lorenabalan lorenabalan Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case what I'm thinking is:

  • Because we are planning on releasing 0.16.6 soon, merge this PR into master as is, so users can benefit from the bug fix without having to go through the pain of migrating to 0.17.
  • Open another PR on develop where we update fsspec>=0.5.1,<0.9 and remove these 2 lines from the TensorFlowModelDataset (but keep the unit test because it's useful).

@limdauto @djpetti how does that sound?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Should I be handling the second PR too or will someone else do that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd like to take that on that would be super helpful! 😊 If not no worries someone in the team can do it.

Copy link
Contributor Author

@djpetti djpetti Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can definitely take a crack at this. I might not get to it until tomorrow though.

On a related note, do you have a link to the specific issue for relaxing the fsspec version? I would like to be able to reference it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely up for merging this into master as-is for 0.16.6. Then we can open another PR agaisnt develop. I think you can create another PR and reference the discussion in this PR. No need to create another issue for it. WDYT?

Copy link
Contributor Author

@djpetti djpetti Oct 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I am comfortable merging as-is.

Copy link
Contributor

@limdauto limdauto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you

@limdauto limdauto merged commit 4aa8337 into kedro-org:master Oct 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[KED-2140]TensorflowModelDataset save fails with hdf5 model when versioning is enabled.
3 participants