-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KED-2140] Fix issue with saving versioned HDF5 models. #519
[KED-2140] Fix issue with saving versioned HDF5 models. #519
Conversation
# Make sure all intermediate directories are created. | ||
save_dir = Path(save_path).parent | ||
save_dir.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised line 115 isn't enough here. 🤔 Do you know more about why that is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Digging into fsspec
, it looks like this is a peculiarity with how that library works. Specifically, setting auto_mkdir=True
will automatically create the directory, but not immediately. The directory will be created lazily when the _open()
method gets called. However, AFAIK, TensorflowModelDataset._save()
never calls _open()
. For HDF5 files, it eventually calls copy()
, but in fsspec
, this merely delegates to shutil.copy()
without calling _open()
. Thus, fsspec
fails to create the parent directory automatically, although, arguably, it should.
If you think that this issue would be better fixed in fsspec
, or that the code here should include a TODO
explaining the situation, I would be amenable to making those alterations.
Incidentally, it looks like this issue is resolved in the current master
branch of fsspec
. Maybe the proper resolution here is just to update fsspec
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that's interesting, thank you for the very thorough research!!
Is that a released fsspec
version? We have a ticket to further relax the fsspec
boundaries to include 0.8.x (0.7.x was a breaking change for us so that bump was done on develop
). Do you have time to look into whether that would also fix this issue, and if so we can merge a PR with that update against develop
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A cursory glance at the fsspec
code indicates that the issue is fixed in 0.8.0
. It seems, then, that merging the existing PR would indeed fix this issue also. If you agree, I will go and test locally to make sure that the issue is indeed resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case what I'm thinking is:
- Because we are planning on releasing 0.16.6 soon, merge this PR into master as is, so users can benefit from the bug fix without having to go through the pain of migrating to 0.17.
- Open another PR on
develop
where we updatefsspec>=0.5.1,<0.9
and remove these 2 lines from theTensorFlowModelDataset
(but keep the unit test because it's useful).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Should I be handling the second PR too or will someone else do that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you'd like to take that on that would be super helpful! 😊 If not no worries someone in the team can do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can definitely take a crack at this. I might not get to it until tomorrow though.
On a related note, do you have a link to the specific issue for relaxing the fsspec
version? I would like to be able to reference it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely up for merging this into master as-is for 0.16.6
. Then we can open another PR agaisnt develop
. I think you can create another PR and reference the discussion in this PR. No need to create another issue for it. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I am comfortable merging as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you
Description
Fixes #518
Development notes
Changed
extras.tensorflow.TensorFlowModelDataset
. I added a few lines to the_save()
method. I also added a unit test that tests for this issue.Checklist
RELEASE.md
fileNotice
I acknowledge and agree that, by checking this box and clicking "Submit Pull Request":
I submit this contribution under the Apache 2.0 license and represent that I am entitled to do so on behalf of myself, my employer, or relevant third parties, as applicable.
I certify that (a) this contribution is my original creation and / or (b) to the extent it is not my original creation, I am authorised to submit this contribution on behalf of the original creator(s) or their licensees.
I certify that the use of this contribution as authorised by the Apache 2.0 license does not violate the intellectual property rights of anyone else.