Skip to content

Initial commit of s3 modular file system plugin (credit @vnvo2409) #1191

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 13, 2020

Conversation

yongtang
Copy link
Member

@yongtang yongtang commented Nov 12, 2020

Initial commit of s3 modular file system plugin (credit @vnvo2409)

This PR is an initial commit to move S3 modular file system plugin to tensorflow-io. @vnvo2409 is the author and this PR is just a mechanic move of code.

There are a several tiny changes

  • At the moment the registration is to s3e://. This is to avoid the collision with TF's s3://. Once we have done all the testing we can remove the TF's registration of s3:// and change registration in tensorflow-io from s3e:// to s3://.
  • Logging is disabled for now. We will reenable once [CherryPick:r2.4] Expose Logging C API in pip package tensorflow#44771 is merged to TF 2.4 release.
  • use_multi_part_download is set to false for now as there is a bug related to multi-part download. When read request is asking for bytes larger than object size. In that case, a status code of 416 is returned. However, 416 is not handled correctly and s3 thought this is an error - should return OUT_OF_RANGE with less bytes.
  • A simple test has been added which is tested against AWS localstack emulator (and it works fine).
  • Note: s3_filesystem_test.cc is not added in this PR yet.

Signed-off-by: Yong Tang yong.tang.github@outlook.com

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
@vnghia
Copy link
Contributor

vnghia commented Nov 13, 2020

Thank you so much for this PR @yongtang !
LGTM.

Note: s3_filesystem_test.cc is not added in this PR yet.

I think it's quite hard to run this test because it depends on many components of TF core. I think we could port the tests to python, it should be easier, WDYT ?

In addition, I saw you have bumped the version of aws-sdk-cpp to 1.7.366. I would like to take a further step, bumping it to 1.8.* which has some new features ( It's offical, not a developer preview anymore ) that are useful for users ( the environement variables for example ). But I do not consider the compatibility between TF core and the plugin yet, WDYT ?

@yongtang
Copy link
Member Author

I think it's quite hard to run this test because it depends on many components of TF core. I think we could port the tests to python, it should be easier, WDYT ?

Yes I think having tests in python would be a lot easier to maintain.

In addition, I saw you have bumped the version of aws-sdk-cpp to 1.7.366. I would like to take a further step, bumping it to 1.8.* which has some new features ( It's offical, not a developer preview anymore ) that are useful for users ( the environement variables for example ). But I do not consider the compatibility between TF core and the plugin yet, WDYT ?

I think that should be possible. I will take a look, likely in a follow up PR to bump the AWS SDK to 1.8.

Copy link
Member

@kvignesh1420 kvignesh1420 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yongtang since we directly porting from TensorFlow, we can merge this as of now. We can make the necessary modifications in different PR's.(if needed)

@yongtang
Copy link
Member Author

Let's merge this PR. Thanks @vnvo2409 @kvignesh1420 !

@vnghia
Copy link
Contributor

vnghia commented Nov 13, 2020

Thank you @yongtang ! I will have some spare time in December and I will get more involved in the migration process.

@yongtang
Copy link
Member Author

Thanks @vnvo2409! I updated the issue in #1183 (comment) with items and follow ups (e.g., update AWS SDK, enable logging, etc).

i-ony pushed a commit to i-ony/io that referenced this pull request Feb 8, 2021
…ensorflow#1191)

* Initial commit of s3 modular file system plugin (credit vnvo2409)

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Disable use_multi_part_download for now, and several fixes

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants