Description
As discussed in our September monthly meeting, we are looking into migrating the modular file system plugins in tensorflow (s3/hdfs/gcs) to the tensorflow-io package.
Once the migration is done, the size and build time of the tensorflow repo could be substantially reduced. This would greatly improve the experience of external tensorflow contributors, who could choose to build only the components they need (e.g., tensorflow core, tensorflow-io, tensorflow-addons).
Currently, building the complete tensorflow package takes at least 8 hours for many external contributors, because the bazel cache is only available internally at Google.
Now that we already have a couple of modular file systems in tensorflow-io (the Azure Blob file system and the HTTP file system, see #1111), it might be time to start migrating the plugins in tensorflow to tensorflow-io.
/cc @mihaimaruseac @vnvo2409
Item list:
- S3 file system (PR #1191: Initial commit of s3 modular file system plugin (credit @vnvo2409))
- GCS file system (PR #1203: Initial commit of gcs modular file system plugin (vnvo2409))
- HDFS file system (PR #1197: Initial commit of hadoop file system (credit vnvo2409))
Follow up:
- [S3 file system] Port `s3_filesystem_test.cc` to Python tests (see #1191 (comment))
- [S3 file system] Bump AWS SDK to 1.8.x (see #1191 (comment))
- [S3 file system] Fix multi download issue (see #1191 (comment))
- [S3 file system] Enable logging through TensorFlow C API (see #1191 (comment))
- [HDFS file system] Port `hadoop_filesystem_test.cc` to Python tests
- [GCS file system] Port `gcs_filesystem_test.cc` to Python tests
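
For the "port to Python tests" items above, a rough sketch of what a ported check might look like. This is only an illustration, not the actual test layout: it assumes the Python tests would exercise the plugins through the `tf.io.gfile` interface (`GFile`, `exists`, `remove`), and the helper name `check_write_read_roundtrip` and the `LocalGFile` stand-in are hypothetical. The stand-in only exists so the sketch runs without TensorFlow or cloud credentials.

```python
import os
import tempfile


def check_write_read_roundtrip(gfile, path, payload=b"hello modular fs"):
    """Write payload to path, read it back, delete it, and return what was read.

    `gfile` is any object exposing GFile/exists/remove in the style of
    tf.io.gfile (assumption: the ported tests go through that interface).
    """
    with gfile.GFile(path, "wb") as f:
        f.write(payload)
    assert gfile.exists(path), "file should exist after writing"

    with gfile.GFile(path, "rb") as f:
        data = f.read()

    gfile.remove(path)
    assert not gfile.exists(path), "file should be gone after remove"
    return data


class LocalGFile:
    """Hypothetical local stand-in for the subset of tf.io.gfile used above."""

    GFile = staticmethod(open)
    exists = staticmethod(os.path.exists)
    remove = staticmethod(os.remove)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.bin")
        result = check_write_read_roundtrip(LocalGFile, path)
        assert result == b"hello modular fs"
```

Against a real plugin, the same helper could presumably be called with `tf.io.gfile` as the first argument and a path on the target file system, provided the plugin is loaded and registered for that path's scheme.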