
gfile: add support for fsspec filesystems #5248

Merged
merged 1 commit into from
Sep 13, 2021

Conversation

d4l3k
Contributor

@d4l3k d4l3k commented Aug 17, 2021

  • Motivation for features / changes

This adds support for fsspec filesystem backends to the gfile compat library, which allows TensorBoard to be used with a wider range of filesystem backends.

See: #5165

  • Technical description of changes

This adds a new GFile filesystem backend that's used if no other registered filesystem matches. In that case, fsspec is checked to see if it has an available filesystem, and if so a wrapped version of it is returned.

fsspec provides semantics similar to normal usage; however, it doesn't support seeking for files opened in text mode, which can be problematic for longer text files. This seems to work regardless, since most of TensorBoard's larger files are in a binary format.
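As a rough illustration of the fallback order (the registry contents and helper bodies below are simplified stand-ins, not the actual PR code — only the lookup shape is what matters):

```python
# Simplified sketch: registered filesystems win, and fsspec is only
# consulted when no registered prefix matches.
_REGISTRY = {"": "LocalFileSystem", "s3": "S3FileSystem"}  # illustrative

def _fsspec_has_protocol(protocol):
    # Stand-in for fsspec.get_filesystem_class(protocol); here we pretend
    # only the "file" and "memory" backends are installed.
    return protocol in ("file", "memory")

def get_filesystem(filename):
    protocol = filename.split("://", 1)[0] if "://" in filename else ""
    if protocol in _REGISTRY:
        return _REGISTRY[protocol]
    if _fsspec_has_protocol(protocol):
        return "FSSpecFileSystem"
    raise ValueError("No recognized filesystem for prefix: %r" % protocol)
```

So a plain path or `s3://` URL keeps its existing backend, while `file://` (unclaimed by the registry) falls through to fsspec.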

  • Detailed steps to verify changes work correctly (as executed by you)

I've copied the existing local filesystem unit tests and changed the paths to be prefixed with file:// so they go through the fsspec backend.

$ bazel run //tensorboard -- --logdir file:///tmp/torchxrun/lightning_logs/version_1/
$ bazel test //tensorboard/compat/tensorflow_stub:gfile_fsspec_test //tensorboard/summary/writer:event_file_writer_fsspec_test --test_output=errors

@google-cla google-cla bot added the cla: yes label Aug 17, 2021
Comment on lines 447 to 450
@_translate_errors
def join(self, path, *paths):
    """Join paths with a slash."""
    return os.path.join(path, *paths)

Seems like fsspec doesn't have a join function, so this might be okay. But using the OS' path joiner seems a little risky to me.... I remember Windows has some weirdnesses with backwards slashes.

Maybe you can build a simple one using the pathsep of the filesystem class in fsspec? That won't cover relative -> absolute path resolution but it should theoretically be more compatible with the remote filesystems on a variety of OSs.

Contributor Author

changed to use url_to_fs and fs.sep

Comment on lines 490 to 492
with fs.open(path, mode, encoding=encoding) as f:
    if offset is not None:
        f.seek(offset)

Is it ever possible for this to throw a 'not implemented' error? You might want to check seekable() before you attempt to seek, in case of user error when they request an offset but the fs doesn't support it.

Contributor Author

I do check for this a few lines up, though using seekable() is probably the better approach here

Contributor Author

done
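A minimal sketch of the seekable() guard discussed in this thread (the helper name is hypothetical, not PR code):

```python
import io

def read_at(f, offset=None):
    # Only seek when an offset was requested, and fail loudly if the
    # underlying stream cannot seek, rather than letting an opaque
    # NotImplementedError escape from the filesystem layer.
    if offset is not None:
        if not f.seekable():
            raise ValueError("offset requested but stream is not seekable")
        f.seek(offset)
    return f.read()
```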

Comment on lines 534 to 538
if isinstance(filename, bytes):
    filename = filename.decode("utf-8")
fs, _, _ = fsspec.get_fs_token_paths(
    filename,
)

Quick question: are you repeating the logic from _fs_path here to avoid running the validation at the end of that function?

Contributor Author

yeah, get_fs_token_paths expands the filename globs and returns the files. I should probably change this to just grab the filesystem and not do any of the filepath resolution

Contributor Author

changed to url_to_fs

Comment on lines 540 to 548
SEPARATOR = "://"

prefixed_files = []
if SEPARATOR in filename:
    for file in files:
        if SEPARATOR not in file:
            file = fs.protocol + SEPARATOR + file
        prefixed_files.append(file)
return prefixed_files

I'm not certain this will work in all cases... there are chained systems (ex: dask::s3://bucket/key) which fsspec supports, local systems where paths don't have a protocol (ex: C:\Users\reubend\file.txt) and a few other edge cases.

Have you tested this with those? My initial guess is that putting the protocol and then the separator might break those cases.

Contributor Author

updated the diff to handle chained file systems and added test cases for them

Contributor Author

@d4l3k d4l3k left a comment

updated after feedback + now supports chained file systems such as simplecache::zip://foo.txt::s3://bucket/some/path.


@d4l3k
Contributor Author

d4l3k commented Aug 24, 2021

added file system chaining support to join() -- chained file systems are pretty tricky to get working right

@d4l3k
Contributor Author

d4l3k commented Aug 24, 2021

made filesystem existence checking much cheaper

@Reubend

Reubend commented Aug 26, 2021

One thing I want to point out that doesn't matter for this PR but will matter in the future:

Because TB currently implements walk as a global function rather than as a method of each filesystem, the walk performance here will be lower than if we used fsspec's built-in walk method.

As an example, the generic implementation we use now lists a dir's contents and then for each one it checks if that's a dir or not. But an efficient implementation (like the ones that fsspec has) might get info on which ones are directories from the ls call directly since some filesystems support that. In the case where there are N items inside a folder that would mean the difference between 1 API call and N+1 API calls (one for the list call, and then N isdir calls).

So in the future, we should refactor this to allow custom walk implementations, and add a walk function for fsspec. Then if/when we replace gfile with fsspec entirely, we can just use that walk function instead.
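To make the call-count arithmetic concrete, here is a toy walk over a fake listing that reports entry types directly, the way an fsspec-style ls(detail=True) can; every name and the tree itself are made up for illustration. Each directory costs exactly one list call, instead of one list plus N isdir calls:

```python
# Fake detailed listing: each entry comes back with its type, so the
# walker never needs a separate isdir call per entry.
TREE = {
    "/logs": [("run1", "directory"), ("events.out", "file")],
    "/logs/run1": [("events.out", "file")],
}
CALLS = {"ls": 0}

def ls_detail(path):
    CALLS["ls"] += 1
    return TREE.get(path, [])

def walk(top):
    # One ls call per directory visited.
    dirs, files = [], []
    for name, kind in ls_detail(top):
        (dirs if kind == "directory" else files).append(name)
    yield top, dirs, files
    for d in dirs:
        yield from walk(top + "/" + d)
```

Walking "/logs" here issues two ls calls total (one per directory), where a generic walk would also issue one isdir per entry.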

@nfelt nfelt self-requested a review September 1, 2021 01:08
@nfelt nfelt added core:backend core:notf Things related to No TensorFlow mode. labels Sep 1, 2021
Contributor

@nfelt nfelt left a comment

Thank you @d4l3k for contributing this and @Reubend for the review so far! Apologies for the wait. I've tested it out a bit with local files and GCS and it looks like it's working great; just a few questions plus some minor stuff to fix the presubmit checks etc.

So in the future, we should refactor this to allow custom walk implementations, and add a walk function for fsspec. Then if/when we replace gfile with fsspec entirely, we can just use that walk function instead.

This is a great point (although FWIW, most regular users of TensorBoard who aren't doing anything special should now be going through our data server for the fast load path, which is in Rust and has its own filesystem abstraction separate from these changes). I've made note of it as part of a larger issue I filed: #5286.

CHAIN_SEPARATOR = "::"

def __init__(self):
    pass
Contributor

nit: this is equivalent to omitting __init__ entirely?

Contributor Author

done

% (subdir, expected_listing, gotten_listing),
)

def testNonExistantFilesystem(self):
Contributor

typo: Existent with an e - also below on 392, 394, 396

protocol = segment.split(FSSpecFileSystem.SEPARATOR)[0]
    if fsspec.get_filesystem_class(protocol):
        return _FSSPEC_FILESYSTEM
except (ImportError, ValueError):
Contributor

Can you give some indication to what would likely trigger the ImportError and ValueError? This seems very broad so it would be nice to narrow this, since a blanket swallow of ImportError/ValueError could mask issues with fsspec that we aren't anticipating and makes debugging difficult.

Also, if the ImportError one would be triggered in the case where fsspec requires a dependency in order to load a given filesystem, it seems better if we actually surface that to the user. Right now it looks like we just swallow any message from fsspec?

Contributor Author

Yeah, I was looking at https://github.com/intake/filesystem_spec/blob/master/fsspec/registry.py#L208-L213 and trying to match the semantics of the existing code, but I think bubbling up the actual ValueError/ImportError is best here

updated

@@ -401,6 +411,222 @@ def stat(self, filename):
raise


class FSSpecFileSystem(object):
    """Provides filesystem access via fsspec."""
Contributor

Could you add a reference in this docstring to #5286 that mentions the caveat that the existing filesystem abstraction limits our ability to take full advantage of fsspec? E.g. since it doesn't allow actually maintaining file objects, repeated reads or writes to a single file require re-opening the file each time, and also the point @Reubend made about walk().

Contributor Author

done

@@ -21,6 +21,8 @@ pandas~=1.0
# For gfile S3 test
boto3==1.9.86
moto==1.3.7
# For gfile fsspec test
fsspec=2021.7.0
Contributor

Needs an extra =

Contributor Author

done

],
)


Contributor

Could you remove the extra newline to satisfy buildifier?

Contributor Author

done

    f.write(data)
with gfile.GFile(ckpt_path, "r") as f:
    f.buff_chunk_size = 4  # Test buffering by reducing chunk size
    with self.assertRaises(errors.InvalidArgumentError):
Contributor

Why is it expected that this would raise InvalidArgumentError?

Contributor Author

The fsspec implementation here doesn't support seekable streams for text, though digging deeper into it I'm actually not quite sure why. It uses io.TextIOWrapper, which does support them; I need to take a closer look at this.

Contributor Author

fsspec/filesystem_spec#743 seems to be a bug with fsspec local filesystem in text mode.

Contributor

Ok, thanks for filing the upstream bug and fix! Mind adding a comment here with a reference to the fix PR so we know to correct this test once there's a newer release?

Also, would be good to have a test for the success case (i.e. if we don't force buffering by setting the small chunk size). We could either omit the buffered case entirely (with the comment), or if we keep it, mind making it assertRaisesRegex(errors.InvalidArgumentError, "not seekable") just to ensure we're targeting narrowly?

Contributor Author

I just removed that check for now. We can add it once that's fixed. Don't want to add a check for the bad behavior only for it to fail on dep update.

tensorboard/compat/tensorflow_stub/BUILD (resolved)
Contributor Author

@d4l3k d4l3k left a comment

updated after feedback


@d4l3k
Contributor Author

d4l3k commented Sep 7, 2021

@nfelt can you approve running workflows? I want to make sure CI is happy, but it says "First-time contributors need a maintainer to approve running workflows."

@d4l3k d4l3k force-pushed the master branch 2 times, most recently from 9d8b3b9 to ee1db05 Compare September 8, 2021 17:47
@d4l3k
Contributor Author

d4l3k commented Sep 8, 2021

buildifier + flake8 should be happy now

@d4l3k d4l3k requested a review from nfelt September 8, 2021 19:21
tensorboard/summary/writer/BUILD (resolved)
tensorboard/compat/tensorflow_stub/BUILD (resolved)

def testExistance(self):
    self.assertIsInstance(
        gfile.get_filesystem("simplecache::nonexistent::s3://blah/blah"),
Contributor

Looks like now that the ImportError isn't swallowed, these tests are failing because having s3:// in the path is prompting fsspec to require s3fs. Maybe we can just use file:// instead?

Contributor Author

done


compatify = compat.as_bytes if "b" in mode else compat.as_text
f.write(compatify(file_content))

def _apply_prefix(self, filename, files):
Contributor

It should be possible to simplify this to something like the following, which could perhaps be inlined:

prefix, chain_sep, base_url = filename.rpartition(self.CHAIN_SEPARATOR)
prefix += chain_sep
protocol, path = fsspec.core.split_protocol(base_url)
if protocol:
    prefix += protocol + self.SEPARATOR
return [prefix + file for file in files]

Contributor Author

done

return fs.exists(path)

@_translate_errors
def join(self, path, *paths):
Contributor

@nfelt nfelt Sep 9, 2021

It should be possible to simplify this to something like:

_, _, base_url = path.rpartition(self.CHAIN_SEPARATOR)
protocol, path = fsspec.core.split_protocol(base_url)
fs = fsspec.get_filesystem_class(protocol)
return fs.sep.join((path,) + paths)

Also, it'd be better if the join step would work like os.path.join() and collapse adjacent separators (and handle absolute paths), though I see the existing S3 implementation doesn't do that. I think a helper like this would suffice:

def _join(sep, parts):
    result = []
    for part in parts:
        if part.startswith(sep):
            result = []
        if result and result[-1] and not result[-1].endswith(sep):
            result.append(sep)
        result.append(part)
    return "".join(result)

Contributor Author

done

Contributor

I realize it's an edge case, but mind adding back the and result[-1] part of the condition? That was deliberate; without it we get different behavior from os.path.join():

>>> _join("/", ("foo", "", "bar"))
'foo/bar'
>>> _join_without_extra_check("/", ("foo", "", "bar"))
'foo//bar'
>>> os.path.join("foo", "", "bar")
'foo/bar'

Contributor Author

added and updated unit test to check for this case

Contributor Author

@d4l3k d4l3k left a comment

updated to simplify path logic + other feedback


@d4l3k d4l3k requested a review from nfelt September 10, 2021 00:46
nfelt
nfelt previously requested changes Sep 10, 2021
Contributor

@nfelt nfelt left a comment

Thank you for making the changes! Everything LGTM except a few last things, and aside from the one mentioned issue I've confirmed that we can land this internally as well.

I'll be away on vacation starting today through the end of next week, but I'll leave some instructions with my teammates re: trying to get this merged assuming it's ready before I'm back.

return [
    file
    if (self.SEPARATOR in file or self.CHAIN_SEPARATOR in file)
    else self.join(filename, file)
Contributor

This is not quite right - reusing join() for this only works when the underlying fsspec filesystem's glob() can be relied on to return absolute paths that start with fs.sep, so that they truncate filename to just the chaining/protocol prefix and drop the glob characters that shouldn't be preserved.

Our tests don't catch this, since they use absolute local paths throughout. But the GCS filesystem doesn't:

>>> fsspec.open("gcs://").fs.glob("gs://tensorboard-bench-logs/*")
['tensorboard-bench-logs/README', 'tensorboard-bench-logs/edge_cgan', 'tensorboard-bench-logs/mnist', 'tensorboard-bench-logs/nndiv_100m', 'tensorboard-bench-logs/nndiv_100m_composite']

TensorBoard's read path does use globbing, so this breaks it for GCS, since it ends up producing incorrect globs like gs://tensorboard-bench-logs/mnist/*/tensorboard-bench-logs/mnist/*/*.

As for fixing this, glob() doesn't really specify what form the returned paths should take. But the local filesystem glob implementation seems to always normalize glob patterns to their absolute version before globbing, which means it always emits absolute paths. So it seems reasonable to assume the returned paths are meant to be "logically absolute" in that you can prepend the filesystem protocol to the path and use that to open the file. If that's the case, that's probably what we should do (i.e. explicitly add the chain prefix + protocol) rather than attempting to reuse join().

Contributor Author

ahh yeah you're right about that, the underspecified glob output is causing problems. I was wondering about the absolute vs relative case.

I think this is now correct and I added a test case with the in memory file system which returns relative paths for glob similar to gcs
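A sketch of the "explicitly add the chain prefix + protocol" approach suggested above. The names apply_prefix and _split_protocol are illustrative stand-ins (the latter approximates fsspec.core.split_protocol), and it assumes glob results are logically absolute, so prepending the chain prefix and protocol yields an openable URL:

```python
def _split_protocol(url):
    # Simplified stand-in for fsspec.core.split_protocol.
    protocol, sep, path = url.partition("://")
    return (protocol, path) if sep else (None, url)

def apply_prefix(filename, files):
    # Re-attach the chain prefix (e.g. "simplecache::") and protocol
    # (e.g. "s3://") that glob() strips from its results, instead of
    # reusing join() on the original glob pattern.
    prefix, chain_sep, base_url = filename.rpartition("::")
    prefix += chain_sep
    protocol, _ = _split_protocol(base_url)
    if protocol:
        prefix += protocol + "://"
    return [prefix + f for f in files]
```

Plain local paths carry no protocol, so they pass through unchanged.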


filename = filename.decode("utf-8")
self._validate_path(filename)

fs, _, paths = fsspec.get_fs_token_paths(
Contributor

It turns out that internally we're on an older version of fsspec (0.7.4) that doesn't have fsspec/filesystem_spec@8f96ee5, which fixes get_fs_token_paths() to handle unchaining itself.

Could we instead use fsspec.core.url_to_fs() here? That is present in 0.7.4 onwards and does handle unchaining, and it actually returns both fs and path, so it even simplifies this code to just return fsspec.core.url_to_fs(filename). With that change I can get the tests to pass internally.

Contributor Author

done
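For illustration, one small piece of what handling chained URLs involves is pulling the protocol and path out of the last segment (the remote end of the chain). This toy unchain helper is a deliberate simplification, not fsspec's actual behavior — url_to_fs returns a fully constructed (possibly wrapped) filesystem plus the path:

```python
def unchain(url):
    # Take the last "::"-separated segment; it carries the protocol and
    # path. Protocol-less inputs are treated as local files.
    base = url.rpartition("::")[2]
    protocol, sep, path = base.partition("://")
    if not sep:
        return "file", base
    return protocol, path
```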

Contributor Author

@d4l3k d4l3k left a comment

addressed feedback + expanded unit tests to cover those cases

I also filed an upstream issue about the pain points of handling these paths, so hopefully we can get a first-party maintained solution for this: fsspec/filesystem_spec#747


Contributor

@stephanwlee stephanwlee left a comment

Hello! Nick is now on a long vacation, so I will be taking over the review and sync. After CI passes, I will take some time to test the sync again and merge.

I don't mean to redo the review from the top, but I have spotted minor things like typos which you may want to address. Also, the failure in CI is real.

line 531, in testGlobNonAbsolute
    self.assertCountEqual(
AssertionError: Element counts were not equal:
First has 1, Second has 0:  'memory:///dir/bar.txt'
First has 1, Second has 0:  'memory:///dir/foo.txt'
First has 0, Second has 1:  'memory://dir/foo.txt'
First has 0, Second has 1:  'memory://dir/bar.txt'

with self.assertRaises(ValueError):
    gfile.get_filesystem("nonexistent::blah://filesystem")

def testExistance(self):
Contributor

typo: Existance -> Existence

Contributor Author

wow I just cannot spell existence correctly

gfile.FSSpecFileSystem,
),

def testJoin(self):
Contributor

For more completeness, would you mind adding test cases for these?

# Trailing slash on the last element
>>> os.path.join("hello", "world/")
'hello/world/'

# empty path at the end
>>> os.path.join("hello", "world", "")
'hello/world/'

# absolute path in the middle
>>> os.path.join("hello", "", "world", "", "/wow")
'/wow'

Contributor Author

done

@@ -0,0 +1,541 @@
# Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Contributor

nit: 2018 -> 2021.

Contributor Author

done

if not FSSPEC_ENABLED:
    return None

segment = filename.split(FSSpecFileSystem.CHAIN_SEPARATOR)[0]
Contributor

Perhaps pass a second argument to split. e.g., segment = filename.split(FSSpecFileSystem.CHAIN_SEPARATOR, 1)[0]. Here and below.

Contributor Author

switched to use partition
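The difference is small but concrete: a bare split scans the whole string and allocates every segment, while split with maxsplit (or partition) stops at the first separator. All three agree on the first segment of a chained URL:

```python
# Three equivalent ways to grab the first chain segment; partition stops
# scanning at the first "::".
url = "simplecache::zip://foo.txt::s3://bucket/some/path"
CHAIN_SEPARATOR = "::"

first_via_split = url.split(CHAIN_SEPARATOR)[0]
first_via_maxsplit = url.split(CHAIN_SEPARATOR, 1)[0]
first_via_partition = url.partition(CHAIN_SEPARATOR)[0]
```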

@@ -21,6 +21,8 @@ pandas~=1.0
# For gfile S3 test
boto3==1.9.86
moto==1.3.7
# For gfile fsspec test
fsspec==2021.8.1
Contributor

Would you mind changing this to 0.7.4 for now? As Nick indicated, we internally are using an older version of fsspec and don't want to inadvertently use newer APIs.

Contributor Author

done

Contributor Author

@d4l3k d4l3k left a comment

The test failure with memory paths seems to be an fsspec version difference: the memory filesystem uses absolute paths in 2021.8.1 but not in 0.7.4. CI should be fixed by the version change.


with self.assertRaises(ValueError):
gfile.get_filesystem("nonexistent::blah://filesystem")

def testExistance(self):
Contributor Author

wow I just cannot spell existence correctly

@stephanwlee stephanwlee dismissed nfelt’s stale review September 13, 2021 17:47

Nick is OOO and conditionally OK-ed merging on some changes that are addressed.

@stephanwlee
Contributor

Thank you for the contribution!

@stephanwlee stephanwlee merged commit 102548d into tensorflow:master Sep 13, 2021
yatbear pushed a commit to yatbear/tensorboard that referenced this pull request Mar 27, 2023
This adds support for fsspec filesystem backends to the gfile compat library, allowing TensorBoard to be used with a wider range of file system backends.

This adds a new GFile filesystem backend that is used if no other registered filesystem matches. In that case, fsspec is checked for an available implementation, and a wrapped version of it is returned.

fsspec provides semantics similar to normal usage; however, it doesn't support seeking in files opened in text mode, which can be problematic for longer text files. This works regardless, since most of TensorBoard's larger files are in a binary format.

Tests:
```sh
$ bazel run //tensorboard -- --logdir file:///tmp/torchxrun/lightning_logs/version_1/
$ bazel test //tensorboard/compat/tensorflow_stub:gfile_fsspec_test //tensorboard/summary/writer:event_file_writer_fsspec_test --test_output=errors
```

See: tensorflow#5165
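The fallback order the commit message describes can be sketched roughly as follows; the registry contents, return values, and function name here are illustrative, not TensorBoard's actual code:

```python
# Illustrative registry of explicitly registered filesystems,
# keyed by URL protocol ("" for plain local paths).
_REGISTERED = {"": "LocalFileSystem", "s3": "S3FileSystem"}


def get_filesystem(path):
    # Registered filesystems take precedence; fsspec is only a fallback.
    protocol = path.split("://", 1)[0] if "://" in path else ""
    if protocol in _REGISTERED:
        return _REGISTERED[protocol]
    try:
        import fsspec  # optional dependency
    except ImportError:
        raise ValueError("No recognized filesystem for %r" % path)
    # fsspec raises on its own if it has no implementation
    # for this protocol either.
    return fsspec.filesystem(protocol)
```

A local path like `/tmp/logs` resolves through the registry without ever importing fsspec, which keeps the new backend invisible unless an unrecognized protocol is actually used.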
dna2github pushed a commit to dna2fork/tensorboard that referenced this pull request May 1, 2023
Labels
cla: yes, core:backend, core:notf (Things related to No TensorFlow mode)