os.path equivalent for fsspec #747
Comments
On the pathlib option, please see the https://github.com/Quansight/universal_pathlib project, a layer on top of fsspec. Otherwise, the situation is indeed complex, and I'm not certain we can easily define the expected behaviour of os.path.* for fsspec-compatible paths. Indeed, you can apply those builtin functions right now and get something reasonable back (though not what you were after!):
>>> os.path.dirname("simplecache::zip://foo/bar::s3://bucket/path.zip")
'simplecache::zip://foo/bar::s3://bucket'
>>> os.path.basename("simplecache::zip://foo/bar::s3://bucket/path.zip")
'path.zip'
(I would probably argue that |
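To make the gap concrete, here is a minimal sketch of one possible reading of dirname/basename for a chained URL. It assumes (this is not fsspec behaviour) that the first link containing "://" carries the user-facing path and that later links only locate the container. The helper names chained_dirname and chained_basename are hypothetical.

```python
import posixpath

CHAIN_SEP = "::"  # separator fsspec uses between links of a chained URL


def chained_dirname(url):
    """Apply dirname to the path of the first link that has a protocol.

    Assumption for this sketch: that link holds the path the user cares
    about; all other links are left untouched.
    """
    links = url.split(CHAIN_SEP)
    for i, link in enumerate(links):
        if "://" in link:
            protocol, path = link.split("://", 1)
            links[i] = f"{protocol}://{posixpath.dirname(path)}"
            break
    return CHAIN_SEP.join(links)


def chained_basename(url):
    """Return the basename of the path in the first link that has a protocol."""
    for link in url.split(CHAIN_SEP):
        if "://" in link:
            return posixpath.basename(link.split("://", 1)[1])
    return ""


url = "simplecache::zip://foo/bar::s3://bucket/path.zip"
print(chained_dirname(url))
print(chained_basename(url))
```

Under that assumption the operations act on foo/bar inside the archive rather than on the trailing s3 link, which is where plain os.path ends up.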
Just took a look at universal_pathlib. It seems to throw exceptions for most chained file systems. For most filesystems posixpath works but since it defines the Potentially each FS could define the path operations and the top level fsspec |
So what we would need is not fsspec.path.*, but fs.path.* (i.e., accessed through the filesystem instance), because different file systems will follow different patterns. It sounds doable. The universal_pathlib can call those things to complete the circle. |
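A rough sketch of what a per-filesystem path namespace could look like. Everything here is hypothetical: PathNamespace and the two fake filesystem classes are illustration only and do not exist in fsspec. The point is just that each filesystem picks its own path flavour and callers go through fs.path.

```python
import ntpath
import posixpath


class PathNamespace:
    """Hypothetical 'fs.path' helper bound to one path flavour module."""

    def __init__(self, flavour):
        # flavour is a module with the os.path interface (posixpath, ntpath)
        self._flavour = flavour

    def join(self, *parts):
        return self._flavour.join(*parts)

    def dirname(self, path):
        return self._flavour.dirname(path)

    def basename(self, path):
        return self._flavour.basename(path)


class FakeObjectStoreFS:
    # Stand-in for a filesystem whose keys use forward slashes
    path = PathNamespace(posixpath)


class FakeWindowsLocalFS:
    # Stand-in for a local filesystem with Windows-style paths
    path = PathNamespace(ntpath)


print(FakeObjectStoreFS().path.dirname("bucket/dir/file.txt"))
print(FakeWindowsLocalFS().path.basename("C:\\data\\file.txt"))
```

A layer such as universal_pathlib could then delegate to fs.path without knowing which flavour a given filesystem uses.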
I'd definitely be interested in integrating something like that into |
One thing that would help a lot would be to expose _unstrip_protocol as part of core (filesystem_spec/fsspec/utils.py, lines 454 to 462 in 1f3b6d8).
A lot of operations need you to call filesystem_spec/fsspec/core.py Line 357 in f236c4f
Would be great to be able to do:
full_path = "memory://bar"
fs, path = fsspec.core.url_to_fs(full_path)
for file in fs.ls(path):
    print(fsspec.core.fs_to_url(fs, file))
That would help cut down a lot of duplicate code/boilerplate like |
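For illustration, a minimal sketch of what such a helper might do. fs_to_url is the name proposed above, not an existing fsspec function, and FakeFS is a stand-in so the example runs without fsspec installed. The only assumption about the filesystem object is a protocol attribute that is either a string or a tuple of strings.

```python
def fs_to_url(fs, path):
    """Prefix `path` with the filesystem's protocol unless it already has one.

    Hypothetical helper: assumes `fs.protocol` is a str or a tuple of str.
    """
    protocol = fs.protocol
    protocols = (protocol,) if isinstance(protocol, str) else tuple(protocol)
    for proto in protocols:
        if path.startswith(f"{proto}://"):
            return path  # already a full URL for this filesystem
    return f"{protocols[0]}://{path}"


class FakeFS:
    """Stand-in filesystem; ls returns protocol-less paths, as described above."""

    protocol = ("s3", "s3a")

    def ls(self, path):
        return [f"{path}/a.txt", f"{path}/b.txt"]


fs = FakeFS()
for name in fs.ls("bucket/data"):
    print(fs_to_url(fs, name))
```

The first listed protocol is used as the canonical one, which is a choice made for this sketch.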
In the current latest, Generally, ls/find/glob operations return URLs as seen by the implementation in question, so not including the protocol. There's probably no changing that. The new generics module in #828 will convert these into complete URLs in every case. |
My 2c: we've also introduced |
Ah yes, you did mention this sometime before, @efiop . You might consider upstreaming that module, maybe. |
@martindurant I see that it's provided via "full_name" on the file class but often I want to be able to ls files without the overhead of opening them. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/spec.py#L1394-L1395 Providing it on the Filesystem class would work for me though I'm not seeing anything special on http. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/http.py Good to know about the generics change! Would be nice if the _unstrip_protocol method was public (no _) in those changes |
(ah sorry, the method is only in the same PR I mentioned above, #828 - but it will come!) |
That's reasonable |
After using our Working on submitting a PR. |
Is this still being worked on? Would love to see this feature implemented |
Other stuff got in the way, so I didn't manage to contribute it yet 🙁 |
@efiop Also very interested in getting this in, any chance some time might open up / is there anything anyone else can do to help out here? |
@agrinh Finally getting around to moving fs.path to plain fs methods in dvc, so hoping to get around to contributing it to fsspec around new years 🙂 (said that before, but still). |
After such a long time using it, it is clear that it wasn't the right decision and path manipulation methods should be right in the class. This also makes it possible to use most of the methods as classmethods without having to initialize the filesystem at all. Per fsspec/filesystem_spec#747 (comment) and also a pre-requisite for it.
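A small sketch of the classmethod idea described in that commit message. The class and method names are hypothetical and are not taken from dvc or fsspec. It only shows that path helpers defined as classmethods can be called without constructing a filesystem, and that a subclass can change the separator.

```python
class SketchFileSystem:
    """Hypothetical base class with path helpers as classmethods."""

    sep = "/"

    @classmethod
    def parts(cls, path):
        # Split on the separator and drop empty components
        return tuple(p for p in path.split(cls.sep) if p)

    @classmethod
    def name(cls, path):
        parts = cls.parts(path)
        return parts[-1] if parts else ""

    @classmethod
    def parent(cls, path):
        # Simplification: a leading separator is not preserved, so this
        # suits object-store style keys rather than absolute local paths
        return cls.sep.join(cls.parts(path)[:-1])


class SketchBackslashFS(SketchFileSystem):
    sep = "\\"


# No instance is created; the helpers are called on the classes directly
print(SketchFileSystem.parent("bucket/dir/file.txt"))
print(SketchBackslashFS.name("dir\\file.txt"))
```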
One of the things that have come up when trying to integrate fsspec into tensorboard (tensorflow/tensorboard#5248) is that there aren't any standard path operations as part of fsspec as far as I can tell. When dealing with more complex chained filesystems the rules get pretty complex for users to implement.
Ideally fsspec would provide the common operations such as:
I.e.
With the chained filesystems this gets really complex, and I'm not 100% sure how to implement this in all cases/filesystems, so feedback would be appreciated here.
An option here might be to try and implement https://docs.python.org/3/library/pathlib.html#pathlib.PurePath
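As a rough illustration of the PurePath route for a single, unchained URL: PurePosixPath cannot be applied to the whole URL because it collapses the "//" after the protocol, so this sketch splits the protocol off first. The helpers split_protocol and url_parent are hypothetical names, and chained URLs are deliberately not handled.

```python
from pathlib import PurePosixPath


def split_protocol(url):
    """Split 'proto://path' into (proto, path); proto is None if absent."""
    if "://" in url:
        protocol, path = url.split("://", 1)
        return protocol, path
    return None, url


def url_parent(url):
    """Return the parent of a single-protocol URL using PurePosixPath."""
    protocol, path = split_protocol(url)
    parent = str(PurePosixPath(path).parent)
    return f"{protocol}://{parent}" if protocol else parent


# Applying PurePosixPath to the full URL loses one slash after the protocol
print(PurePosixPath("s3://bucket/path.zip"))
# Splitting the protocol off first keeps the URL intact
print(url_parent("s3://bucket/dir/file.txt"))
```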