Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: pass through vsi schemas like S3 #29

Merged
merged 1 commit into from
Jan 17, 2022
Merged

Conversation

martinfleis
Copy link
Member

GDAL natively supports more vsi schemes than vsizip. If we don't necessarily check for that one, you can already use storages like S3 out of the box.

pyogrio.read_dataframe("/vsis3/{bucket}/{filename}")
pyogrio.write_dataframe(df, "/vsis3/{bucket}/{filename}", driver="GeoJSON")

I think that it would also be nice to support the s3://{bucket}/{filename} format as well, as we do in GeoPandas (Fiona) (ref #21).

I just don't know how to test this. I tried the moto way of mocking S3 but it seems that GDAL somehow ignores that. I'd rather avoid setting up a bucket as Fiona did.

@martinfleis martinfleis marked this pull request as ready for review January 15, 2022 22:37
@martinfleis
Copy link
Member Author

Note: I wanted this to be able to properly test dask-geopandas IO by reading a large file from my S3 bucket on a coiled cluster but I quickly figured out that it is not possible at the moment.

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @martinfleis !

I'm a bit confused by

not possible at the moment

Do you mean simply the missing support you added here, or did you run into other issues once you made these changes?

Since we're not yet enabling any special logic for S3 data sources, perhaps we can defer the need for testing those to GDAL for the time being? Then if / when we add variants of zip:// and s3:// we add tests for those more specifically, since more of that would be custom logic on our end..

@martinfleis
Copy link
Member Author

Do you mean simply the missing support you added here, or did you run into other issues once you made these changes?

Yes, I was just missing this. Otherwise I haven't run into any issues with the change in this PR.

Since we're not yet enabling any special logic for S3 data sources, perhaps we can defer the need for testing those to GDAL for the time being? Then if / when we add variants of zip:// and s3:// we add tests for those more specifically, since more of that would be custom logic on our end..

Okay, that sounds fine.

@brendan-ward brendan-ward merged commit 4435e9f into geopandas:main Jan 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants