Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: support URI schemes (zip://, s3://) by converting to vsi paths #43

Merged
merged 7 commits into from
Feb 15, 2022

Conversation

jorisvandenbossche
Copy link
Member

Closes #21

(certainly still need to add docs)

@pytest.mark.network
def test_url():
df = pyogrio.read_dataframe(
"https://raw.githubusercontent.com/geopandas/pyogrio/main/pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we OK with having tests that require network?
I added a marker for now, so at least it is easy to skip them (but by default this marker doesn't do anything)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is fine. It is surely useful and we have similar one in geopandas as well.

# @pytest.mark.network
# def test_uri_s3()
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip')
# assert len(df) == 67
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have an s3 test, but that also requires some set up. I also think that fiona still has some more logic around setting up credentials for GDAL.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(but this is of course not specific to this PR, the same thing is true for /vsis3 which already can be done before this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing, I could copy the code from dask-geopandas (https://github.com/geopandas/dask-geopandas/blob/main/tests/io/conftest.py) for a fixture that sets up a local boto client to mimic s3. Although I actually don't know how that would work with gdal's vsis3 ..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the issue here? This test passes locally. fiona-testing bucket seems to be public.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get CPLE_AWSInvalidCredentialsError: AWS_SECRET_ACCESS_KEY and AWS_NO_SIGN_REQUEST configuration options not defined, and /home/joris/.aws/credentials not filled error when testing this locally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the environment variable AWS_NO_SIGN_REQUEST=YES before running the tests, this error is no longer raised (at least for me, on MacOS).

It looks like you can set this for pytest in the tests/conftest.py file by adding:

import os

def pytest_generate_tests(metafunc):
    os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think that is passes locally because I have AWS credentials set on my laptop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I don't know if it matters much, but it seems you can also do that with a fixture for a specific test, although I suppose it wouldn't actually have any influence on the other tests

Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks great. I've left a few notes. One more - it would be good to document this somewhere, either in docstrings or in the documentation (or ideally both).

# @pytest.mark.network
# def test_uri_s3()
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip')
# assert len(df) == 67
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the issue here? This test passes locally. fiona-testing bucket seems to be public.

@pytest.mark.network
def test_url():
df = pyogrio.read_dataframe(
"https://raw.githubusercontent.com/geopandas/pyogrio/main/pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is fine. It is surely useful and we have similar one in geopandas as well.

pyogrio/util.py Outdated
"""
parts = urlparse(path)

# if the scheme is not one of Rasterio's supported schemes, we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the mention of Rasterio should be updated.

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this!

A few minor comments to consider.

pyogrio/core.py Show resolved Hide resolved
pyogrio/util.py Outdated
"az": "az",
"adls": "adls",
"adl": "adls", # fsspec uses this
# 'oss': 'oss',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the commented lines or add a TODO / GH issue.

Can you please add tests for hdfs and webhdfs?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added test for hdfs/webhdfs.

The reason I commented out "oss" and "swift" for now (while those are supported by GDAL) is that I don't actually know if those are used as URI in practice (eg I am not aware that you can use a "oss://path/data" URI in Python with an fsspec implementation, and in general haven't found any example online based on a quick google search). So I am a bit in doubt if we should actually enable those (the user can always still use the /vsioss/.. path as well)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - seems like a good idea to leave them out here; having them here implies that we support them. Can always add them later when there is a clear need.

@jorisvandenbossche
Copy link
Member Author

it would be good to document this somewhere, either in docstrings or in the documentation (or ideally both).

Yes, I indeed still need to add some docs!

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small comment and one likely fix to the S3 issue, but otherwise this looks great! Thanks @jorisvandenbossche

docs/source/introduction.md Outdated Show resolved Hide resolved
# @pytest.mark.network
# def test_uri_s3()
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip')
# assert len(df) == 67
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you set the environment variable AWS_NO_SIGN_REQUEST=YES before running the tests, this error is no longer raised (at least for me, on MacOS).

It looks like you can set this for pytest in the tests/conftest.py file by adding:

import os

def pytest_generate_tests(metafunc):
    os.environ["AWS_NO_SIGN_REQUEST"] = "YES"

Co-authored-by: Brendan Ward <bcward@astutespruce.com>
Copy link
Member

@martinfleis martinfleis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@martinfleis martinfleis merged commit 8136492 into geopandas:main Feb 15, 2022
@jorisvandenbossche jorisvandenbossche deleted the parse-path branch February 15, 2022 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: support URI schemes like zip:// (in addition to GDAL's vsi)
3 participants