-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support URI schemes (zip://, s3://) by converting to vsi paths #43
Conversation
@pytest.mark.network | ||
def test_url(): | ||
df = pyogrio.read_dataframe( | ||
"https://raw.githubusercontent.com/geopandas/pyogrio/main/pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we OK with having tests that require network?
I added a marker for now, so at least it is easy to skip them (but by default this marker doesn't do anything)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is fine. It is surely useful and we have similar one in geopandas as well.
pyogrio/tests/test_path.py
Outdated
# @pytest.mark.network | ||
# def test_uri_s3() | ||
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip') | ||
# assert len(df) == 67 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have an s3 test, but that also requires some set up. I also think that fiona still has some more logic around setting up credentials for GDAL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but this is of course not specific to this PR, the same thing is true for /vsis3
which already can be done before this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing, I could copy the code from dask-geopandas (https://github.com/geopandas/dask-geopandas/blob/main/tests/io/conftest.py) for a fixture that sets up a local boto client to mimic s3. Although I actually don't know how that would work with gdal's vsis3 ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the issue here? This test passes locally. fiona-testing
bucket seems to be public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get CPLE_AWSInvalidCredentialsError: AWS_SECRET_ACCESS_KEY and AWS_NO_SIGN_REQUEST configuration options not defined, and /home/joris/.aws/credentials not filled
error when testing this locally
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you set the environment variable AWS_NO_SIGN_REQUEST=YES
before running the tests, this error is no longer raised (at least for me, on MacOS).
It looks like you can set this for pytest in the tests/conftest.py
file by adding:
import os
def pytest_generate_tests(metafunc):
os.environ["AWS_NO_SIGN_REQUEST"] = "YES"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think that is passes locally because I have AWS credentials set on my laptop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I don't know if it matters much, but it seems you can also do that with a fixture for a specific test, although I suppose it wouldn't actually have any influence on the other tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! This looks great. I've left a few notes. One more - it would be good to document this somewhere, either in docstrings or in the documentation (or ideally both).
pyogrio/tests/test_path.py
Outdated
# @pytest.mark.network | ||
# def test_uri_s3() | ||
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip') | ||
# assert len(df) == 67 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the issue here? This test passes locally. fiona-testing
bucket seems to be public.
@pytest.mark.network | ||
def test_url(): | ||
df = pyogrio.read_dataframe( | ||
"https://raw.githubusercontent.com/geopandas/pyogrio/main/pyogrio/tests/fixtures/naturalearth_lowres/naturalearth_lowres.shp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it is fine. It is surely useful and we have similar one in geopandas as well.
pyogrio/util.py
Outdated
""" | ||
parts = urlparse(path) | ||
|
||
# if the scheme is not one of Rasterio's supported schemes, we |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the mention of Rasterio should be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
A few minor comments to consider.
pyogrio/util.py
Outdated
"az": "az", | ||
"adls": "adls", | ||
"adl": "adls", # fsspec uses this | ||
# 'oss': 'oss', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the commented lines or add a TODO / GH issue.
Can you please add tests for hdfs
and webhdfs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added test for hdfs/webhdfs.
The reason I commented out "oss" and "swift" for now (while those are supported by GDAL) is that I don't actually know if those are used as URI in practice (eg I am not aware that you can use a "oss://path/data" URI in Python with an fsspec implementation, and in general haven't found any example online based on a quick google search). So I am a bit in doubt if we should actually enable those (the user can always still use the /vsioss/..
path as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - seems like a good idea to leave them out here; having them here implies that we support them. Can always add them later when there is a clear need.
Yes, I indeed still need to add some docs! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One small comment and one likely fix to the S3 issue, but otherwise this looks great! Thanks @jorisvandenbossche
pyogrio/tests/test_path.py
Outdated
# @pytest.mark.network | ||
# def test_uri_s3() | ||
# df = pyogrio.read_dataframe('zip+s3://fiona-testing/coutwildrnp.zip') | ||
# assert len(df) == 67 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you set the environment variable AWS_NO_SIGN_REQUEST=YES
before running the tests, this error is no longer raised (at least for me, on MacOS).
It looks like you can set this for pytest in the tests/conftest.py
file by adding:
import os
def pytest_generate_tests(metafunc):
os.environ["AWS_NO_SIGN_REQUEST"] = "YES"
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Closes #21
(certainly still need to add docs)