-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: enable support for writing to memory #397
Conversation
FWIW, I would personally first focus on getting the plain writing to memory work without append (just raise for now if the mode is append) |
Fiona has specific tests for appending so I assumed it was possible and expected behavior, but indeed in the interest of getting this out sooner we can more strictly limit this to write-only mode. |
I assume this only works when directly using the So longer term that is also something we could do, but it might requiring creating some similar custom object like fiona's MemoryFile? |
pyogrio/raw.py
Outdated
raise NotImplementedError("append is not supported for in-memory files") | ||
|
||
else: | ||
path = str(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this still call vsi_path(..)
? (what it did before in the deleted code below)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took it out because I wasn't convinced that it would work properly and my instead emit confusing errors (though stringifying something that would have gotten a VSI path would be confusing too). We don't have any tests that prove writing to /vsi*/
paths would be successful. I also didn't see support for writing to a /vsi*/
path in Fiona; they are largely used for reading.
For instance, it looks like it is not possible to write via the /vsizip/
handler:
write_dataframe(df, "/vsizip//tmp/test.shp.zip", driver="ESRI Shapefile")
Traceback (most recent call last):
...
pyogrio.errors.DataSourceError: Driver ESRI Shapefile does not support write functionality
Are you aware of any things that we can write to a /vsi*/
path that should succeed? I can also restore the original behavior, in case our current unknown / untested functionality is working for someone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In theory /vsizip
should have write capabilities (https://gdal.org/user/virtual_file_systems.html#write-capabilities), but indeed I also don't get that working (maybe the way we handle that path internally isn't fit for working with such a vsi path)
But something like writing to S3 with vsis3
should in theory also be supported? (although that is indeed something we don't test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly tested this with a public test bucket, and I can write to an "s3://" path (which will get converted to /vsis3 by our preprocessing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for testing! That confirms we should indeed leave it in place for now.
pyogrio/tests/test_arrow.py
Outdated
@@ -582,7 +588,7 @@ def test_write_append_unsupported(tmpdir, naturalearth_lowres, driver, ext): | |||
def test_write_gdalclose_error(naturalearth_lowres): | |||
meta, table = read_arrow(naturalearth_lowres) | |||
|
|||
filename = "s3://non-existing-bucket/test.geojson" | |||
filename = "/vsis3/non-existing-bucket/test.geojson" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason for this change? (we should at least also ensure writing to s3:// works as well, so maybe parametrize the test for both options?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were producing failures when I had disabled handling the vsi paths; since that has been reverted I'm also reverting these.
pyogrio/tests/test_core.py
Outdated
# Geojson files don't hava a fast way to determine total_bounds | ||
@pytest.mark.parametrize("force_total_bounds", [True, False]) | ||
def test_read_info_force_total_bounds(tmpdir, naturalearth_lowres, force_total_bounds): | ||
# Geojson files don't have a fast way to determine total_bounds for GDAL <3.9.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally for later we should find a new file format to test with that doesn't support fast total bounds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like GeoJSONSeq still doesn't support fast total bounds, so I reverted back to using that directly.
pyogrio/raw.py
Outdated
raise NotImplementedError("append is not supported for in-memory files") | ||
|
||
else: | ||
path = str(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I quickly tested this with a public test bucket, and I can write to an "s3://" path (which will get converted to /vsis3 by our preprocessing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
Resolves #249
This enables basic writing to memory using an
io.BytesIO
object. This does not currently support append / adding layers; I tried working that out unsuccessfully in earlier commits, but we could return to this in a later PR. It seems like it should be possible in theory.This specifically blacklists ESRI Shapefile and OpenFileGDB because they write multiple files, which can't be returned successfully as a single memory file.