-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: refactor handling of reading from in-memory dataset #407
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Thanks a lot
@@ -817,7 +847,7 @@ def test_write_memory(naturalearth_lowres, driver): | |||
|
|||
buffer = BytesIO() | |||
write_arrow( | |||
table.slice(0, 1), | |||
table, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did this was to avoid the filterwarning you added .. ;) Ideally we would add a test dataset without mixed geometries that we can use for most tests .. (or just edit the naturalearth fixture to give all MultiPolygons), but that's for another issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched the places where we were doing .slice(0, 1)
to only read 1 feature from naturalearth, and sidestep the issue (and also still allow comparison of first table to the written & reread table).
Resolves #401
This moves all handling of creation / destruction of the in-memory dataset used for read operations to the Cython tier, and now passes in either a string or bytes to
read_info()
,list_layers()
,read_bounds()
,read()
,read_arrow()
functions. I think this also sets us to for later refactors where we may pass file-like / filesystem-like objects directly down to Cython in order to use the GDAL virtual filesystem plugin.To keep things organized, I split VSI related Cython functions into the new
_vsi.pyx
/_vsi.pxd
files.Because the bytes buffer is passed as a parameter, it remains in scope during the Cython function and we don't need to hold an extra handle on it like before. (the handle was required to prevent Python from deallocating it while GDAL still may use it to represent the in-memory dataset).
This expands the tests to verify that core functions still work correctly when passed bytes or file-like objects. To avoid the various GPKG related errors when working with this type in memory, I instead added new test fixtures to create a GeoJSON file from the first 3 records of the naturalearth_lowres dataset.
This also adds a specific check that the incoming path does not already contain
/vsimem/
, since we have to handle that internally and cannot allow it to be passed in.