-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add append support to write functions #197
Conversation
With the other PRs merged, this has some conflicts now |
pyogrio/_ogr.pyx
Outdated
def ogr_driver_supports_write(driver): | ||
# exclude drivers known to be unsupported by pyogrio even though they are | ||
# supported for write by GDAL | ||
if driver in {"CSV", "XLSX"}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CSV actually does work if specify GEOMETRY='AS_WKT'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also wondering (but for a different issue/PR), do we want to support writing only the attributes without geometry (like we support reading with ignoring the geometry)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - didn't know that; probably means that XLSX would work with that as well. In which case, maybe we should just pass through those drivers that are supported for write, and let the user figure out what additional options may be required if GDAL returns an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes: #105 addresses writing without geometry; was not trying to fast-track that for the upcoming release though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XLSX does not support geometry, so we need to keep this exclusion for XLSX until we do #105 and allow writing to it without geometry (though probably cleaner for user to do this through Pandas to_excel(...)
)
pyogrio/geopandas.py
Outdated
@@ -254,6 +255,11 @@ def write_dataframe( | |||
layer_options : dict, optional | |||
Layer creation option (format specific) passed to OGR. Specify as | |||
a key-value dictionary. | |||
append : bool, optional (default: False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something else: I like append=True/False
, but note that geopandas / fiona uses mode="w"/"a"
(but fiona has this because it mimics the open(..)
API, and in geopandas we can easily map the keyword for compat; we currently even explicitly raise an error for mode="a"
on the geopandas side in case of pyogrio engine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that pandas supports the same modes as the open()
API for things like to_csv()
. So we may want to be consistent but only support a subset: write_dataframe
would never support mode="r"
, for instance.
I'm not sure what the meaning of mode="r+"
would be here, since we don't expose a stateful API, and we don't (yet) let you update existing records an existing data layer.
In which case, read_dataframe
would only ever use mode="r"
(so parameter is not needed), and write_dataframe
could only ever use mode="w"
or mode="a"
- which ends up being an awful lot like our boolean append
.
Would be nice not to change this API later, so I'm certainly open to using mode
in order to make this more compatible with GeoPandas if that is what you'd like here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, for a direct user of pyogrio, I find append=True
the nicer and easier API.
get_gdal_version, | ||
get_gdal_version_string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_gdal_version, | |
get_gdal_version_string, | |
get_gdal_version, |
(the latter is not used here?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are used here; using the string version on line 289 for the error message
Thanks! |
Resolves #137
This adds basic support to append additional records to existing layers when supported by the driver.
This uses driver metadata plus a known list of drivers to exclude in order to determine which drivers support write capability (and updates listing from
list_drivers()
), and raises errors when attempting to write to unsupported drivers.This adds a specific GDAL version test when appending to FlatGeobuf to avoid segfaults due to OSGeo/gdal#5739; anything prior to GDAL 3.5.1 appears to segfault.
I could not find a driver metadata key that indicates if a driver supports append capability. Rather than try to maintain a list of known drivers that support append, I think we probably just need to let GDAL raise the errors when a given driver / GDAL version does not support append.