-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: support reading numeric subtypes (bool, int16, float32) #83
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorisvandenbossche ; this will be nice to have for 0.4 and has been on the TODO list for a while.
pyogrio/_io.pyx
Outdated
@@ -47,6 +47,13 @@ FIELD_TYPES = [ | |||
None # OFTInteger64List, List of 64bit integers, not supported | |||
] | |||
|
|||
FIELD_SUBTYPES = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest making this into a dict
to make the keys more explicit rather than index order, and then below you can use FIELD_SUBTYPES.get(field_subtype)
instead of exception handler.
Arguably, FIELD_TYPES
could have been a dict
too, but since that list was supposed to be more exhaustive we didn't handle the case where an entry might not be present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Arguably,
FIELD_TYPES
could have been adict
too, but since that list was supposed to be more exhaustive we didn't handle the case where an entry might not be present.
According to the docs, also for FIELD_TYPES
we shouldn't assume that it is exhaustive for future robustness (although it is right now): https://gdal.org/api/vector_c_api.html#_CPPv412OGRFieldType
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dict.get
is indeed nicer
pyogrio/tests/test_raw_io.py
Outdated
) | ||
def test_read_write_data_types_numeric(tmp_path, driver, ext): | ||
# Point(0, 0), Point(1, 1), Point(2, 2) | ||
geometry = np.array([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep this more readable, I'd suggest just adding a convert step from the pygeos geometries to WKB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I kept the raw WKB, but I simplified it a bit (we are not really using the geometries anyway in the actual test, it are just some dummy values).
I could also have moved the whole test to test_geopandas.py, but since it is here now in raw IO and with the simplified code using pygeos would not reduce the lines of code, I would say it is fine to not use pygeos here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorisvandenbossche !
We are already correctly setting the OGR SubType when writing, but not yet using them for reading. This PR adds support for that (at least for the numeric ones that map to a numpy dtype, OGR also has "JSON" and "UUID" subtypes in addition).