Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: support reading numeric subtypes (bool, int16, float32) #83

Merged
merged 4 commits into from
Apr 29, 2022

Conversation

jorisvandenbossche
Copy link
Member

We are already correctly setting the OGR SubType when writing, but not yet using them for reading. This PR adds support for that (at least for the numeric ones that map to a numpy dtype, OGR also has "JSON" and "UUID" subtypes in addition).

Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jorisvandenbossche ; this will be nice to have for 0.4 and has been on the TODO list for a while.

pyogrio/_io.pyx Outdated
@@ -47,6 +47,13 @@ FIELD_TYPES = [
None # OFTInteger64List, List of 64bit integers, not supported
]

FIELD_SUBTYPES = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest making this into a dict to make the keys more explicit rather than index order, and then below you can use FIELD_SUBTYPES.get(field_subtype) instead of exception handler.

Arguably, FIELD_TYPES could have been a dict too, but since that list was supposed to be more exhaustive we didn't handle the case where an entry might not be present.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably, FIELD_TYPES could have been a dict too, but since that list was supposed to be more exhaustive we didn't handle the case where an entry might not be present.

According to the docs, also for FIELD_TYPES we shouldn't assume that it is exhaustive for future robustness (although it is right now): https://gdal.org/api/vector_c_api.html#_CPPv412OGRFieldType

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dict.get is indeed nicer

pyogrio/_io.pyx Show resolved Hide resolved
)
def test_read_write_data_types_numeric(tmp_path, driver, ext):
# Point(0, 0), Point(1, 1), Point(2, 2)
geometry = np.array([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep this more readable, I'd suggest just adding a convert step from the pygeos geometries to WKB.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I kept the raw WKB, but I simplified it a bit (we are not really using the geometries anyway in the actual test, it are just some dummy values).
I could also have moved the whole test to test_geopandas.py, but since it is here now in raw IO and with the simplified code using pygeos would not reduce the lines of code, I would say it is fine to not use pygeos here.

pyogrio/tests/test_raw_io.py Outdated Show resolved Hide resolved
pyogrio/tests/test_raw_io.py Show resolved Hide resolved
Copy link
Member

@brendan-ward brendan-ward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants