-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Large OSM files require different approach to reading features #272
Comments
Looks like the trick is to detect that the data source is OSM with interleaved reading, and then execute an SQL statement to set the specific layer of interest. We can use |
I suppose use_arrow by default doesn't treat this any better, or does it? -> based on my own test in the context of #269, use_arrow has the same problems except that it is 2 x as fast, most likely because it probably doesn't count the features first before fetching them... |
#271 has been merged into If it isn't working yet for you, please share more details about what part isn't working. |
Ah right, the issue without arrow is associated to a EDIT: on second thought, is it normal that the behavior differs with and without arrow? Should I open an issue about this? |
It might still be an issue to report to GDAL to also honor that setting when exporting to Arrow. |
@tfardet I wonder... OGR_GEOMETRY_ACCEPT_UNCLOSED_RING is not supposed to give an error by default... it should only give a warning. Is it possible you set import pyogrio
print(
"OGR_GEOMETRY_ACCEPT_UNCLOSED_RING: "
f"{pyogrio.get_gdal_config_option('OGR_GEOMETRY_ACCEPT_UNCLOSED_RING')}"
) Result: OGR_GEOMETRY_ACCEPT_UNCLOSED_RING: None |
Yes, I set it manually to get rid of the warning about the non closed ring with Arrow, where it seems to simply discard the related geometry. |
Ok, I understand now. The variable name implies that you specify if you want to Accept unclosed rings or not. So if you set it to "NO", this means you don't Accept them and hence an error is raised. If you want to get rid of the warning without error you should set it to "YES", not "NO". The inconsistent behaviour/bug is in the arrow implementation: if you set |
@tfardet do you open an issue in https://github.com/OSGeo/gdal/issues for this? |
#271 tries to count records directly from an OSM data source, which appears to work for small files, but produces inconsistent results for larger datasets (e.g., Alabama), which appears to be an artifact of layers being an artificial construct within OSM files rather than a meaningful structure within the dataset.
According to the GDAL docs, by default too many features may accumulate for a given layer while attempting to read a different layer, and produce an error. If you then enable interleaved reading, this avoids the error, but iterating over features may produce a count of 0 instead of the true amount.
From the example for interleaved reading in the GDAL docs, it is not immediately obvious how to alter our reading workflow. Perhaps we need to determine the index of the requested layer, and read all features of all previous layers first?
ogr2ogr
appears to do the right thing here, so we should look at that for reference.The text was updated successfully, but these errors were encountered: