Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a way to get the CRS data from st_read #156

Closed
Maxxen opened this issue Oct 25, 2023 Discussed in #155 · 6 comments · Fixed by #227
Closed

Provide a way to get the CRS data from st_read #156

Maxxen opened this issue Oct 25, 2023 Discussed in #155 · 6 comments · Fixed by #227
Labels
enhancement New feature or request

Comments

@Maxxen
Copy link
Member

Maxxen commented Oct 25, 2023

Discussed in #155

Originally posted by jdoig October 25, 2023
Please excuse me if I'm being stupid (or just haven't read the manual thoroughly enough)... But is there anyway from within duckdb to get the CRS data from the .prj file in the shx I'm parsing with ST_READ?

I see that the extension integrates with PROJ which makes me suspect what I'm asking is possible (though I do see from the documentation that the geometry itself can't contain SRID information yet)


I see two approaches to handle this, we probably want both.

  • Implement something like a st_describe to output the GDAL metadata from a dataset
  • Output the CRS of the dataset as an additional columns when calling st_read, we just have to be careful to deduplicate the column names, or maybe even allow specifying the column name with a parameter.
@jdoig
Copy link

jdoig commented Oct 25, 2023

I like the st_describe option, as it's simple, and an example could be provided in the documentation showing how to do a select from st_read that also selects the relevant columns from st_describe to achieve the same results as your second option. Leaving the complexity of deduplication to the user (who can do so in the simplest way that suits their data set)

@kylebarron
Copy link

Not sure how this works within duckdb, but in geoarrow, it's possible to set metadata on a column. In relation to #153, it would be great if you could store some CRS metadata on the field so that interop to/from GeoArrow would work, and that might also simplify how to access the CRS?

@paleolimbot
Copy link

The idea that the CRS is a type-level concept is an important one for optimizing kernels and I hope it's the approach that gets used here! Postgres (and maybe DuckDB? Apologies for not being familiar with this extension yet 😬 ) store "SRID" as a value-level concept but this slows down kernels because there is a lot of branching on "if SRID1 != SRID2" (or a lot of potentially invalid comparisons if these values aren't checked). Postgres doesn't have a concept of type-level metadata (er, it does, but the type-level metadata like array column shape is generally not available to kernels); DuckDB might (e.g., decimal(precision, scale)). CRS at the type level lets you error at kernel dispatch time and loop faster in kernels!

@Maxxen
Copy link
Member Author

Maxxen commented Oct 26, 2023

Yeah I agree that CRS should not be stored per value, so far Ive tried to tell people to store it as a column instead, which makes much more sense from a db perspective, but Im not opposed to having it default to a type level metadata (like decimalk), that would be a good middle ground default I think, and you could still store it as a separate column if you really want to mix and match.

Generally I feel like kernels should be agnostic to the CRS though? The only thing I can see mattering is whether the geom is planar or spherical, but I think thats better solved with a GEOGRAPHY type.

@paleolimbot
Copy link

Generally I feel like kernels should be agnostic to the CRS though

Totally! I think what I had in mind was if you do something like st_intersects you can error if x and y do not have identical CRS. In Acero that would happen at kernel selection time (not sure where what happens in DuckDB).

@Maxxen
Copy link
Member Author

Maxxen commented Oct 27, 2023

Generally I feel like kernels should be agnostic to the CRS though

Totally! I think what I had in mind was if you do something like st_intersects you can error if x and y do not have identical CRS. In Acero that would happen at kernel selection time (not sure where what happens in DuckDB).

Right now It doesnt 😅, but I agree it, absolutely should error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants