Spatial Extension support #55

Closed
eitsupi opened this issue Dec 14, 2023 · 17 comments

@eitsupi
Contributor

eitsupi commented Dec 14, 2023

I saw this post of ibis.
https://ibis-project.org/posts/ibis-duckdb-geospatial/

It would be great if the R client also had an integration with the Spatial Extension. (For now, it seems to be difficult to handle because it is converted to the raw type of R)

Perhaps integrating with the geoarrow package would make sense?
@paleolimbot Sorry for tagging you, but do you have any perspectives on such integrations?

@paleolimbot

I would definitely be excited!

I think the general issue is that by default the data comes back in an internal BLOB format (you need to call st_asbinary() manually to get WKB that can be parsed by wk or sf).
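To make the contrast with the opaque internal BLOB concrete, here is a minimal base-R sketch of what a WKB value looks like once it is an R raw vector. It builds the WKB bytes for POINT (1.5 2.5) by hand and decodes them again. The helper `decode_wkb_point()` is hypothetical and only handles plain 2D points; it is an illustration of the format, not how wk or sf are implemented.

```r
# Build the WKB encoding of POINT (1.5 2.5) by hand (little-endian).
wkb_point <- c(
  as.raw(1L),                                                  # byte order: 1 = little-endian
  writeBin(1L, raw(), size = 4L, endian = "little"),           # geometry type: 1 = Point
  writeBin(c(1.5, 2.5), raw(), size = 8L, endian = "little")   # x and y as 8-byte doubles
)

# Hypothetical helper: decode a plain 2D WKB point using base R only.
decode_wkb_point <- function(wkb) {
  stopifnot(is.raw(wkb), length(wkb) == 21L)
  # The first byte says how the remaining bytes are ordered.
  endian <- if (wkb[1] == as.raw(1L)) "little" else "big"
  geom_type <- readBin(wkb[2:5], what = "integer", size = 4L, endian = endian)
  stopifnot(geom_type == 1L)  # this sketch only understands Point
  xy <- readBin(wkb[6:21], what = "double", n = 2L, size = 8L, endian = endian)
  list(type = geom_type, x = xy[1], y = xy[2])
}

decoded <- decode_wkb_point(wkb_point)
```

Because the layout is standardized, packages such as wk or sf can parse every geometry type from bytes like these, which is not the case for DuckDB's internal representation.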

I'm hoping to get the Arrow output directly as geoarrow extension arrays ( duckdb/duckdb-spatial#153 ) so that the geoarrow package ( https://github.com/geoarrow/geoarrow-c/tree/main/r/geoarrow ) can handle the whole thing automagically!

@cboettig

I find duckdb works really nicely from R on spatial data already. I have a small wrapper since the syntax is a bit verbose otherwise, that will read in from duckdb as an sf object. We can of course use all the spatial extension functions before reading into R, which is nice for datasets that are too big for RAM.

A quick example with a lazy read that avoids downloading the data, reads a few different spatial vector formats, and performs a spatial join:

remotes::install_github("cboettig/duckdbfs")
library(dplyr)
library(duckdbfs)  # provides open_dataset(), spatial_join() and to_sf()

url <- "https://github.com/cboettig/duckdbfs/raw/main/inst/extdata/world.gpkg"

# The /vsicurl/ prefix makes GDAL read the remote file lazily over HTTP
countries <-
  paste0("/vsicurl/", url) |>
  open_dataset()

cities <-
  paste0("/vsicurl/https://github.com/cboettig/duckdbfs/raw/",
         "main/inst/extdata/metro.fgb") |>
  open_dataset()

# The filter and join run inside DuckDB; only the result is read into R as sf
countries |>
  filter(continent == "Oceania") |>
  spatial_join(cities, by = "st_intersects", join = "inner") |>
  to_sf()

One problem that I have had, though, is that for some reason the spatial extension does not seem to be available for Windows users. (It appears that Windows extensions have to be built separately for R to be compatible with the rtools chain, and cannot use the Windows extension that all other duckdb platforms use(?). Core extensions are now built for Windows, but as I understand it, the spatial one is not. @krlmlr any insights here? duckdb/duckdb-spatial#158 It would be great for Windows users to be able to use duckdb for large spatial operations too...)

@eitsupi
Contributor Author

eitsupi commented Dec 30, 2023

R to be compatible with the rtools chain, and cannot use the Windows extension that all other duckdb platforms use(?)

That's correct since DuckDB for Windows other than R uses the MSVC ABI and only R uses the GNU ABI.

@cboettig

cboettig commented Jan 5, 2024

Thanks @eitsupi , that matches my understanding. It's great that all of the "core" extensions are built separately with the GNU ABI for R on windows. It's really sad that windows users can't access the spatial extension though at this time.
@krlmlr -- any idea if we might get binaries for windows R users for the spatial extension?

@eitsupi
Contributor Author

eitsupi commented Jan 6, 2024

I'm not familiar with how the DuckDB extensions are distributed, but they appear to be entirely defined by GitHub Actions, so why not simply port the following workflow to https://github.com/duckdb/duckdb_spatial?
https://github.com/duckdb/duckdb/blob/a55f89cd9e956b3e575532e058c230461799ac64/.github/workflows/R.yml#L29-L69

In any case, that's another issue.

@cboettig

cboettig commented Jan 6, 2024

Thanks @eitsupi , that's great! it does look like that recipe could be adjusted to build the spatial extension. It's not clear to me what repository ought to be implementing it -- the workflow you link to appears to depend on a custom action defined in the main repository (https://github.com/duckdb/duckdb/blob/a55f89cd9e956b3e575532e058c230461799ac64/.github/actions/build_extensions/action.yml) which in turn depends on scripts specific to that repository -- I guess that would all need to be duplicated in the spatial extension repo?

I'm very sorry if this was the wrong thread to address this issue, though it is specific to R. In any event, other than windows support I don't see what more needs to be done to support the spatial extension for duckdb-r? (Though I was glad to see the Ibis support, it doesn't look like the approach there supports directly passing through geospatial functions the way dbplyr does, and so they are implementing these bit by bit, while on the R side we seem to be able to use any function available in the geospatial extension immediately, e.g. the new st_quadkey.)

@krlmlr
Collaborator

krlmlr commented Jan 7, 2024

The extensions help page has:

Only core extensions are distributed for the following platforms: windows_amd64_rtools, ...

On the other hand, the overview page doesn't mention core extensions. According to the list of official extensions, "spatial" is an official extension.

@hannes @Mytherin: Who can help shed some light here?

@eitsupi
Contributor Author

eitsupi commented Jan 7, 2024

I don't see what more needs to be done to support the spatial extension for duckdb-r?

I wanted to point out that DuckDB lacks the ability to convert spatial types to the appropriate R types.
(In other words, the duckdb R package needs to map the DuckDB spatial types properly.)

I don't think there is anything this repository can do about getting the spatial extension to work in R on Windows.
That just requires configuring CI to properly build and upload extensions where appropriate.

@cboettig

cboettig commented Jan 7, 2024

Thanks @krlmlr and co! It would be wonderful if spatial could be added to the github action that builds the other official extensions for R. (I guess it's not obvious if building the duckdb R extension for windows belongs in the repo that handles all the other duckdb extensions for windows, the repo that handles the duckdb for R, or the repo that handles the spatial extension 😅 )

@eitsupi thanks for your help. Re mapping to native R types, this is just a matter of using the correct read methods when using those types; e.g. if the desired R type is an sf object, we merely need to convert the geometry to WKB and then specify the geometry column correctly in st_read(), https://github.com/cboettig/duckdbfs/blob/main/R/to_sf.R#L46-L51

Users of terra, for whom the appropriate type would be a vect object, would do something similar. However, as you know, the spatial functions in these packages for vector data are designed for in-memory objects, so if a user wants to compute something like st_intersects() on a very large vector dataset, I think they are much better off doing it in duckdb (e.g. as in the example above) rather than reading it into a native format. Of course it would be great if packages like sf or terra could handle this automagically with lazy eval (kinda like the way dbplyr does), but in any event this all seems out of scope for duckdb-r, no?

@hannes
Member

hannes commented Jan 8, 2024

We're going to pick this up and build all extensions for windows_amd64_rtools

@krlmlr
Collaborator

krlmlr commented Mar 21, 2024

Per #100 (comment), this should work now? Can you confirm? Closing in favor of #100.

@krlmlr krlmlr closed this as completed Mar 21, 2024
@eitsupi
Contributor Author

eitsupi commented Mar 21, 2024

My original intent was that this issue is not about Windows but about the proper conversion between Geospatial and R types.
Could you please reopen this?

@krlmlr
Collaborator

krlmlr commented Mar 21, 2024

Reopening, but the discussion got mixed up. I'd appreciate it if we could start a fresh discussion with the most important findings, up-to-date, summarized and linked here.

@krlmlr krlmlr reopened this Mar 21, 2024
@paleolimbot

Perhaps not a complete summary, but:

  • Right now when a database result contains a column that comes from the DuckDB spatial extension, it shows up in Arrow output as a list(raw()) where each element contains an opaque internal binary representation. This often leads to confusion because the format can't be read anywhere except DuckDB. I think the same is true of a database result that does not go through Arrow (i.e., R/DBI) but I haven't checked.
  • For a database result accessed via R/DBI, I imagine that you might have enough information available to you at "convert to R" time to at least give it a class. That class could implement sf::st_as_sfc() and give an error along the lines of "use st_as_ewkb() before collect()", which is the workaround.
  • For a database result accessed via Arrow, the solution is complex because DuckDB does not currently have a way for an extension type to customize its arrow output ( Representation of spatial types on export to ArrowArrayStream duckdb-spatial#153 ).
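The class idea in the second bullet could look roughly like the sketch below. Everything here is an assumption for illustration: the class name `duckdb_geometry`, the constructor `new_duckdb_geometry()`, and the wording of the error are hypothetical and are not part of the duckdb R package. The method is named so that, with sf attached, a call to `st_as_sfc()` on such a column should dispatch to it.

```r
# Hypothetical constructor: tag a list of raw vectors as DuckDB-internal geometry.
new_duckdb_geometry <- function(x = list()) {
  stopifnot(is.list(x), all(vapply(x, is.raw, logical(1))))
  structure(x, class = c("duckdb_geometry", "list"))
}

# S3 method for sf::st_as_sfc(): fail early with a pointer to the workaround
# instead of letting sf try to parse bytes it cannot understand.
st_as_sfc.duckdb_geometry <- function(x, ...) {
  stop(
    "This column holds DuckDB's internal geometry format, which sf cannot parse. ",
    "Convert it to WKB in the query before collect().",
    call. = FALSE
  )
}

# A print method so the column is visibly not plain list(raw()).
print.duckdb_geometry <- function(x, ...) {
  cat("<duckdb_geometry[", length(x), "]> (opaque internal format)\n", sep = "")
  invisible(x)
}

# Stand-in bytes; real values would come from a query result.
geom <- new_duckdb_geometry(list(as.raw(c(0x00, 0x01)), as.raw(0x02)))
msg <- tryCatch(
  st_as_sfc.duckdb_geometry(geom),
  error = function(e) conditionMessage(e)
)
```

The point of the design is only that the failure becomes explicit and actionable; it does not make the internal format readable in R.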

@hannes
Member

hannes commented Mar 22, 2024

As far as I know spatial now builds for rtools

@krlmlr
Collaborator

krlmlr commented Mar 22, 2024

Yes, spatial is good now, confirmed by @carlopi. Opened a new issue to investigate the OP.

@krlmlr krlmlr closed this as completed Mar 22, 2024
@cboettig

cboettig commented Apr 9, 2024

Just wanted to confirm that spatial extension appears to be working nicely for Windows R users now too, at least as per our windows CI.
