-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interop with spatial extension #117
Comments
Assuming my problems are related to this issue (and are not distinct): The examples below show some attempts at reading/writing from a spatial enabled duckdb. I can get data in through sql but not library(duckdb)
#> Warning: package 'duckdb' was built under R version 4.3.3
#> Loading required package: DBI
library('sf')
#> Linking to GEOS 3.11.2, GDAL 3.7.2, PROJ 9.3.0; sf_use_s2() is TRUE
library('glue')
# Init DB
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "install spatial; load spatial;")
#> [1] 0
# USE ST read
dbExecute(con, glue("create table spat as select *
from ST_Read('{system.file('shape/nc.shp', package='sf')}')"))
#> [1] 99
# Try to get the stuff
a1 = dbGetQuery(con, 'select * from spat')
# I've not found a way to convert format of geom into something usable
a2 = st_read(con, 'spat') #breaks on the binary
#> Warning in st_read.DBIObject(con, "spat"): Could not find a simple features
#> geometry column. Will return a `data.frame`.
# Try to write it
# Gets errors
nc = sf::read_sf(system.file("shape/nc.shp", package="sf"))
dbWriteTable(con, name = 'nc', nc)
#> Note: method with signature 'DBIObject#sf' chosen for function 'dbDataType',
#> target signature 'duckdb_connection#sf'.
#> "duckdb_connection#ANY" would also be valid
#> Error: rapi_execute: Failed to run query
#> Error: Conversion Error: ParseException: Unknown type: '0106000020AB100000010000000103000000010000001B000000000000A0415E54C000000060FF1D4240000000209D6254C000000080E122424000000080F76354C0000000200523424000000020846854C0000000A09B2B4240000000C06D6F54C00000000026324240000000A0B06C54C000000040633C4240000000A0FA6C54C0000000C07942424000000040E16A54C0000000A0794B424000000060195654C0000000A053494240000000203E5654C000000060DA44424000000020C95454C000000040C0414240000000800D5454C000000080873D4240000000000A5154C000000060F637424000000060D25054C000000060D833424000000080674F54C0000000C090304240000000605A4F54C000000040C42E424000000060E95054C0000000E01B2D4240000000400E5554C000000040872E4240000000C0205754C000000060342D424000000080675754C000000000662B424000000020AA5654C0000000205D26424000000060845754C000000060AC23424000000040025A54C0000000A07C244240000000A0635A54C0000000A03622424000000020965B54C0000000405F21424000000020FC5C54C0000000C0AA1E4240000000A0415E54C000000060FF1D4240'
sf::st_write(nc, dsn = con, layer = 'pts')
#> Error: rapi_execute: Failed to run query
#> Error: Conversion Error: ParseException: Unknown type: '0106000020AB100000010000000103000000010000001B000000000000A0415E54C000000060FF1D4240000000209D6254C000000080E122424000000080F76354C0000000200523424000000020846854C0000000A09B2B4240000000C06D6F54C00000000026324240000000A0B06C54C000000040633C4240000000A0FA6C54C0000000C07942424000000040E16A54C0000000A0794B424000000060195654C0000000A053494240000000203E5654C000000060DA44424000000020C95454C000000040C0414240000000800D5454C000000080873D4240000000000A5154C000000060F637424000000060D25054C000000060D833424000000080674F54C0000000C090304240000000605A4F54C000000040C42E424000000060E95054C0000000E01B2D4240000000400E5554C000000040872E4240000000C0205754C000000060342D424000000080675754C000000000662B424000000020AA5654C0000000205D26424000000060845754C000000060AC23424000000040025A54C0000000A07C244240000000A0635A54C0000000A03622424000000020965B54C0000000405F21424000000020FC5C54C0000000C0AA1E4240000000A0415E54C000000060FF1D4240'
dbWriteTable(con, 'nc', nc, append = T, field.types = c('id' = 'INTEGER', 'geometry' = 'GEOMETRY'))
#> Error: rapi_execute: Failed to run query
#> Error: Conversion Error: ParseException: Unknown type: '0106000020AB100000010000000103000000010000001B000000000000A0415E54C000000060FF1D4240000000209D6254C000000080E122424000000080F76354C0000000200523424000000020846854C0000000A09B2B4240000000C06D6F54C00000000026324240000000A0B06C54C000000040633C4240000000A0FA6C54C0000000C07942424000000040E16A54C0000000A0794B424000000060195654C0000000A053494240000000203E5654C000000060DA44424000000020C95454C000000040C0414240000000800D5454C000000080873D4240000000000A5154C000000060F637424000000060D25054C000000060D833424000000080674F54C0000000C090304240000000605A4F54C000000040C42E424000000060E95054C0000000E01B2D4240000000400E5554C000000040872E4240000000C0205754C000000060342D424000000080675754C000000000662B424000000020AA5654C0000000205D26424000000060845754C000000060AC23424000000040025A54C0000000A07C244240000000A0635A54C0000000A03622424000000020965B54C0000000405F21424000000020FC5C54C0000000C0AA1E4240000000A0415E54C000000060FF1D4240' Created on 2024-03-28 with reprex v2.1.0 |
@dcaseykc You are really close, just missing two things: First, you must remember that duckdb spatial has it's own native encoding of a geometry column that is not WKB. So it is up to you to manually call library(duckdb)
library(sf)
library(glue)
library(dplyr)
con <- dbConnect(duckdb::duckdb())
dbExecute(con, "install spatial; load spatial;")
dbExecute(con, glue("create table spat as select *
from ST_Read('{system.file('shape/nc.shp', package='sf')}')"))
a1 = dbGetQuery(con, 'select * from spat')
q <- tbl(con, "spat") |> mutate(geometry = st_aswkb(geom)) |> dbplyr::sql_render()
a2 = st_read(con, query = q, geometry_column = "geometry") However, I find all that really verbose and hard to read. library(duckdbfs)
nc <- open_dataset("https://github.com/r-spatial/sf/raw/main/inst/shape/nc.shp")
nc |> to_sf() HTH |
@cboettig thanks for your response to the original post. It is most helpful. Can you please give an example of how open_dataset can be used to open a spatial dataset from duckdb in memory. open_dateset requires a "sources" parameter. Based on your feedback above the following successfully returns an sf object from the in memory database.
My expectation is that the following should do the same:
However, this returns Your advice will be appreciated. |
@jacodevries Not entirely sure I follow your question. Your example doesn't run -- it refers to a If you already have a connection to a duckdb dataset that already has a the desired table materialized in it, you can just do library(duckdb)
library(dplyr)
library(duckdbfs)
library(sf)
duckDBMemory <- duckdb::dbConnect(duckdb())
ex = open_dataset("https://github.com/r-spatial/sf/raw/main/inst/shape/nc.shp",
tblname = "cadastre_tbl", conn = duckDBMemory)
# If the table already exists, just use `tbl` like always!
also_ex <- tbl(duckDBMemory, "cadastre_tbl")
# this works just fine with other duckdbfs functions, like to_sf() shown above:
also_ex <- tbl(duckDBMemory, "cadastre_tbl")
# Note that these are in fact identical
identical(ex, also_ex) so we can use |
@cboettig My apology - I realised that I left out a few critical details - let me rephrase my question:
I've therefore tried to figure out how to use your suggested wrapper function to read the duckdb data directly in order to make the code less verbose. Therefore using your suggested code snippets: The spatial database is created using:
Then using the following code snippet, the spatial data is returned, but in raw format:
The following verbose snippet returns the data in sf format:
My question: how can I use the
However, this returns Is it possible to use |
@jacodevries yes, I think I understand what you are asking, but please use the function library(dplyr); library(duckdbfs)
tbl(con, "spat") |> to_sf() That is the same as the more verbose code you show: q <- tbl(con, "spat") |> mutate(geometry = st_aswkb(geom)) |> dbplyr::sql_render()
a2 = st_read(con, query = q, geometry_column = "geometry") these two blocks of code are essentially identical. |
@cboettig Thanks for your response to my query. My attempt to use open_dataset is because of a misunderstanding on my part of when to use of the following code chunk:
Thanks very much for your assistance. |
From #55 (comment):
Perhaps not a complete summary, but:
list(raw())
where each element contains an opaque internal binary representation. This often leads to confusion because the format can't be read anywhere except DuckDB. I think the same is true of a database result that does not go through Arrow (i.e., R/DBI) but I haven't checked.st::as_sfc()
and give an error along the lines of "usest_as_ewkb()
beforecollect()
, which is the workaround.ArrowArrayStream
duckdb-spatial#153 ).CC @paleolimbot @eitsupi @cboettig .
The text was updated successfully, but these errors were encountered: