rbindlist messing up the bounding box of spatial sf objects #5352

Open
rafapereirabr opened this issue Mar 17, 2022 · 2 comments
Labels
non-atomic column e.g. list columns, S4 vector columns rbindlist

Comments

@rafapereirabr

I've found that data.table::rbindlist() somehow changes the bounding box of spatial sf objects. This is related to issue #2273 here, and I've linked to issues on the geobr and tmap packages as well.

Minimal reproducible example

devtools::install_github("ipeaGIT/geobr", subdir = "r-package")
library(geobr)
library(sf)
library(data.table)
library(waldo)

# download sf data
rr <- read_state(code_state = 'RR')
rs <- read_state(code_state = 'RS')

test_list <- list(rr, rs)

# row bind with rbindlist
t1_list <- data.table::rbindlist(test_list, fill = TRUE)
t1 <- sf::st_sf(t1_list) 
plot(t1['code_state'])

# base row bind
t2 <- rbind(rr,rs)
plot(t2['code_state'])

# compare
waldo::compare(t1, t2)

> `class(old)`: "sf" "data.table" "data.frame"
> `class(new)`: "sf"              "data.frame"
> 
> `attr(old$geom, 'bbox')`: "((-64.82525,-1.580633),(-58.88688,5.271841))"
> `attr(new$geom, 'bbox')`: "((-64.82525,-33.75208),(-49.69146,5.271841))"

For some reason, though, the problem goes away when I run a simple subset that filters out a row that does not exist in the data.

t3 <- subset(t1, abbrev_state  != "xx")
waldo::compare(t2, t3)

> `class(old)`: "sf" "data.frame"             
> `class(new)`: "sf" "data.table" "data.frame"
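
A stale bounding box can also be checked without waldo by comparing the stored bbox against the range of the actual coordinates. The following is only a sketch on made-up points: `bbox_is_stale` is a hypothetical helper, not part of sf or data.table, and the stale attribute is set by hand to imitate the situation above.

```r
library(sf)

# Hypothetical helper: TRUE when the stored bbox disagrees with the
# extent computed from the coordinates themselves.
bbox_is_stale <- function(x) {
  geom <- st_geometry(x)
  xy <- st_coordinates(geom)
  actual <- c(min(xy[, "X"]), min(xy[, "Y"]), max(xy[, "X"]), max(xy[, "Y"]))
  stored <- as.numeric(st_bbox(geom))  # order: xmin, ymin, xmax, ymax
  !isTRUE(all.equal(stored, actual))
}

# Toy geometry column with a correct bbox.
pts <- st_sfc(st_point(c(0, 0)), st_point(c(10, 10)), crs = 4326)

# Imitate the problem: keep both points but store the bbox of the first only.
stale <- pts
attr(stale, "bbox") <- st_bbox(pts[1])

print(bbox_is_stale(pts))
print(bbox_is_stale(stale))
```

The same helper should accept an sf object such as `t1` or `t3`, since `st_geometry()` extracts the geometry column first.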

sessionInfo()

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] waldo_0.4.0       data.table_1.14.2 sf_1.0-7          geobr_1.6.6      

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3       rstudioapi_0.13    rematch2_2.1.2    
 [4] magrittr_2.0.2     units_0.8-0        tidyselect_1.1.2  
 [7] R6_2.5.1           rlang_1.0.2        fansi_1.0.2       
[10] httr_1.4.2         dplyr_1.0.8        tools_4.1.1       
[13] grid_4.1.1         utf8_1.2.2         KernSmooth_2.23-20
[16] cli_3.1.1          e1071_1.7-9        DBI_1.1.2         
[19] ellipsis_0.3.2     class_7.3-19       assertthat_0.2.1  
[22] tibble_3.1.6       lifecycle_1.0.1    crayon_1.5.0      
[25] purrr_0.3.4        vctrs_0.3.8        curl_4.3.2        
[28] glue_1.6.2         proxy_0.4-26       diffobj_0.3.5     
[31] compiler_4.1.1     pillar_1.7.0       generics_0.1.2    
[34] classInt_0.4-3     pkgconfig_2.0.3   

@tlapak
Contributor

tlapak commented Mar 18, 2022

This is a known limitation of data.table. See #4415 for a discussion of why this occurs. The gist is the following: as you are aware, bbox is stored as an attribute of geom and depends on the values of that vector. No data.table function ever touches these attributes. When you call subset(), it actually ends up falling back on the data.frame method, which calls c(), which in turn recomputes the bbox. That's why it fixes it.
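
Since c() on sf geometry columns recomputes the bbox, one possible way to get a correct extent is to combine the geometries separately with c() and attach them to the row-bound attributes. This is a minimal sketch on made-up point data, not an official data.table or sf recipe; it assumes the geometry column is called `geom` in every input and that the row order of `rbindlist()` matches the order of the list.

```r
library(sf)
library(data.table)

# Two toy one-row sf objects with different extents (made-up data).
a <- st_sf(id = 1L, geom = st_sfc(st_point(c(0, 0)), crs = 4326))
b <- st_sf(id = 2L, geom = st_sfc(st_point(c(10, 10)), crs = 4326))
parts <- list(a, b)

# rbindlist() binds the rows; the bbox attribute is not recomputed.
dt <- rbindlist(parts, fill = TRUE)
print(attr(dt$geom, "bbox"))

# c() on the individual geometry columns recomputes the bbox.
geom <- do.call(c, lapply(parts, st_geometry))

# Rebuild the sf object from the non-spatial columns plus the fresh geometry.
attrs <- as.data.frame(dt)
attrs$geom <- NULL
fixed <- st_sf(attrs, geom = geom)
print(st_bbox(fixed))
```

If the speed of `rbindlist()` is not needed, `do.call(rbind, parts)` uses the sf rbind method directly, as in the `t2` example from the original report.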

@tlapak tlapak added the non-atomic column e.g. list columns, S4 vector columns label Mar 23, 2022
@jkaucic

jkaucic commented Nov 4, 2022

Sorry for the question: I also use data.table's rbindlist to bind two multipolygon sf objects together. Does this mean the bounding box of the new layer is always messed up afterwards, and that I have to run the subset command as a fix? Thank you!

4 participants