Assigning to list column in single row table changes column type. #4568

Open
mb706 opened this issue Jun 22, 2020 · 2 comments
Labels
non-atomic column e.g. list columns, S4 vector columns

mb706 commented Jun 22, 2020

When assigning to an element of a list column a using, e.g., dt$a[[1]] <- value, the column remains a list column only if the table has more than one row. If the table has a single row, the column is converted to an atomic type.

Minimal reproducible example

library("data.table")
dt <- data.table(a = list(1, 2))
dt$a[[1]] <- 1
is.list(dt$a)
#> [1] TRUE  # as expected!
dt <- data.table(a = list(1))
dt$a[[1]] <- 1
is.list(dt$a)
#> [1] FALSE  # not as expected!

Output of sessionInfo()

R version 3.6.3 (2020-02-29)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: Fedora 31 (Thirty One)

Matrix products: default
BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8    
 [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
 [7] LC_PAPER=en_US.utf8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.12.8

loaded via a namespace (and not attached):
[1] compiler_3.6.3

ColeMiller1 commented Jun 22, 2020

I can reproduce this in my session. The output with options(datatable.verbose = TRUE) is informative:

library(data.table)
options(datatable.verbose = TRUE)
dt = data.table(a = list(1, 2))
dt$a[[1L]] = 3
#> Assigning to all 2 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==2 MAYBE_SHARED==1, but then is being plonked. length(values)==2; length(cols)==1)
str(dt)
#> Classes 'data.table' and 'data.frame':   2 obs. of  1 variable:
#>  $ a:List of 2
#>   ..$ : num 3
#>   ..$ : num 2
#>  - attr(*, ".internal.selfref")=<externalptr>

dt = data.table(a = list(1))
dt$a[[1L]] = 1
#> Assigning to all 1 rows
#> RHS_list_of_columns == false
#> RHS_list_of_columns revised to true because RHS list has 1 item which is NULL, or whose length 1 is either 1 or targetlen (1). Please unwrap RHS.
#> RHS for item 1 has been duplicated because NAMED==4 MAYBE_SHARED==1, but then is being plonked. length(values)==1; length(cols)==1)
str(dt)
#> Classes 'data.table' and 'data.frame':   1 obs. of  1 variable:
#>  $ a: num 1
#>  - attr(*, ".internal.selfref")=<externalptr>

Besides the odd row count in the verbose output (the first example updates only one element, yet the message reports assigning to all 2 rows), this seems to be the code that leads to the column coercion:

data.table/src/assign.c

Lines 379 to 385 in ad7b67c

if (TYPEOF(values)==VECSXP && length(cols)==1 && length(values)==1) {
  SEXP item = VECTOR_ELT(values,0);
  if (isNull(item) || length(item)==1 || length(item)==targetlen) {
    RHS_list_of_columns=true;
    if (verbose) Rprintf(_("RHS_list_of_columns revised to true because RHS list has 1 item which is NULL, or whose length %d is either 1 or targetlen (%d). Please unwrap RHS.\n"), length(item), targetlen);
  }
}
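
To make the condition above easier to reason about, here is a minimal standalone sketch of it in plain C. It does not use R's headers: ModelValue and rhs_revised_to_list_of_columns are hypothetical names invented for this illustration, standing in for the SEXP values and the RHS_list_of_columns flag in assign.c. It only models the quoted lines, not the rest of the assignment logic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RHS `values` SEXP; not data.table's real types. */
typedef struct {
    bool is_list;        /* models TYPEOF(values) == VECSXP */
    int  length;         /* models length(values) */
    bool first_is_null;  /* models isNull(VECTOR_ELT(values, 0)) */
    int  first_length;   /* models length(VECTOR_ELT(values, 0)) */
} ModelValue;

/* Returns true when the quoted condition would set RHS_list_of_columns = true,
 * i.e. when a one-item list assigned to a single column is treated as a
 * wrapper around the column rather than as the list column itself. */
bool rhs_revised_to_list_of_columns(ModelValue values, int ncols, int targetlen) {
    if (values.is_list && ncols == 1 && values.length == 1) {
        return values.first_is_null
            || values.first_length == 1
            || values.first_length == targetlen;
    }
    return false;
}
```

Under this model, the one-row case (a list of length 1 whose item has length 1, targetlen 1) satisfies the condition, so the list is unwrapped and the column becomes atomic. The two-row case passes a list of length 2, so the condition is never entered and the column stays a list.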

I changed line 382 to also require nrow > 1 and found that some tests fail with this change:

# quoted `:=` expression did not replace dot with list, #3425
d = data.table(a=1L)
qcall = quote(b := .(2L))
test(1996.1, d[, eval(qcall)], data.table(a=1L, b=2L))

> x = d[, eval(qcall)] 
   a b  [Key= Types=int,lis Classes=int,lis]
1: 1 2                                      
> y = data.table(a = 1L, b = 2L) 
   a b  [Key= Types=int,int Classes=int,int]
1: 1 2    

## also 1996.2, 2074.05, 2119.1, 2119.14
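
The trade-off behind these failures can be sketched with a small standalone C model. The exact edit to assign.c is not shown in this thread, so the require_multi_row guard below is only an assumption about what "require nrow > 1" might look like. ModelValue and revised_to_list_of_columns are hypothetical names for illustration, not data.table code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the RHS `values` SEXP; not data.table's real types. */
typedef struct {
    bool is_list;        /* models TYPEOF(values) == VECSXP */
    int  length;         /* models length(values) */
    bool first_is_null;  /* models isNull(VECTOR_ELT(values, 0)) */
    int  first_length;   /* models length(VECTOR_ELT(values, 0)) */
} ModelValue;

/* Models the unwrap check, optionally with an assumed "nrow > 1" guard.
 * require_multi_row == false reproduces the original condition. */
bool revised_to_list_of_columns(ModelValue values, int ncols, int targetlen,
                                bool require_multi_row) {
    if (!(values.is_list && ncols == 1 && values.length == 1)) {
        return false;
    }
    if (require_multi_row && targetlen <= 1) {
        return false;  /* assumed guard: never unwrap for a one-row target */
    }
    return values.first_is_null
        || values.first_length == 1
        || values.first_length == targetlen;
}
```

In this model, a one-item list assigned to one row (as in b := .(2L) on a one-row table) is unwrapped by the original condition but not by the guarded one. That is consistent with the test output above, where b ends up as a list column instead of an integer column. The same guard is what would keep dt$a a list column in the original report, so the two expectations conflict for one-row tables.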

I am not sure what the solution is, but I will open another issue relating to assignment.

Finally, this behavior differs from that of a data.frame:

DF = setDF(data.table(a = list(1)))
DF$a[[1L]] = 3
str(DF)

##'data.frame':	1 obs. of  1 variable:
## $ a:List of 1
##  ..$ : num 3

@jangorecki jangorecki added the non-atomic column e.g. list columns, S4 vector columns label Jun 22, 2020
@berg-michael

I think I might be running into this, or a very similar case, when assigning a GEOS object to a new column in a data.table with a single row. Things work fine in the two-row case. In the one-row case, the resulting column has structure <externalptr>.

Reprex with output:

library(data.table)
library(geos)
options(datatable.verbose = TRUE)
# Load Tigris shapefile of all LA counties

  la_shape <- tigris::counties(state = "LA", progress_bar = FALSE)
#> Retrieving data for the year 2020
  la_shape_dt <- as.data.table(la_shape)
  
# Make a GEOS object
  
  geos <- as_geos_geometry(la_shape)

# Add GEOS object to la_shape, la_shape_dt

  la_shape$geos <- geos
  la_shape_dt$geos <- geos
#> Assigning to all 64 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==12 MAYBE_SHARED==1, but then is being plonked. length(values)==64; length(cols)==1)

# Both have same str
  
  str(la_shape$geos)
#>  geos_geometry[1:64] <MULTIPOLYGON [-93.766 30.038...-92.887 30.491]>, <MULTIPO
  str(la_shape_dt$geos)
#>  geos_geometry[1:64] <MULTIPOLYGON [-93.766 30.038...-92.887 30.491]>, <MULTIPO
  
# Remove GEOS object from la_shape
  
  la_shape$geos <- NULL
  
# Restrict to just New Orleans

  nola_shape <- la_shape[la_shape$COUNTYFP=="071",]
  nola_shape_dt <- as.data.table(nola_shape)
  
# Make a GEOS object from the subsetted dataset
  
  nola_geos <- as_geos_geometry(nola_shape)
  
# add the GEOS object to nola_shape, nola_shape_dt
  
  nola_shape$geos <- nola_geos
  nola_shape_dt$geos <- nola_geos
#> Assigning to all 1 rows
#> RHS_list_of_columns == false
#> RHS_list_of_columns revised to true because RHS list has 1 item which is NULL, or whose length 1 is either 1 or targetlen (1). Please unwrap RHS.
#> RHS for item 1 has been duplicated because NAMED==4 MAYBE_SHARED==1, but then is being plonked. length(values)==1; length(cols)==1)
  
# The two datasets have different structure now
  
  str(nola_shape$geos)
#>  geos_geometry[1:1] <MULTIPOLYGON [-90.14 29.867...-89.625 30.199]>
  str(nola_shape_dt$geos)
#> <externalptr>
  
# But things work fine if the data table has two rows
  
  two_shapes <- la_shape[la_shape$COUNTYFP %in% c("071", "001"),]
  two_shapes_dt <- as.data.table(two_shapes)
  
# Make a GEOS object from the dataset with two shapes
  
  two_shapes_geos <- as_geos_geometry(two_shapes)
  
# Add the GEOS object to two_shapes, two_shapes_dt
  
  two_shapes$two_shapes_geos <- two_shapes_geos
  two_shapes_dt$two_shapes_geos <- two_shapes_geos
#> Assigning to all 2 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==12 MAYBE_SHARED==1, but then is being plonked. length(values)==2; length(cols)==1)

Created on 2022-07-22 by the reprex package (v2.0.1)

My sessionInfo():

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.14.2 geos_0.1.3       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8.3       lattice_0.20-45    class_7.3-20       ps_1.6.0           assertthat_0.2.1  
 [6] digest_0.6.29      utf8_1.2.2         R6_2.5.1           reprex_2.0.1       evaluate_0.15     
[11] e1071_1.7-9        httr_1.4.2         highr_0.9          pillar_1.7.0       rlang_1.0.2       
[16] curl_4.3.2         uuid_1.0-4         rstudioapi_0.13    callr_3.7.0        R.utils_2.11.0    
[21] R.oo_1.24.0        rmarkdown_2.13     styler_1.7.0       rgdal_1.5-29       stringr_1.4.0     
[26] foreign_0.8-82     proxy_0.4-26       compiler_4.1.3     xfun_0.30          pkgconfig_2.0.3   
[31] tigris_1.6         clipr_0.8.0        htmltools_0.5.2    tidyselect_1.1.2   tibble_3.1.6      
[36] fansi_1.0.3        crayon_1.5.1       dplyr_1.0.8        withr_2.5.0        sf_1.0-7          
[41] R.methodsS3_1.8.1  wk_0.6.0           rappdirs_0.3.3     grid_4.1.3         lifecycle_1.0.1   
[46] DBI_1.1.2          magrittr_2.0.2     units_0.8-0        KernSmooth_2.23-20 cli_3.2.0         
[51] stringi_1.7.6      libgeos_3.11.0-1   fs_1.5.2           sp_1.4-6           ellipsis_0.3.2    
[56] generics_0.1.2     vctrs_0.3.8        tools_4.1.3        R.cache_0.15.0     glue_1.6.2        
[61] purrr_0.3.4        processx_3.5.3     fastmap_1.1.0      yaml_2.3.5         maptools_1.1-3    
[66] classInt_0.4-3     knitr_1.38    
