Assigning to list column in single row table changes column type. #4568
I can reproduce this in my session using:
library(data.table)
options(datatable.verbose = TRUE)
dt = data.table(a = list(1, 2))
dt$a[[1L]] = 3
#> Assigning to all 2 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==2 MAYBE_SHARED==1, but then is being plonked. length(values)==2; length(cols)==1)
str(dt)
#> Classes 'data.table' and 'data.frame': 2 obs. of 1 variable:
#> $ a:List of 2
#> ..$ : num 3
#> ..$ : num 2
#> - attr(*, ".internal.selfref")=<externalptr>
dt = data.table(a = list(1))
dt$a[[1L]] = 1
#> Assigning to all 1 rows
#> RHS_list_of_columns == false
#> RHS_list_of_columns revised to true because RHS list has 1 item which is NULL, or whose length 1 is either 1 or targetlen (1). Please unwrap RHS.
#> RHS for item 1 has been duplicated because NAMED==4 MAYBE_SHARED==1, but then is being plonked. length(values)==1; length(cols)==1)
str(dt)
#> Classes 'data.table' and 'data.frame': 1 obs. of 1 variable:
#> $ a: num 1
#> - attr(*, ".internal.selfref")=<externalptr>
In addition to the curious number of rows being assigned (the first example updates only one element, yet the verbose message reports assigning to all 2 rows), this seems to be the code that leads to the column coercion: Lines 379 to 385 in ad7b67c
I changed line 382 to require
I am not confident about what the solution is, but I will open another issue relating to assignment. Finally, this behavior differs from that of a data.frame:
DF = setDF(data.table(a = list(1)))
DF$a[[1L]] = 3
str(DF)
##'data.frame': 1 obs. of 1 variable:
## $ a:List of 1
## ..$ : num 3
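The verbose output above hints at the rule behind the coercion: when the right-hand side is a list with a single item that is NULL or whose length is 1 (or equals the target length), RHS_list_of_columns is revised to true and the item is unwrapped. The sketch below is a minimal base-R model of that rule as worded in the verbose message, not data.table's actual C code; the function name assign_rhs_model() is hypothetical. It also shows why an extra list() wrapper would keep a list column under this rule.

```r
# Hypothetical model of the rule printed in the verbose message:
# "RHS_list_of_columns revised to true because RHS list has 1 item which is
#  NULL, or whose length 1 is either 1 or targetlen".
# This is NOT data.table's implementation, only a sketch of the described logic.
assign_rhs_model <- function(rhs, targetlen) {
  unwrap <- is.list(rhs) && length(rhs) == 1L &&
    (is.null(rhs[[1L]]) || length(rhs[[1L]]) %in% c(1L, targetlen))
  # Return the value that would become the new column.
  if (unwrap) rhs[[1L]] else rhs
}

# Two rows: `dt$a[[1L]] = 3` passes list(3, 2) as the RHS, which stays a list.
two_rows <- assign_rhs_model(list(3, 2), targetlen = 2L)
str(two_rows)

# One row: the RHS is list(1); its single item has length 1, so it is unwrapped
# and the column would become a plain numeric vector.
one_row <- assign_rhs_model(list(1), targetlen = 1L)
str(one_row)

# Wrapping once more means only the outer list is peeled off,
# so the result is still a list column.
wrapped <- assign_rhs_model(list(list(1)), targetlen = 1L)
str(wrapped)
```

Under this model, the data.table analogue of the extra wrapping would be something like dt[1L, a := list(list(3))]; treat that translation as an assumption to check against your data.table version.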
I think I might be running into this, or a very similar case, when assigning a GEOS object to a new column in a data.table with a single row. Things work fine in the two-row case; in the one-row case, the object's structure changes. Reprex with output:
library(data.table)
library(geos)
options(datatable.verbose = TRUE)
# Load tigris shapefile of all LA counties
la_shape <- tigris::counties(state = "LA", progress_bar = FALSE)
#> Retrieving data for the year 2020
la_shape_dt <- as.data.table(la_shape)
# Make a GEOS object
geos <- as_geos_geometry(la_shape)
# Add GEOS object to la_shape, la_shape_dt
la_shape$geos <- geos
la_shape_dt$geos <- geos
#> Assigning to all 64 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==12 MAYBE_SHARED==1, but then is being plonked. length(values)==64; length(cols)==1)
# Both have the same str() output
str(la_shape$geos)
#> geos_geometry[1:64] <MULTIPOLYGON [-93.766 30.038...-92.887 30.491]>, <MULTIPO
str(la_shape_dt$geos)
#> geos_geometry[1:64] <MULTIPOLYGON [-93.766 30.038...-92.887 30.491]>, <MULTIPO
# Remove GEOS object from la_shape
la_shape$geos <- NULL
# Restrict to just New Orleans
nola_shape <- la_shape[la_shape$COUNTYFP=="071",]
nola_shape_dt <- as.data.table(nola_shape)
# Make a GEOS object from the subsetted dataset
nola_geos <- as_geos_geometry(nola_shape)
# add the GEOS object to nola_shape, nola_shape_dt
nola_shape$geos <- nola_geos
nola_shape_dt$geos <- nola_geos
#> Assigning to all 1 rows
#> RHS_list_of_columns == false
#> RHS_list_of_columns revised to true because RHS list has 1 item which is NULL, or whose length 1 is either 1 or targetlen (1). Please unwrap RHS.
#> RHS for item 1 has been duplicated because NAMED==4 MAYBE_SHARED==1, but then is being plonked. length(values)==1; length(cols)==1)
# The two datasets have different structure now
str(nola_shape$geos)
#> geos_geometry[1:1] <MULTIPOLYGON [-90.14 29.867...-89.625 30.199]>
str(nola_shape_dt$geos)
#> <externalptr>
# But things work fine if the data table has two rows
two_shapes <- la_shape[la_shape$COUNTYFP %in% c("071", "001"),]
two_shapes_dt <- as.data.table(two_shapes)
# Make a GEOS object from the dataset with two shapes
two_shapes_geos <- as_geos_geometry(two_shapes)
# Add the GEOS object to two_shapes, two_shapes_dt
two_shapes$two_shapes_geos <- two_shapes_geos
two_shapes_dt$two_shapes_geos <- two_shapes_geos
#> Assigning to all 2 rows
#> RHS_list_of_columns == false
#> RHS for item 1 has been duplicated because NAMED==12 MAYBE_SHARED==1, but then is being plonked. length(values)==2; length(cols)==1)
Created on 2022-07-22 by the reprex package (v2.0.1)
My sessionInfo():
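In the one-row geos case, str() reports a bare <externalptr> instead of a geos_geometry vector. That is what extracting the single item of a classed list with [[ does in base R: the element comes back without the container's class. The sketch below assumes, as the str() output suggests, that a geos_geometry vector is a classed list of pointers; it uses a hypothetical class fake_geometry and an environment as a stand-in for the external pointer.

```r
# Hypothetical stand-in for a length-1 geos_geometry vector: a classed list
# holding one reference object (an environment instead of an external pointer).
geom <- structure(list(new.env()), class = "fake_geometry")

# The container carries the class, as the data.frame column does.
inherits(geom, "fake_geometry")       # TRUE

# Unwrapping the single item, as the one-row assignment appears to do,
# returns the raw element and the class is lost.
unwrapped <- geom[[1L]]
inherits(unwrapped, "fake_geometry")  # FALSE
class(unwrapped)                      # "environment"
```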
When assigning to an element of a list column a using, e.g., dt$a[[1]], the column remains a list column only if the table has more than one row. If the table has a single row, the column is converted to an atomic type.
Minimal reproducible example
Output of sessionInfo()
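For comparison with the expected behavior, base R's data.frame keeps a list column in both the one-row and the two-row case when a single element is replaced. A minimal sketch that builds the list column directly with structure(), so no extra packages are needed; the helper make_list_df() is hypothetical.

```r
# Hypothetical helper: build a data.frame whose only column `a` is a list column.
make_list_df <- function(...) {
  vals <- list(...)
  structure(list(a = vals), class = "data.frame", row.names = seq_along(vals))
}

# One row: replacing the single element keeps `a` as a list column.
one <- make_list_df(1)
one$a[[1L]] <- 3
str(one)

# Two rows: same result, `a` stays a list column.
two <- make_list_df(1, 2)
two$a[[1L]] <- 3
str(two)
```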