Description
The code below triggers the bug for me (note that the assignment to col2 changes the value of col1!):
library(data.table)
coalesce = function(x, ...) {
  for (y in list(...)) {
    idx = is.na(x)
    x[idx] = if (length(y) != 1) y[idx] else y
  }
  x
}
dt = data.table(id=1:64, col1=0, col2=0)
print(dt[1, .(col1, col2)])
# col1 col2
# 1: 0 0
dt[, col1 := coalesce(col2, 111)]
dt[, col2 := 999]
print(dt[1, .(col1, col2)])
# col1 col2
# 1: 999 999
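To check how the behaviour depends on the number of rows without editing the script each time, the reproduction can be wrapped in a function of n. This is only a sketch: col1_clobbered is a hypothetical helper name, and running the steps inside a function rather than at top level may change reference counts, so its results are not guaranteed to match the top-level script above.

```r
library(data.table)

# The issue's coalesce(), unchanged.
coalesce = function(x, ...) {
  for (y in list(...)) {
    idx = is.na(x)
    x[idx] = if (length(y) != 1) y[idx] else y
  }
  x
}

# Hypothetical helper: runs the reproduction on an n-row table and returns
# TRUE if the assignment to col2 also changed col1.
col1_clobbered = function(n) {
  dt = data.table(id = seq_len(n), col1 = 0, col2 = 0)
  dt[, col1 := coalesce(col2, 111)]
  dt[, col2 := 999]
  dt$col1[1L] == 999
}

# Row counts on either side of 64.
ns = c(10, 63, 64, 100)
res = setNames(vapply(ns, col1_clobbered, logical(1)), ns)
print(res)
```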
And my sessionInfo() output:
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_Australia.1252 LC_CTYPE=English_Australia.1252 LC_MONETARY=English_Australia.1252 LC_NUMERIC=C
[5] LC_TIME=English_Australia.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.2
loaded via a namespace (and not attached):
[1] compiler_4.0.5 tools_4.0.5
Basically, it looks like col1 and col2 end up pointing at the same vector, so := modifies them both. My guess is that they are shared but the reference counts are off, so := thinks it is safe to modify in place. It's not 100% clear to me whether the underlying bug is in base R or in data.table.
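One way to test the shared-vector hypothesis is to compare the memory addresses of the two columns right after the col1 := coalesce(col2, 111) step. The sketch below assumes data.table's exported address() helper; shares_memory is a hypothetical name, and it is checked here on plain lists where sharing is known. Inspecting the table may itself perturb reference counts, much as printing apparently does, so a FALSE result on the real table would not rule the hypothesis out.

```r
library(data.table)

# Hypothetical helper: TRUE if elements a and b of a list (a data.table is a
# list of columns) are the very same vector in memory. data.table::address()
# returns an object's memory address as a string; [[ extracts without copying.
shares_memory = function(lst, a, b) {
  address(lst[[a]]) == address(lst[[b]])
}

x = c(0, 0, 0)
shared = list(a = x, b = x)        # both elements point at x's vector
separate = list(a = x, b = x + 0)  # x + 0 allocates a new vector
print(shares_memory(shared, "a", "b"))    # TRUE
print(shares_memory(separate, "a", "b"))  # FALSE
```

Calling shares_memory(dt, "col1", "col2") immediately after the coalesce step and getting TRUE would confirm that both columns are backed by one vector.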
When trying to put together a minimal repro, I noticed a few different changes that make this bug disappear:
- Simply printing the data table between the col1 and col2 assignments makes the issue go away.
- It only manifests when the number of rows is at least 64. Perhaps that is a threshold at which some sort of copy-on-write optimization kicks in somewhere?
- The problem also seems to be related to the coalesce function used here, even though it has no effect in this example. E.g., replacing it with
  coalesce = function(x, ...) x
  avoids the issue. It seems as though base R is doing something odd with [<- when given an all-FALSE logical subset; maybe the result is the same object but is no longer marked as shared? Note that assigning to col1 after coalesce does not affect col2, only vice versa. Alternatively, returning x[] from coalesce bypasses the erroneous sharing, either by forcing a copy or by bumping the reference count.
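A minimal sketch of the x[] workaround, assuming it behaves as reported: coalesce2 is a hypothetical name for the issue's coalesce() with only its final line changed. As far as I know, extracting with an empty index (x[]) makes base R return a duplicate of x, so the column assigned by := should no longer share memory with col2. The snippet also shows that [<- with an all-FALSE logical index leaves the values unchanged, which is why coalesce has no visible effect in this example.

```r
library(data.table)

# Hypothetical coalesce2(): the issue's coalesce() with its last line
# changed from `x` to `x[]`, the workaround reported in the issue.
coalesce2 = function(x, ...) {
  for (y in list(...)) {
    idx = is.na(x)                              # positions still missing
    x[idx] = if (length(y) != 1) y[idx] else y  # fill them from y
  }
  x[]  # empty-index extraction returns a duplicate of x, not x itself
}

# [<- with an all-FALSE logical index changes nothing.
x = c(0, 0, 0)
y = x
y[is.na(y)] = 111
print(identical(x, y))  # TRUE

# Re-run the reproduction with the workaround in place.
dt = data.table(id = 1:64, col1 = 0, col2 = 0)
dt[, col1 := coalesce2(col2, 111)]
dt[, col2 := 999]
print(dt[1, .(col1, col2)])  # col1 is expected to stay 0
```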