-
Notifications
You must be signed in to change notification settings - Fork 982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[R-Forge #2590] During grouping, last group overwrites memory used by previous groups #507
Comments
This is a broad issue caused by a memory location being re-used between groups. This is normally not a problem, but in some cases it can manifest. For example in case of Linked SO example simplified: DT = data.table(id1 = c(1L,1L,1L,2L,2L), v1 = c(15,13,14,18,20), v2 = c(8,7,7,13,15))
fit = DT[, list(list(lm(v1 ~ v2))), by = id1]
sapply(fit$V1, nobs)
#[1] 3 2
fit2 = fit[, list(list(update(V1[[1L]], ~ . -v1))), by = id1]
sapply(fit2$V1, nobs)
#[1] 3 3 ## should be: 3 2 For now the workaround is to Related issues:
|
With reference counting on by default now, shouldn't it now be possible to fix? (Was just again badly bitten by this) |
AFAIK reference counting is only the default from |
We are not limited by R version. Already in many places we provide features conditionally, so we could use reference counting if R supports it, otherwise just leave as it is. Myself I would be more in favour of avoiding such a big feature to be used conditionally, as maintenance of it can be extra burden. On the other hand it can be good for a transition period, in case of a bug in new mechanism, user can easily fall back to previous implementation. |
Trying to avoid version specific behavior is very reasonable but perhaps we can at least show a warning if this issue is detected (only in R≥4) (without otherwise changing behavior)? From outside of the package this would be possible with something like: library(data.table)
with_seatbelts <- function(expr){
vars <- ls(envir = parent.frame())
n_refs_before <- sapply(vars, pryr:::named2, parent.frame())
res <- eval(expr, envir = parent.frame())
n_refs_after <- sapply(vars, pryr:::named2, parent.frame())
vars_with_refs_increased <- vars[n_refs_after>n_refs_before]
#print(data.table(vars,n_refs_before,n_refs_after))
if(as.integer(R.version$major) >= 4 && evalq(.NGRP>.GRP, envir = parent.frame()) && length(vars_with_refs_increased)>0) warning(paste0(
"Variable '", vars_with_refs_increased, "' used in i such at least one reference persists, ",
"data.data should make a copy of it now but instead is changing it by reference. ",
"This might cause some issues for you. Avoid them by using 'copy(", vars_with_refs_increased ,")' instead. ",
"If R was compiled with reference counting turned off (the default for R<4.0) you might not see this warning). ",
"See https://github.com/Rdatatable/data.table/issues/507 for details.\n", collapse= "\n"
))
res
}
x <- list()
dt <- data.table(ii=1:3, letter=letters[1:3])
dt[, with_seatbelts({x[[ii]] <<- letter[]; NULL}), by=letter]
#> Empty data.table (0 rows and 1 cols): letter
x
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
#>
#> [[3]]
#> [1] "c"
dt[, with_seatbelts({x[[ii]] <<- letter; NULL}), by=letter]
#> Warning in with_seatbelts({: Variable 'letter' used in i such at least one reference persists, data.data should make a copy of it now but instead is changing it by reference. This might cause some issues for you. Avoid them by using 'copy(letter)' instead. If R was compiled with reference counting turned off (the default for R<4.0) you might not see this warning). See https://github.com/Rdatatable/data.table/issues/507 for details.
#> Warning in with_seatbelts({: Variable 'letter' used in i such at least one reference persists, data.data should make a copy of it now but instead is changing it by reference. This might cause some issues for you. Avoid them by using 'copy(letter)' instead. If R was compiled with reference counting turned off (the default for R<4.0) you might not see this warning). See https://github.com/Rdatatable/data.table/issues/507 for details.
#> Empty data.table (0 rows and 1 cols): letter
x
#> [[1]]
#> [1] "c"
#>
#> [[2]]
#> [1] "c"
#>
#> [[3]]
#> [1] "c" (This does does a check for every group, doing it once after all groups should be enough) In R<4.4, checking if NAMED=2 would probably lead to too many false positives ... |
Submitted by: Matt Dowle; Assigned to: Nobody; R-Forge link
As raised here on SO. I had a look and made progress but couldn't for the life of me work out why the last group is reused.
The text was updated successfully, but these errors were encountered: