Skip to content

Commit

Permalink
Fix memory leak in fread (#4710)
Browse files Browse the repository at this point in the history
  • Loading branch information
jimhester authored Nov 15, 2022
1 parent dd2134e commit 5affced
Show file tree
Hide file tree
Showing 6 changed files with 34 additions and 4 deletions.
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -560,7 +560,9 @@
DT2 = do.call(data.table, list(DF)) # 3.07s before, 0.02s after
identical(DT1, DT2) # TRUE
```


55. `fread()` could leak memory, [#3292](https://github.com/Rdatatable/data.table/issues/3292). Thanks to @patrickhowerter for reporting, and Jim Hester for the fix. The fix requires R 3.4.0 or later. Loading `data.table` in earlier versions now warns that known problems exist, asks users to upgrade R, and warns that we intend to upgrade `data.table`'s dependency from 8-year-old R 3.1.0 (April 2014) to 5-year-old R 3.4.0 (April 2017).

## NOTES

1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
Expand Down
4 changes: 4 additions & 0 deletions R/onAttach.R
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,10 @@
else
packageStartupMessagef("This is %s. This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue.\n**********", Sys.info()["sysname"])
}
if (.Call(CbeforeR340)) {
# not base::getRversion()<"3.4.0" in case the user upgrades R but does not reinstall data.table; a reasonable mistake since data.table would seem to be the latest version
packageStartupMessagef("**********\nThis data.table installation was compiled for R < 3.4.0 (Apr 2017) and is known to leak memory. Please upgrade R and reinstall data.table to fix the leak. Maintaining and testing code branches to support very old versions increases development time so please do upgrade R. We intend to bump data.table's dependency from 8 year old R 3.1.0 (Apr 2014) to 5 year old R 3.4.0 (Apr 2017).\n**********")
}
}
}

Expand Down
8 changes: 8 additions & 0 deletions inst/tests/benchmark.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -175,3 +175,11 @@ t2 = system.time(DT2 <- do.call(data.table, list(DF))) # 3.07s before, 0.02s af
test(, identical(DT1, DT2))
test(, t2["elapsed"]/t1["elapsed"]<2)

# fread leak, #3292
dummy = rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")
start = gc()["Vcells",2]
for (i in 1:10) data.table::fread("out.tsv")
end = gc()["Vcells",2]
test(, end/start < 1.05)

1 change: 1 addition & 0 deletions src/data.table.h
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,7 @@ SEXP nqRecreateIndices(SEXP, SEXP, SEXP, SEXP, SEXP);
SEXP fsort(SEXP, SEXP);
SEXP inrange(SEXP, SEXP, SEXP, SEXP);
SEXP hasOpenMP(void);
SEXP beforeR340(void);
SEXP uniqueNlogical(SEXP, SEXP);
SEXP dllVersion(void);
SEXP initLastUpdated(SEXP);
Expand Down
10 changes: 7 additions & 3 deletions src/freadR.c
Original file line number Diff line number Diff line change
Expand Up @@ -523,9 +523,13 @@ void setFinalNrow(size_t nrow) {
if (length(DT)) {
if (nrow == dtnrows)
return;
for (int i=0; i<LENGTH(DT); i++) {
SETLENGTH(VECTOR_ELT(DT,i), nrow); // TODO: realloc
SET_TRUELENGTH(VECTOR_ELT(DT,i), nrow);
const int ncol=LENGTH(DT);
for (int i=0; i<ncol; i++) {
SETLENGTH(VECTOR_ELT(DT,i), nrow);
SET_TRUELENGTH(VECTOR_ELT(DT,i), dtnrows);
#if defined(R_VERSION) && R_VERSION>=R_Version(3,4,0)
SET_GROWABLE_BIT(VECTOR_ELT(DT,i)); // #3292
#endif
}
}
R_FlushConsole(); // # 2481. Just a convenient place; nothing per se to do with setFinalNrow()
Expand Down
11 changes: 11 additions & 0 deletions src/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ R_CallMethodDef callMethods[] = {
{"Cinrange", (DL_FUNC) &inrange, -1},
{"Cbetween", (DL_FUNC) &between, -1},
{"ChasOpenMP", (DL_FUNC) &hasOpenMP, -1},
{"CbeforeR340", (DL_FUNC) &beforeR340, -1},
{"CuniqueNlogical", (DL_FUNC) &uniqueNlogical, -1},
{"CfrollfunR", (DL_FUNC) &frollfunR, -1},
{"CdllVersion", (DL_FUNC) &dllVersion, -1},
Expand Down Expand Up @@ -330,6 +331,16 @@ SEXP hasOpenMP(void) {
}
// # nocov end

SEXP beforeR340(void) {
// used in onAttach.R for message about fread memory leak fix needing R 3.4.0
// at C level to catch if user upgrades R but does not reinstall data.table
#if defined(R_VERSION) && R_VERSION<R_Version(3,4,0)
return ScalarLogical(true);
#else
return ScalarLogical(false);
#endif
}

extern int *_Last_updated; // assign.c

SEXP initLastUpdated(SEXP var) {
Expand Down

0 comments on commit 5affced

Please sign in to comment.