Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in fread #4710

Merged
merged 6 commits into from
Nov 15, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -560,7 +560,9 @@
DT2 = do.call(data.table, list(DF)) # 3.07s before, 0.02s after
identical(DT1, DT2) # TRUE
```


55. `fread()` could leak memory, [#3292](https://github.com/Rdatatable/data.table/issues/3292). Thanks to @patrickhowerter for reporting, and Jim Hester for the fix. The fix requires R 3.4.0 or later. Loading `data.table` in earlier versions now warns that known problems exist, asks users to upgrade R, and warns that we intend to upgrade `data.table`'s dependency from 8-year-old R 3.1.0 (April 2014) to 5-year-old R 3.4.0 (April 2017).

## NOTES

1. New feature 29 in v1.12.4 (Oct 2019) introduced zero-copy coercion. Our thinking is that requiring you to get the type right in the case of `0` (type double) vs `0L` (type integer) is too inconvenient for you the user. So such coercions happen in `data.table` automatically without warning. Thanks to zero-copy coercion there is no speed penalty, even when calling `set()` many times in a loop, so there's no speed penalty to warn you about either. However, we believe that assigning a character value such as `"2"` into an integer column is more likely to be a user mistake that you would like to be warned about. The type difference (character vs integer) may be the only clue that you have selected the wrong column, or typed the wrong variable to be assigned to that column. For this reason we view character to numeric-like coercion differently and will warn about it. If it is correct, then the warning is intended to nudge you to wrap the RHS with `as.<type>()` so that it is clear to readers of your code that a coercion from character to that type is intended. For example :
Expand Down
4 changes: 4 additions & 0 deletions R/onAttach.R
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,10 @@
else
packageStartupMessagef("This is %s. This warning should not normally occur on Windows or Linux where OpenMP is turned on by data.table's configure script by passing -fopenmp to the compiler. If you see this warning on Windows or Linux, please file a GitHub issue.\n**********", Sys.info()["sysname"])
}
if (.Call(CbeforeR340)) {
# not base::getRversion()<"3.4.0" in case the user upgrades R but does not reinstall data.table; a reasonable mistake since data.table would seem to be the latest version
packageStartupMessagef("**********\nThis data.table installation was compiled for R < 3.4.0 (Apr 2017) and is known to leak memory. Please upgrade R and reinstall data.table to fix the leak. Maintaining and testing code branches to support very old versions increases development time so please do upgrade R. We intend to bump data.table's dependency from 8 year old R 3.1.0 (Apr 2014) to 5 year old R 3.4.0 (Apr 2017).\n**********")
}
}
}

Expand Down
8 changes: 8 additions & 0 deletions inst/tests/benchmark.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -175,3 +175,11 @@ t2 = system.time(DT2 <- do.call(data.table, list(DF))) # 3.07s before, 0.02s af
test(, identical(DT1, DT2))
test(, t2["elapsed"]/t1["elapsed"]<2)

# fread leak, #3292
dummy = rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")
start = gc()["Vcells",2]
for (i in 1:10) data.table::fread("out.tsv")
end = gc()["Vcells",2]
test(, end/start < 1.05)

1 change: 1 addition & 0 deletions src/data.table.h
Original file line number Diff line number Diff line change
Expand Up @@ -323,6 +323,7 @@ SEXP nqRecreateIndices(SEXP, SEXP, SEXP, SEXP, SEXP);
SEXP fsort(SEXP, SEXP);
SEXP inrange(SEXP, SEXP, SEXP, SEXP);
SEXP hasOpenMP(void);
SEXP beforeR340(void);
SEXP uniqueNlogical(SEXP, SEXP);
SEXP dllVersion(void);
SEXP initLastUpdated(SEXP);
Expand Down
10 changes: 7 additions & 3 deletions src/freadR.c
Original file line number Diff line number Diff line change
Expand Up @@ -523,9 +523,13 @@ void setFinalNrow(size_t nrow) {
if (length(DT)) {
if (nrow == dtnrows)
return;
for (int i=0; i<LENGTH(DT); i++) {
SETLENGTH(VECTOR_ELT(DT,i), nrow); // TODO: realloc
SET_TRUELENGTH(VECTOR_ELT(DT,i), nrow);
const int ncol=LENGTH(DT);
for (int i=0; i<ncol; i++) {
SETLENGTH(VECTOR_ELT(DT,i), nrow);
SET_TRUELENGTH(VECTOR_ELT(DT,i), dtnrows);
#if defined(R_VERSION) && R_VERSION>=R_Version(3,4,0)
SET_GROWABLE_BIT(VECTOR_ELT(DT,i)); // #3292
#endif
}
}
R_FlushConsole(); // # 2481. Just a convenient place; nothing per se to do with setFinalNrow()
Expand Down
11 changes: 11 additions & 0 deletions src/init.c
Original file line number Diff line number Diff line change
Expand Up @@ -116,6 +116,7 @@ R_CallMethodDef callMethods[] = {
{"Cinrange", (DL_FUNC) &inrange, -1},
{"Cbetween", (DL_FUNC) &between, -1},
{"ChasOpenMP", (DL_FUNC) &hasOpenMP, -1},
{"CbeforeR340", (DL_FUNC) &beforeR340, -1},
{"CuniqueNlogical", (DL_FUNC) &uniqueNlogical, -1},
{"CfrollfunR", (DL_FUNC) &frollfunR, -1},
{"CdllVersion", (DL_FUNC) &dllVersion, -1},
Expand Down Expand Up @@ -330,6 +331,16 @@ SEXP hasOpenMP(void) {
}
// # nocov end

SEXP beforeR340(void) {
// used in onAttach.R for message about fread memory leak fix needing R 3.4.0
// at C level to catch if user upgrades R but does not reinstall data.table
#if defined(R_VERSION) && R_VERSION<R_Version(3,4,0)
return ScalarLogical(true);
#else
return ScalarLogical(false);
#endif
}

extern int *_Last_updated; // assign.c

SEXP initLastUpdated(SEXP var) {
Expand Down