-
Notifications
You must be signed in to change notification settings - Fork 982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix memory leak in fread #4710
Fix memory leak in fread #4710
Conversation
Before this PR you can see the memory leak with the increase in Vcells dummy <- rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")
invisible(data.table::fread("out.tsv"))
for (i in 1:10) {
data.table::fread("out.tsv")
print(gc())
}
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539635 28.9 918942 49.1 NA 850004 45.4
#> Vcells 16041623 122.4 41582216 317.3 32768 41041748 313.2
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539797 28.9 918942 49.1 NA 850004 45.4
#> Vcells 18540198 141.5 54339092 414.6 32768 43540348 332.2
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539799 28.9 918942 49.1 NA 850004 45.4
#> Vcells 21040504 160.6 54339092 414.6 32768 46043083 351.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539801 28.9 918942 49.1 NA 850004 45.4
#> Vcells 23540814 179.7 54339092 414.6 32768 48543401 370.4
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539803 28.9 918942 49.1 NA 850004 45.4
#> Vcells 26041116 198.7 54339092 414.6 32768 51043727 389.5
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539805 28.9 918942 49.1 NA 850004 45.4
#> Vcells 28541418 217.8 54339092 414.6 32768 53544029 408.6
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539807 28.9 918942 49.1 NA 850004 45.4
#> Vcells 31041722 236.9 69340920 529.1 32768 56041872 427.6
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539809 28.9 918942 49.1 NA 850004 45.4
#> Vcells 33542027 256.0 69340920 529.1 32768 58544647 446.7
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539811 28.9 918942 49.1 NA 850004 45.4
#> Vcells 36042332 275.0 69340920 529.1 32768 61044961 465.8
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539813 28.9 918942 49.1 NA 850004 45.4
#> Vcells 38542637 294.1 69340920 529.1 32768 63545275 484.9
print(gc())
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539779 28.9 918942 49.1 NA 850004 45.4
#> Vcells 38540380 294.1 69340920 529.1 32768 63545275 484.9 Created on 2020-09-18 by the reprex package (v0.3.0) Afterwards the memory leak is gone and the Vcells are basically constant. dummy <- rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")
invisible(data.table::fread("out.tsv"))
for (i in 1:10) {
data.table::fread("out.tsv")
print(gc())
}
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539635 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11041081 84.3 52078987 397.4 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539797 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039362 84.3 41663190 317.9 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539799 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039382 84.3 52228899 398.5 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539801 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039406 84.3 41783120 318.8 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539803 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039422 84.3 52228944 398.5 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539805 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039438 84.3 41783156 318.8 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539807 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039456 84.3 52228984 398.5 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539809 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039475 84.3 41783188 318.8 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539811 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039494 84.3 52229028 398.5 32768 63993871 488.3
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539813 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11039513 84.3 50203867 383.1 32768 63993871 488.3
print(gc())
#> used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
#> Ncells 539779 28.9 918942 49.1 NA 850004 45.4
#> Vcells 11037405 84.3 40163094 306.5 32768 63993871 488.3 Created on 2020-09-18 by the reprex package (v0.3.0) This change does potentially perform a reallocation, whereas the previous code does not, but I am not sure there is any way around it. |
An alternative implementation that also fixes the memory leak and avoids the reallocation would be to set the growable bit and the true length to be the allocated length. e.g. this diff from the current PR. diff --git a/src/freadR.c b/src/freadR.c
index 8293586b..de260369 100644
--- a/src/freadR.c
+++ b/src/freadR.c
@@ -527,7 +527,9 @@ void setFinalNrow(size_t nrow) {
if (nrow == dtnrows)
return;
for (int i=0; i<LENGTH(DT); i++) {
- SET_VECTOR_ELT(DT, i, Rf_lengthgets(VECTOR_ELT(DT, i), nrow));
+ SETLENGTH(VECTOR_ELT(DT,i), nrow);
+ SET_TRUELENGTH(VECTOR_ELT(DT,i), dtnrow);
+ SET_GROWABLE_BIT(VECTOR_ELT(DT,i));
}
}
R_FlushConsole(); // # 2481. Just a convenient place; nothing per se to do with setFinalNrow()
|
…sage, and extended news item
Codecov Report
@@ Coverage Diff @@
## master #4710 +/- ##
==========================================
- Coverage 98.31% 98.30% -0.02%
==========================================
Files 80 80
Lines 14791 14795 +4
==========================================
+ Hits 14542 14544 +2
- Misses 249 251 +2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
…ble.h thanks Jan, plotting in dev memtest need not require imports of standard functions (strange)
Fixes #3292