Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix memory leak in fread #4710

Merged
merged 6 commits into from
Nov 15, 2022
Merged

Conversation

jimhester
Copy link
Contributor

Fixes #3292

@jimhester
Copy link
Contributor Author

Before this PR you can see the memory leak with the increase in Vcells

dummy <- rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")

invisible(data.table::fread("out.tsv"))
for (i in 1:10) {
  data.table::fread("out.tsv")
  print(gc())
}
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539635  28.9     918942  49.1         NA   850004  45.4
#> Vcells 16041623 122.4   41582216 317.3      32768 41041748 313.2
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539797  28.9     918942  49.1         NA   850004  45.4
#> Vcells 18540198 141.5   54339092 414.6      32768 43540348 332.2
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539799  28.9     918942  49.1         NA   850004  45.4
#> Vcells 21040504 160.6   54339092 414.6      32768 46043083 351.3
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539801  28.9     918942  49.1         NA   850004  45.4
#> Vcells 23540814 179.7   54339092 414.6      32768 48543401 370.4
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539803  28.9     918942  49.1         NA   850004  45.4
#> Vcells 26041116 198.7   54339092 414.6      32768 51043727 389.5
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539805  28.9     918942  49.1         NA   850004  45.4
#> Vcells 28541418 217.8   54339092 414.6      32768 53544029 408.6
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539807  28.9     918942  49.1         NA   850004  45.4
#> Vcells 31041722 236.9   69340920 529.1      32768 56041872 427.6
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539809  28.9     918942  49.1         NA   850004  45.4
#> Vcells 33542027 256.0   69340920 529.1      32768 58544647 446.7
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539811  28.9     918942  49.1         NA   850004  45.4
#> Vcells 36042332 275.0   69340920 529.1      32768 61044961 465.8
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539813  28.9     918942  49.1         NA   850004  45.4
#> Vcells 38542637 294.1   69340920 529.1      32768 63545275 484.9
print(gc())
#>            used  (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539779  28.9     918942  49.1         NA   850004  45.4
#> Vcells 38540380 294.1   69340920 529.1      32768 63545275 484.9

Created on 2020-09-18 by the reprex package (v0.3.0)

Afterwards the memory leak is gone and the Vcells are basically constant.

dummy <- rep("1\t2\t3\t4\t5", 10000000)
writeLines(dummy, "out.tsv")

invisible(data.table::fread("out.tsv"))
for (i in 1:10) {
  data.table::fread("out.tsv")
  print(gc())
}
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539635 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11041081 84.3   52078987 397.4      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539797 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039362 84.3   41663190 317.9      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539799 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039382 84.3   52228899 398.5      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539801 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039406 84.3   41783120 318.8      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539803 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039422 84.3   52228944 398.5      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539805 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039438 84.3   41783156 318.8      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539807 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039456 84.3   52228984 398.5      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539809 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039475 84.3   41783188 318.8      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539811 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039494 84.3   52229028 398.5      32768 63993871 488.3
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539813 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11039513 84.3   50203867 383.1      32768 63993871 488.3
print(gc())
#>            used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
#> Ncells   539779 28.9     918942  49.1         NA   850004  45.4
#> Vcells 11037405 84.3   40163094 306.5      32768 63993871 488.3

Created on 2020-09-18 by the reprex package (v0.3.0)

This change does potentially perform a reallocation, whereas the previous code does not, but I am not sure there is any way around it.

NEWS.md Outdated Show resolved Hide resolved
@jimhester
Copy link
Contributor Author

An alternative implementation that also fixes the memory leak and avoids the reallocation would be to set the growable bit and the true length to be the allocated length. e.g. this diff from the current PR.

diff --git a/src/freadR.c b/src/freadR.c
index 8293586b..de260369 100644
--- a/src/freadR.c
+++ b/src/freadR.c
@@ -527,7 +527,9 @@ void setFinalNrow(size_t nrow) {
     if (nrow == dtnrows)
       return;
     for (int i=0; i<LENGTH(DT); i++) {
-      SET_VECTOR_ELT(DT, i, Rf_lengthgets(VECTOR_ELT(DT, i), nrow));
+      SETLENGTH(VECTOR_ELT(DT,i), nrow);
+      SET_TRUELENGTH(VECTOR_ELT(DT,i), dtnrow);
+      SET_GROWABLE_BIT(VECTOR_ELT(DT,i));
     }
   }
   R_FlushConsole(); // # 2481. Just a convenient place; nothing per se to do with setFinalNrow()

SET_GROWABLE_BIT is only available in R 3.4+, so this would need to be conditionally disabled for earlier R versions.

@mattdowle mattdowle added this to the 1.14.5 milestone Nov 15, 2022
@codecov
Copy link

codecov bot commented Nov 15, 2022

Codecov Report

Merging #4710 (44f5bbb) into master (dd2134e) will decrease coverage by 0.01%.
The diff coverage is 71.42%.

❗ Current head 44f5bbb differs from pull request most recent head 84afbaf. Consider uploading reports for the commit 84afbaf to get more accurate results

@@            Coverage Diff             @@
##           master    #4710      +/-   ##
==========================================
- Coverage   98.31%   98.30%   -0.02%     
==========================================
  Files          80       80              
  Lines       14791    14795       +4     
==========================================
+ Hits        14542    14544       +2     
- Misses        249      251       +2     
Impacted Files Coverage Δ
src/init.c 97.95% <0.00%> (-2.05%) ⬇️
src/freadR.c 100.00% <100.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@mattdowle mattdowle merged commit 5affced into Rdatatable:master Nov 15, 2022
mattdowle added a commit that referenced this pull request Nov 15, 2022
…ble.h thanks Jan, plotting in dev memtest need not require imports of standard functions (strange)
@mattdowle mattdowle modified the milestones: 1.14.7, 1.14.6 Nov 16, 2022
mattdowle pushed a commit that referenced this pull request Nov 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

fread: memory not being freed
4 participants