Adding 1 column to data table nearly doubles peak memory used #1062
Hi Arun, may I check whether you have had a chance to look at this issue? While working on issue #1069, I ran the latest ver. 1.9.5 in Linux Mint, but the issue persists. Thus, it is not a problem isolated to Windows.
I know where the issue is, just not sure how to fix it yet :-(. It's in:

```r
if (.global$print != "" && address(x) == .global$print) {
  SYS <- sys.calls()
  if ((length(last(SYS)) >= 2L && typeof(last(SYS)[[2L]]) %chin% c("list", "promise")) ||
      (length(SYS) > 3L && SYS[[length(SYS) - 3L]][[1L]] == "knit_print.default")) {
    .global$print = ""
    return(invisible())
  }
}
```
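(As an aside, here is a minimal sketch, not from this thread, of how one might check whether adding a column with `:=` copies the whole table, using `data.table::address()` and base R's `tracemem()`. The table size and column name are arbitrary assumptions for illustration.)

```r
library(data.table)

# small mock table; 100 columns of 10,000 random doubles
DT <- as.data.table(replicate(100, rnorm(1e4), simplify = FALSE))

before <- address(DT)   # memory address of the table before the update
tracemem(DT)            # base R will report if the object gets duplicated

DT[, new_col := NA]     # add an empty column by reference

identical(before, address(DT))  # TRUE => updated in place, no full copy
```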
I see, I appreciate your reply! Would you be able to tag it with a label to keep this issue in view?
Arun, I just tested deleting a column using
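(The inline code and the rest of this comment did not survive the export. Removing a column by reference in data.table is done by assigning `NULL` with `:=`, so the test presumably looked something like the sketch below; the table and column names are hypothetical.)

```r
library(data.table)

DT <- data.table(a = 1:5, b = letters[1:5])
DT[, b := NULL]   # delete column b by reference, without copying the table
```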
Understood. One can't rush inspiration. It's more important to have a good fix than a rushed one.
@NoviceProg this seems to have been fixed in R v3.2, IIUC, with this item from NEWS:
Yay! Testing the data from your SO post shows no rise in memory usage. Could you please test (and close if solved)? Thanks.
Sorry for the delay in replying, Arun. I was traveling a bit for work. Yes, I confirm that, after upgrading to R v3.2, the issue is fixed. Thanks!
I have a wide CSV with thousands of columns that I `fread` into R for further transformation. When I added an empty column using `:=`, filled with `NA`s, R crashed with an "out-of-memory" error. I was initially perplexed, as I have 16GB of RAM and the data.table, before adding the new column, was only ~9GB in memory.

To investigate further, I created a mock table of 2,500 columns by 200,000 rows. I noticed peak memory nearly doubles, from 3.8GB to 7.6GB, when a new column is added. Only upon running `gc()` did memory return to 3.8GB. Both ver. 1.9.4 and the latest 1.9.5 exhibit this issue (please refer to the two screenshots attached).

I raised the question on Stack Overflow (URL below), as I believe this should not be happening. After some discussion, Arun encouraged me to file this report.
My system: Intel i7-4700 with 4 cores/8 threads; 16GB DDR3-12800 RAM; Windows 8.1 64-bit; 500GB 7200rpm HDD; 64-bit R; data.table ver. 1.9.4 and 1.9.5
The code I used to create the mock wide dataset is available at my SO question (nothing special, not repeated here for brevity).
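(For readers who do not want to follow the link, a rough approximation of the setup described above, not the exact code from the SO question. The dimensions follow the report and need roughly 4GB of RAM for the table alone, so scale them down if needed.)

```r
library(data.table)

ncols <- 2500L      # dimensions from the report; reduce if memory is tight
nrows <- 200000L

# build a wide table column by column (~3.7GB of doubles)
DT <- as.data.table(replicate(ncols, rep(1.0, nrows), simplify = FALSE))

gc(reset = TRUE)        # reset the "max used" counters
DT[, new_col := NA]     # add one empty column by reference
gc()                    # the "max used" column shows the peak during the assignment
```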
The data.table package has been an invaluable tool in my R project, and I would like to thank Matthew, Arun and the other contributors for this remarkable package! I hope this issue report can be my little contribution.
https://stackoverflow.com/questions/28347305/r-why-adding-1-column-to-data-table-nearly-doubles-peak-memory-used