.multicombine required for rbind & foreach reduce stage is generally quite slow #46

Open
locklin opened this issue Jul 7, 2024 · 0 comments

locklin commented Jul 7, 2024

The documentation indicates that you don't need .multicombine=T when using foreach with .combine=rbind.

This is incorrect: returning an array without .multicombine=T is absurdly slow.


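library(doMC)  # also attaches foreach, which provides foreach() and %dopar%
registerDoMC(cores=8)

# Time the combine stage: each of the n tasks sleeps briefly and returns 10 normals.
testFun <- function(multicomb, n=64000) {
    out <- foreach(com=1:n, .combine=rbind, .multicombine=multicomb) %dopar% {
        Sys.sleep(8/n)
        if(com==n) {
            # Timestamp printed by the worker that handles the last task
            print(paste("preparing to return last value at",strftime(Sys.time(),format="%H:%M:%S")))
        }
        rnorm(10)
    }
    # Timestamp printed once the combined result is available
    print(paste("finished gathering my ",n,"arrays at",strftime(Sys.time(),format="%H:%M:%S")))
    nrow(out)
}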

testFun(F)
[1] "preparing to return last value at 14:49:18"
[1] "finished gathering my  64000 arrays at 14:50:27"
[1] 64000

 testFun(T)
[1] "preparing to return last value at 14:47:10"
[1] "finished gathering my  64000 arrays at 14:47:14"
[1] 64000


Personally I think the result is bad regardless of the .multicombine setting; 4 seconds to stick 64000 rows together is absurd, even on a Raspberry Pi. But it gets horrendously bad without .multicombine: for a similar problem (prop trading stuff instead of Sys.sleep) I clock 7 minutes to cons the 64000 rows into a report with .multicombine=F, when the actual task only takes 3 minutes. With .multicombine=T the same task still takes 19 seconds to cons together the 64000 rows; acceptable for my uses but still nuts. It's a Threadripper, not a 6809.
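
To separate the combine cost from the parallel backend, the comparison can be run in plain base R. This is a minimal sketch under an assumption, not a trace of foreach's internals: it assumes a two-argument combine behaves like folding the rows together one at a time, and compares that with a single rbind call over all rows. The names `rows`, `pairwise` and `single` are made up for illustration, and `n` is kept small so it runs quickly.

```r
# Hypothetical stand-in for the worker results: n vectors of 10 normals.
set.seed(1)
n <- 4000
rows <- lapply(seq_len(n), function(i) rnorm(10))

# Pairwise combine: every step copies the whole accumulated matrix,
# so the total amount of copying grows roughly quadratically with n.
t_pairwise <- system.time(pairwise <- Reduce(rbind, rows))[["elapsed"]]

# Single call: rbind receives all rows at once and allocates the result once.
t_single <- system.time(single <- do.call(rbind, rows))[["elapsed"]]

# Elapsed seconds for each approach (values depend on the machine).
print(c(pairwise = t_pairwise, single = t_single))
```

Both approaches give the same matrix apart from row names, so any gap between the two timings comes from the combining strategy alone.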

FWIW the same thing happens when you ignore .combine and .multicombine and return the result as a list. Are you guys doing some giant garbage collection before you return? If so, that would make sense on fork-based multicore doodads.
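
The garbage collection guess could be checked rather than assumed. A rough sketch, assuming base R's `gc.time()` is a good enough proxy: `gcShare` is a hypothetical helper (not part of foreach or doMC) that reports how much of an expression's elapsed time the calling process spent in GC. It only sees the calling process, so GC inside forked workers would not be counted.

```r
# Hypothetical helper: evaluate expr and report elapsed time alongside
# the elapsed time this process spent in garbage collection.
gcShare <- function(expr) {
    gc.time(TRUE)        # make sure GC timing is switched on
    before <- gc.time()
    # gcFirst = FALSE so system.time() does not trigger an extra gc() itself
    elapsed <- system.time(expr, gcFirst = FALSE)[["elapsed"]]
    after <- gc.time()
    c(elapsed = elapsed, gc_elapsed = (after - before)[[3]])
}

# Example: build 64000 rows serially and bind them in one call.
res <- gcShare(do.call(rbind, lapply(1:64000, function(i) rnorm(10))))
print(res)
```

Wrapping the foreach call from testFun in `gcShare` the same way would show whether the time between the two timestamps is mostly GC in the master process or something else.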

version
_
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 4
minor 4.1
year 2024
month 06
day 14
svn rev 86737
language R
version.string R version 4.4.1 (2024-06-14)
nickname Race for Your Life
