-
If you are using the code you posted above, then this is because the cl and mirai cluster objects are both mirai clusters. To create base R clusters, you would need to use parallel::makeCluster() instead.
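A minimal sketch of the distinction, assuming two local workers. The object names `cl_base` and `cl_mirai` are hypothetical, and the mirai part only runs if the package is installed:

```r
library(parallel)

# Base R cluster created by the parallel package (a socket cluster by default).
cl_base <- parallel::makeCluster(2)
print(class(cl_base))

# mirai cluster: a different class of cluster object, created by mirai itself.
# Guarded so the sketch still runs when mirai is not installed.
if (requireNamespace("mirai", quietly = TRUE)) {
  cl_mirai <- mirai::make_cluster(2)
  print(class(cl_mirai))
  mirai::stop_cluster(cl_mirai)
}

# Always release the base R workers when done.
parallel::stopCluster(cl_base)
```

Benchmarking one backend against the other only makes sense if the two cluster objects really come from these two different constructors.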
-
Which function to use is about the general API of the To be clear, you are finding that the performance of the cluster backends in base R and in
-
Hello, my apologies if this is not the right place to post.
I recently came across your brilliant post and can confirm that, in my benchmarking on several x86 and ARM architectures with various core counts, my results are consistent with your findings. I have been looking for a straightforward, low-cost abstraction for multithreading massively parallel functions that operate on data frames and data tables.
I then tried this approach on a slightly different scenario, only to find that parallel with the mirai backend appears to be substantially slower than the base R operation or purrr::map(). I am not sure what accounts for the difference. Here is a reproducible example. The function is merely intended to accept two arguments, which necessitates a different parallel call than parLapply().
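A minimal sketch of a comparison of this kind, using only base R's parallel package. The function `add_pair`, the inputs `x` and `y`, and the worker count of 2 are illustrative assumptions, not the code from the original post:

```r
library(parallel)

# Hypothetical two-argument function standing in for the real workload.
add_pair <- function(a, b) a + b

x <- 1:1000
y <- 1000:1

# Serial baseline with mapply(); SIMPLIFY = FALSE returns a list,
# matching the default return shape of clusterMap().
t_serial <- system.time(
  res_serial <- mapply(add_pair, x, y, SIMPLIFY = FALSE)
)

# Parallel version with clusterMap() on a base R cluster (2 workers assumed).
cl <- parallel::makeCluster(2)
t_parallel <- system.time(
  res_parallel <- parallel::clusterMap(cl, add_pair, x, y)
)
parallel::stopCluster(cl)

# Both approaches should produce the same results; only the timings differ.
print(identical(res_serial, res_parallel))
print(rbind(serial = t_serial, parallel = t_parallel)[, "elapsed"])
```

To try the mirai backend in the same sketch, the cluster could be created with mirai::make_cluster(2) and stopped with mirai::stop_cluster(), assuming mirai is installed.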
On different x86-64 computers, ranging from 2 to 24 nodes, I am seeing approximately 30-50x slower performance for clusterMap() with either the mirai or the standard backend, compared to base R mapply(). purrr::map2() performs approximately the same as mapply(), and clusterMap() with the standard parallel backend performs essentially the same as with the mirai backend.
Am I simply using the wrong call in mapply() or clusterMap()? Perhaps there is a more efficient construction to leverage mirai?
R 4.3.3
mirai 0.12.1
nanonext 0.13.2
parallel 4.3.3
purrr 1.0.2