Benchmarking mirai parallel clusters #102

bing-nu · 2024-03-23T02:00:50Z

Hello, my apologies if this is not the right place to post.

I recently came across your brilliant post and can confirm that your findings hold up in my own benchmarks on several x86 and ARM machines with various core counts. I have been looking for a straightforward, low-cost abstraction for multithreading massively parallel functions that operate on data frames and data tables.

I then tried this approach on a slightly different scenario, only to find a substantial difference in performance: parallel with the mirai backend appears to be much slower than the base R equivalent or purrr::map2(). I am not sure what accounts for the difference. Here is a reproducible example; the function is merely intended to take two arguments, which requires a different parallel call than parLapply().

```r
library(parallel)
library(mirai)
library(purrr)

set.seed(123)
cl <- parallel::makeCluster(4)   # Edited from original code submission (base R cluster)
mirai <- mirai::make_cluster(4)  # mirai-backed cluster

df <- data.frame(A = sample(1:100, 1000, replace = TRUE), B = sample(1:100, 1000, replace = TRUE))

# Toy two-argument function, so parLapply() cannot be used directly
sq_diff <- function(a, b) {
  (a - b)^2/2
}

res <- microbenchmark::microbenchmark(
  mapply(sq_diff, df$A, df$B),                              # base R, serial
  clusterMap(cl, sq_diff, df$A, df$B, SIMPLIFY = TRUE),     # base R cluster
  clusterMap(mirai, sq_diff, df$A, df$B, SIMPLIFY = TRUE),  # mirai cluster
  unlist(map2(df$A, df$B, sq_diff))                         # purrr, serial
)

ggplot2::autoplot(res) + ggplot2::theme_minimal()

# Shut down the worker processes
parallel::stopCluster(cl)
mirai::stop_cluster(mirai)
```

On different x86-64 computers, ranging from 2 to 24 nodes, I am seeing roughly 30 to 50 times slower performance for clusterMap() with either the mirai or the standard backend compared to base R mapply(). purrr::map2() performs about the same as mapply(), and clusterMap() with the standard parallel backend performs essentially the same as with the mirai backend.
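As an extra serial reference point (not part of the original benchmark), note that sq_diff() only uses vectorized arithmetic, so it can be applied to whole columns in a single call instead of once per row. A minimal sketch, assuming the same df and sq_diff() as above, that checks the vectorized call agrees with mapply():

```r
# Same inputs as the benchmark above
set.seed(123)
df <- data.frame(A = sample(1:100, 1000, replace = TRUE),
                 B = sample(1:100, 1000, replace = TRUE))

sq_diff <- function(a, b) {
  (a - b)^2/2
}

# Element-wise: one R function call per row
elementwise <- mapply(sq_diff, df$A, df$B)

# Vectorized: `-`, `^` and `/` already operate on whole vectors,
# so a single call handles all 1000 rows
vectorized <- sq_diff(df$A, df$B)

all.equal(elementwise, vectorized)  # TRUE
```

The vectorized expression could be added to the microbenchmark() call as one more baseline; no timings are claimed here.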

Am I simply using the wrong call in mapply or clusterMap? Perhaps there is a more efficient construction to leverage mirai?

R 4.3.3
mirai 0.12.1
nanonext 0.13.2
parallel 4.3.3
purrr 1.0.2

shikokuchuo · 2024-03-23T12:42:49Z

Maintainer

If you are using the code you posted above, then this is because the cl and mirai cluster objects are both mirai clusters. You would need to use parallel::makeCluster() to create base R clusters.
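One way to check which backend a cluster object uses is to inspect its class. This sketch assumes the class names used by the parallel and mirai packages ("SOCKcluster" for a default parallel::makeCluster() cluster, "miraiCluster" for mirai::make_cluster()); those names are not stated in this thread.

```r
base_cl  <- parallel::makeCluster(2)  # base R PSOCK cluster
mirai_cl <- mirai::make_cluster(2)    # mirai-backed cluster

# Both inherit from "cluster", so parallel functions accept either,
# but the first class element identifies the backend
class(base_cl)
class(mirai_cl)

parallel::stopCluster(base_cl)
mirai::stop_cluster(mirai_cl)
```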

bing-nu Mar 23, 2024
Author

Oops, sorry about that! Using cl <- parallel::makeCluster(4), I re-ran my benchmark.

I also re-ran a slightly modified version of your original benchmark, which confirms your original result and also shows that mapply() has no particular performance advantage over lapply().

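```r
library(parallel)

base <- parallel::makeCluster(4)   # base R socket cluster
mirai <- mirai::make_cluster(4)    # mirai-backed cluster
x <- 1:1000                        # hypothetical input; the original benchmark's x is not shown here

res <- microbenchmark::microbenchmark(
  unlist(parLapply(base, x, rpois, n = 1)),
  unlist(lapply(x, rpois, n = 1)),
  unlist(parLapply(mirai, x, rpois, n = 1)),
  mapply(rpois, x, n = 1)
)
```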

shikokuchuo · 2024-04-15T08:43:05Z

Maintainer

> Am I simply using the wrong call in mapply or clusterMap?

Which function to use is a question about the general API of the parallel package in base R; it is not for mirai, or any individual cluster backend, to make recommendations on.

To be clear, you are finding the performance of the base R and mirai cluster backends to be similar for clusterMap(). All this suggests is that some cluster functions have overhead independent of the backend (base or mirai). Why that is, or whether it can be improved, would be a matter of reading the R source and contributing to base R, rather than something for mirai to solve.
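For readers who want to dig into that overhead: one plausible explanation, based on the R source rather than on anything confirmed in this thread, is that parLapply() splits its input into roughly one chunk per node before dispatching, while clusterMap() sends each element to a worker as a separate task. With 1000 tiny tasks, per-task communication would then dominate regardless of backend. The sketch below is an illustration, not a recommendation from the mirai maintainers: it pre-chunks the inputs with parallel::splitIndices() so that clusterMap() dispatches one task per node. The helper run_chunk() is hypothetical.

```r
library(parallel)

set.seed(123)
df <- data.frame(A = sample(1:100, 1000, replace = TRUE),
                 B = sample(1:100, 1000, replace = TRUE))

sq_diff <- function(a, b) {
  (a - b)^2/2
}

# Hypothetical helper: apply f element-wise over one chunk of a and b
run_chunk <- function(a, b, f) mapply(f, a, b)

cl <- parallel::makeCluster(2)  # a mirai::make_cluster() object should work the same way

# One contiguous block of row indices per node
idx <- parallel::splitIndices(nrow(df), length(cl))

# clusterMap() now sends one task per chunk instead of one per element;
# sq_diff is shipped via MoreArgs so workers do not need it in their global env
chunked <- unlist(clusterMap(cl, run_chunk,
                             lapply(idx, function(i) df$A[i]),
                             lapply(idx, function(i) df$B[i]),
                             MoreArgs = list(f = sq_diff)))

parallel::stopCluster(cl)

all.equal(chunked, mapply(sq_diff, df$A, df$B))  # TRUE
```

Whether chunking closes the gap with mapply() on a given machine would need to be measured, for example by adding the chunked expression to the microbenchmark() call from the original question.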
