Replies: 1 comment
Hi. You can use the following:

```r
library(future.apply)
plan(multisession, workers = 8)

job_list <- sort(1:(8*2), decreasing = TRUE)
order <- matrix(job_list, ncol = 8, byrow = TRUE)
order <- match(order, job_list)
print(order)
#> [1] 1 9 2 10 3 11 4 12 5 13 6 14 7 15 8 16

results <- future_lapply(
  job_list,
  function(job) {
    T1 <- Sys.time()
    Sys.sleep(job)
    tibble::tibble(job = job, T1 = T1, jobID = Sys.getpid())
  },
  future.scheduling = structure(TRUE, ordering = order)
) |> dplyr::bind_rows()
```
`print(results)` then shows that the tasks were processed in eight chunks. Here, the last task completes at ~16 + 8 = 24 s. Now, if you instead use

```r
order <- order(job_list, decreasing = TRUE)  ## 1:16, since job_list is already sorted
```

together with

```r
future.chunk.size = structure(1L, ordering = order)
```

the tasks are processed one by one, following that order. Here, the last tasks complete at ~18 s. I think this is what you're after if you want to maximize worker utilization and finish as soon as possible, without having to worry too much about the perfect optimization.

FWIW, there are probably more optimal ways to schedule tasks when the processing time of each task can be estimated. For example, pairing up the tasks as (16, 1), (15, 2), (14, 3), ..., (9, 8) will make them finish at about the same time.
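The two schedules can be compared with a quick back-of-the-envelope computation. The snippet below is my own illustration: it runs no futures, and it assumes that with eight chunks each worker processes two consecutive elements of the reordered job list.

```r
# Illustration only: per-worker busy times for the two schedules discussed
# above, for 16 jobs of 16, 15, ..., 1 seconds on 8 workers.
job_list <- sort(1:16, decreasing = TRUE)

# Eight-chunk schedule: the interleaved ordering puts jobs (16, 8), (15, 7),
# ..., (9, 1) on the same worker, so the slowest worker needs 16 + 8 = 24 s.
ord <- match(matrix(job_list, ncol = 8, byrow = TRUE), job_list)
chunk_sums <- tapply(job_list[ord], rep(1:8, each = 2), sum)
print(max(chunk_sums))
#> [1] 24

# Pairing schedule: (16, 1), (15, 2), ..., (9, 8) -- every pair sums to
# 17 s, so all eight workers finish at about the same time.
pair_idx <- as.vector(rbind(1:8, 16:9))  # 1, 16, 2, 15, ..., 8, 9
pair_sums <- tapply(job_list[pair_idx], rep(1:8, each = 2), sum)
print(as.vector(pair_sums))
#> [1] 17 17 17 17 17 17 17 17
```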
Hello,
I want to use the `future_lapply` function to run many jobs (e.g. 1000), and I know that these jobs vary in complexity and time consumed. I sorted them in estimated descending order of complexity, and I need to force `future_lapply` to distribute the most complex jobs to the cores first, then the next most complex, and so on, to balance the processing time. I use something similar to this, which forces the most complex 8 jobs to be executed first, then the next 8, etc.

I expect process ID 26080 to process jobs 16 and 9, not 16 and 1. How can I ensure that the jobs are distributed in the order in which they should be processed? Using `future.scheduling = Inf` instead of `future.chunk.size = 1` leads to the same results.

A related question: I need the jobs to start in the desired order (i.e. most complex first), but I also want jobs to be handed to cores as soon as the cores finish their previous job. When I monitor the current implementation, I find that some cores become idle towards the end of the processing, while many remaining jobs wait on only a couple of cores, which increases the total time consumed by the job. Is this achievable without affecting the overall performance of the task?
Thanks
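For reference, the dispatching behaviour asked about here (hand the longest remaining job to whichever core frees up first) can be sketched with a tiny simulation. This is my own illustration, independent of future.apply, and it ignores per-task dispatch overhead:

```r
# Illustration only: greedy longest-job-first dispatch of 16 jobs of
# 16, 15, ..., 1 seconds across 8 workers, ignoring overhead.
jobs <- sort(1:16, decreasing = TRUE)
finish <- numeric(8)          # time at which each worker becomes free

for (d in jobs) {
  w <- which.min(finish)      # next worker to become free
  finish[w] <- finish[w] + d  # it runs this job next
}

print(max(finish))            # makespan of the greedy schedule
#> [1] 17
```

In this toy case the greedy schedule finishes in 17 s of pure compute, which is consistent with the ~18 s mentioned in the reply once per-task overhead is added.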