Description
After a lot of troubleshooting I've submitted a related bug report/feature request here at batchtools.
Long story short, jobs submitted via future.batchtools are timing out in readLog, even though the jobs do exist. This is resolved by altering scheduler.latency & fs.latency.
I'm setting up my futures like this:
library(future.batchtools)

slurm <- future::tweak(batchtools_slurm,
                       template = 'batchtools.slurm.tmpl',
                       workers = 4,
                       resources = list(
                         walltime = 7200,
                         memory = 2048,
                         ncpus = 4,
                         ntasks = 4,
                         partition = 'ph8'  # GCE 'n2d-highcpu-8'
                       ))
future::plan(slurm)
(Unless I've done something stupid...) For future.batchtools to work reliably in my environment, I need to be able to set fs.latency & scheduler.latency from future::tweak, or somewhere else, along the lines of the hypothetical call sketched below. As far as I can see, these don't currently get passed through to batchtools::makeClusterFunctionsSlurm.
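Purely to illustrate the feature request, something like this is what I have in mind; the scheduler.latency and fs.latency arguments below are hypothetical and don't exist in future.batchtools today:

slurm <- future::tweak(batchtools_slurm,
                       template = 'batchtools.slurm.tmpl',
                       workers = 4,
                       scheduler.latency = 70,  # hypothetical argument
                       fs.latency = 10,         # hypothetical argument
                       resources = list(walltime = 7200, memory = 2048,
                                        ncpus = 4, ntasks = 4,
                                        partition = 'ph8'))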
I'm currently getting around this by overwriting the default batchtools::makeClusterFunctionsSlurm with assignInNamespace (sketched after this paragraph). Setting scheduler.latency to 70 seconds and fs.latency to 10 seconds solves my problem and makes future.batchtools run jobs reliably despite the time it takes to provision machines. Unfortunately it also increases the delay before batchtools recognises that a job has finished. That's no big deal for long-running jobs, but I've made a feature request at batchtools for the scheduler.latency option to be split, with a new option covering only the initial sleep.
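For the record, a minimal sketch of that workaround, assuming the batchtools Slurm constructor keeps its current scheduler.latency / fs.latency arguments; the 70 s / 10 s values are just what happens to work for my GCE setup, not recommended defaults:

## Wrap the stock constructor so it is always called with longer latencies,
## then swap the wrapper into the batchtools namespace so that
## future.batchtools picks it up.
orig_cf_slurm <- batchtools::makeClusterFunctionsSlurm

patched_cf_slurm <- function(template = "slurm", ...,
                             scheduler.latency = 70, fs.latency = 10) {
  orig_cf_slurm(template = template, ...,
                scheduler.latency = scheduler.latency,
                fs.latency = fs.latency)
}

utils::assignInNamespace("makeClusterFunctionsSlurm", patched_cf_slurm,
                         ns = "batchtools")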
Thanks