Slurm readLog() Error - Option to change fs.latency & scheduler.latency from batchtools_slurm or future::tweak #73

@stuvet

After a lot of troubleshooting I've submitted a related bug report/feature request at batchtools.

Long story short, jobs submitted via future.batchtools are timing out in readLog, even though the jobs do exist. This is resolved by altering scheduler.latency & fs.latency.

I'm setting up my futures like this:

library(future.batchtools)  # provides batchtools_slurm

slurm <- future::tweak(batchtools_slurm,
                       template = 'batchtools.slurm.tmpl',
                       workers = 4,
                       resources = list(
                         walltime = 7200,
                         memory = 2048,
                         ncpus = 4,
                         ntasks = 4,
                         partition = 'ph8' # GCE 'n2d-highcpu-8'
                       ))
future::plan(slurm)

Unless I've done something stupid, for future.batchtools to work reliably in my environment I need to be able to set fs.latency & scheduler.latency from future::tweak, or from somewhere else. As far as I can see, these don't currently get passed through to batchtools::makeClusterFunctionsSlurm.

I'm currently getting around this problem by overwriting the default batchtools::makeClusterFunctionsSlurm with assignInNamespace. Setting scheduler.latency to 70 seconds and fs.latency to 10 seconds solves my problem and makes future.batchtools run jobs reliably despite the delay while machines are provisioned. Unfortunately this also increases the delay before batchtools recognises that a job has finished. That's no big deal for long-running jobs, but I've made a feature request at batchtools for the scheduler.latency option to be split, with a new option to cover the initial sleep.
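For anyone wanting to try the same workaround, here is a minimal sketch of one way it could look. The helper name with_latency_defaults is hypothetical (it is not part of batchtools or future.batchtools); it only swaps the default values of the scheduler.latency and fs.latency arguments and leaves the rest of the function untouched. Whether the patched function is actually picked up may depend on when future.batchtools is loaded relative to the patch, so treat this as an assumption to check in your own setup.

```r
# Hypothetical helper: return a copy of `fn` with new default values for
# its `scheduler.latency` and `fs.latency` arguments.
with_latency_defaults <- function(fn, scheduler.latency = 70, fs.latency = 10) {
  stopifnot(
    is.function(fn),
    all(c("scheduler.latency", "fs.latency") %in% names(formals(fn)))
  )
  # `formals<-` keeps the function's original environment, so the body
  # still resolves package-internal helpers as before.
  formals(fn)$scheduler.latency <- scheduler.latency
  formals(fn)$fs.latency <- fs.latency
  fn
}

# Apply the patch only if batchtools is installed.
if (requireNamespace("batchtools", quietly = TRUE)) {
  patched <- with_latency_defaults(batchtools::makeClusterFunctionsSlurm)
  utils::assignInNamespace("makeClusterFunctionsSlurm", patched, ns = "batchtools")
}
```

Changing only the defaults means any caller that passes these arguments explicitly still gets the values it asked for. The patch lasts for the current R session only and has to be re-applied after a restart.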

Thanks
