Specify temporary directory for job submission scripts #3

mglev1n · 2023-05-04T18:19:27Z

I'm working to create a minimal example LSF plugin for crew.cluster, building off of the current SGE implementation. One thing I've noticed in the current implementation is that the job submission scripts are written to a temporary directory:

crew.cluster/R/utils_names.R

Line 11 in 626647f

file.path(tempdir(), sprintf("%s-%s-%s.sh", prefix, launcher, worker))

I'd like to propose allowing the user to specify this directory, rather than making it temporary by default. At least in my current HPC environment, temporary directories are node-specific and are not necessarily accessible across machines/jobs. Following the example in https://github.com/wlandau/crew.cluster/blob/main/tests/sge/minimal.R, after controller$push(...), it seems like the following chain of events should occur:

1. A job submission script is written to a temporary directory, which when run will launch a worker.
2. launch_worker submits a job, referencing the job submission script/parameters from above:

   crew.cluster/R/crew_launcher_sge.R

   Line 319 in 626647f

   launch_worker = function(call, launcher, worker, instance) {
3. Once the worker job is running, it will accept commands.

In my environment, this workflow creates a situation where the job submission script could be written to a temporary directory on Machine_A, but the launch_worker command is executed on Machine_B where the submission script is not visible. This effectively means that no workers are able to start. This could be alleviated if the user is allowed to specify the temporary directory, which could point toward a shared directory visible across all machines/nodes. The default could remain saving to a temporary directory, maintaining the current functionality.
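To make the proposal concrete, here is a minimal sketch of how the script path could take a user-supplied directory while keeping tempdir() as the default. The function name `name_script`, the argument name `script_directory`, and the example shared path are assumptions for illustration, not part of the crew.cluster code at commit 626647f; only the file.path()/sprintf() call comes from the line quoted above.

```r
# Hypothetical helper, modeled on the file.path() call quoted above.
# `script_directory` is an assumed argument name; it defaults to tempdir()
# so the current behavior is preserved when the user supplies nothing.
name_script <- function(prefix, launcher, worker, script_directory = tempdir()) {
  file.path(
    script_directory,
    sprintf("%s-%s-%s.sh", prefix, launcher, worker)
  )
}

# Default: node-local temporary directory, as in the current implementation.
name_script("crew", "launcher", "worker1")

# Directory on a shared file system visible to all nodes
# (the path is an example, not a real location).
name_script("crew", "launcher", "worker1", script_directory = "/shared/scratch/crew")
```

With a default of tempdir(), existing users would see no change, while clusters with node-local temporary directories could point the scripts at shared storage.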

Alternatively, allowing the user to specify arguments to the submission command here: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#LL338C4-L338C4 may work.

wlandau · 2023-05-04T20:43:27Z

Thank you so much for offering to write an LSF launcher! I just implemented a SLURM launcher, and I condensed the common elements of SGE and SLURM to make it easier to write new cluster launchers. If you have any questions, please let me know.

I added a script_directory argument to provide the location of script paths. tools::R_user_dir(package = "crew.cluster", which = "cache") seems like a good place to use in your case.
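As a rough sketch of the suggestion above, the cache directory can be resolved and created before it is handed to the launcher. tools::R_user_dir() requires R 4.0.0 or later and only returns a path, so the directory is created explicitly here. Whether that location is visible from every node depends on the cluster (it usually sits under the user's home directory), so treat that as an assumption to verify.

```r
# Resolve the per-user cache directory for crew.cluster (R >= 4.0.0).
script_directory <- tools::R_user_dir(package = "crew.cluster", which = "cache")

# R_user_dir() does not create the directory, so create it if needed.
dir.create(script_directory, recursive = TRUE, showWarnings = FALSE)

# This path can then be supplied as the script_directory argument
# mentioned above.
print(script_directory)
```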

wlandau closed this as completed May 4, 2023

wlandau mentioned this issue May 4, 2023

Support LSF #4

Closed
