Specify temporary directory for job submission scripts #3

Closed
mglev1n opened this issue May 4, 2023 · 1 comment

@mglev1n
Contributor

mglev1n commented May 4, 2023

I'm working to create a minimal example LSF plugin for crew.cluster, building off of the current SGE implementation. One thing I've noticed in the current implementation is that the job submission scripts are written to a temporary directory:

file.path(tempdir(), sprintf("%s-%s-%s.sh", prefix, launcher, worker))

I'd like to propose allowing the user to specify this directory, rather than always writing to a temporary directory. At least in my current HPC environment, temporary directories are node-specific and are not necessarily accessible across machines/jobs. Following the example in https://github.com/wlandau/crew.cluster/blob/main/tests/sge/minimal.R, after controller$push(...), it seems like the following chain of events should occur:

  1. A job submission script is written to a temporary directory; when run, it launches a worker
  2. launch_worker submits a job, referencing the job submission script/parameters from above:
    launch_worker = function(call, launcher, worker, instance) {
  3. Once the worker job is running, it will accept commands

In my environment, this workflow creates a situation where the job submission script could be written to a temporary directory on Machine_A, but the launch_worker command is executed on Machine_B, where the submission script is not visible. This effectively means that no workers are able to start. This could be alleviated if the user is allowed to specify the script directory, which could point to a shared directory visible across all machines/nodes. The default could remain a temporary directory, maintaining the current functionality.
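To make the proposal concrete, here is a minimal sketch of how the script path could take an optional directory. The function name `script_path` and its `script_directory` parameter are hypothetical and only illustrate the idea; they are not taken from the crew.cluster source. The "shared" directory below is a placeholder created under `tempdir()` so the example runs anywhere.

```r
# Hypothetical helper (not part of crew.cluster): build the path of a job
# submission script, letting the caller override the directory.
script_path <- function(prefix, launcher, worker, script_directory = tempdir()) {
  # Create the directory if it does not exist yet so the script can be written.
  if (!dir.exists(script_directory)) {
    dir.create(script_directory, recursive = TRUE)
  }
  file.path(script_directory, sprintf("%s-%s-%s.sh", prefix, launcher, worker))
}

# Default: same behaviour as today, a node-local temporary directory.
default_path <- script_path("crew", "launcher", "1")

# Override: a stand-in for a directory on a shared file system.
shared <- file.path(tempdir(), "shared-scripts")
shared_path <- script_path("crew", "launcher", "1", script_directory = shared)
```

On a real cluster, `shared` would be replaced by a directory that every submission and compute node can see.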

Alternatively, allowing the user to specify arguments to the submission command here: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#LL338C4-L338C4 may work.

@wlandau
Owner

wlandau commented May 4, 2023

Thank you so much for offering to write an LSF launcher! I just implemented a SLURM launcher, and I condensed the common elements of SGE and SLURM to make it easier to write new cluster launchers. If you have any questions, please let me know.

I added a script_directory argument to set the directory where the job scripts are written. tools::R_user_dir(package = "crew.cluster", which = "cache") seems like a good place to use in your case.
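As a rough sketch of that suggestion: the snippet below resolves the per-user cache directory and shows, in a comment only, how it might be passed along. The controller call is an assumption about which function accepts script_directory and is not run here. Whether the cache directory is visible across nodes depends on the cluster (for example, on the home directory being on a shared file system).

```r
# Resolve a per-user cache directory for crew.cluster (requires R >= 4.0.0).
script_directory <- tools::R_user_dir(package = "crew.cluster", which = "cache")

# The directory is not created automatically, so create it before use:
# dir.create(script_directory, recursive = TRUE, showWarnings = FALSE)

# Assumed usage, shown as a comment because the exact function that takes
# script_directory is not stated in this thread:
# controller <- crew.cluster::crew_controller_sge(
#   script_directory = script_directory
# )
```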

wlandau closed this as completed May 4, 2023