I'm working to create a minimal example LSF plugin for crew.cluster, building off of the current SGE implementation. One thing I've noticed in the current implementation is that the job submission scripts are written to a temporary directory (see crew.cluster/R/utils_names.R, line 11 in 626647f).
I'd like to propose allowing the user to specify this directory, rather than making it temporary by default. At least in my current HPC environment, temporary directories are node specific, and are not necessarily accessible across machines/jobs. Following the example in https://github.com/wlandau/crew.cluster/blob/main/tests/sge/minimal.R, after controller$push(...), it seems like the following chain of events should occur:
1. A job submission script is written to a temporary directory. When run, this script launches a worker.
2. launch_worker submits a job, referencing the job submission script and parameters from step 1 (see crew.cluster/R/crew_launcher_sge.R, line 319 in 626647f).
3. Once the worker job is running, it accepts commands.
In my environment, this workflow creates a situation where the job submission script could be written to a temporary directory on Machine_A, but the launch_worker command is executed on Machine_B, where the submission script is not visible. This effectively means that no workers are able to start. This could be alleviated if the user were allowed to specify the directory, which could then point to a shared directory visible across all machines/nodes. The default could remain a temporary directory, maintaining the current functionality. Alternatively, allowing the user to specify arguments to the submission command here: https://github.com/wlandau/crew.cluster/blob/626647fd0a4d26017a7f94533cf0abd370ff3c8b/R/crew_launcher_sge.R#LL338C4-L338C4 may work.
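To make the proposal concrete, here is a minimal sketch of the behavior I have in mind. The helper name script_path and the ".sh" suffix are hypothetical and not taken from crew.cluster; the only point is that the directory defaults to tempdir() but can be overridden by the user.

```r
# Hypothetical helper (not part of crew.cluster): build the path of a job
# submission script. By default it lands in the session's temporary
# directory, but the caller may point it at any other directory.
script_path <- function(name, script_directory = tempdir()) {
  if (!dir.exists(script_directory)) {
    dir.create(script_directory, recursive = TRUE)
  }
  file.path(script_directory, paste0(name, ".sh"))
}

# Current behavior: the script goes to a node-local temporary directory.
default_path <- script_path("worker-1")

# Proposed behavior: the user supplies a directory. The path below is only
# a stand-in; on a real cluster it would be a directory on a filesystem
# that every node can see.
shared_directory <- file.path(tempdir(), "shared-demo")
shared_path <- script_path("worker-1", script_directory = shared_directory)
```

With something like this, existing users would see no change, while users on clusters with node-local temporary directories could opt in to a shared location.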
Thank you so much for offering to write an LSF launcher! I just implemented a SLURM launcher, and I condensed the common elements of SGE and SLURM to make it easier to write new cluster launchers. If you have any questions, please let me know.
I added a script_directory argument to set the directory where the job scripts are written. tools::R_user_dir(package = "crew.cluster", which = "cache") seems like a good place to use in your case.
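As a rough sketch of how that could look in practice: the snippet below resolves the suggested cache directory and creates it, since tools::R_user_dir() only returns a path. The controller call is left commented out because it needs crew.cluster and a working SGE cluster, and its exact argument list may differ between crew.cluster versions. It also assumes the user cache directory sits on a filesystem shared across nodes, which depends on the cluster.

```r
# Resolve the per-user cache directory suggested above (requires R >= 4.0.0).
# Assumption: this location is on a filesystem visible to all nodes.
script_directory <- tools::R_user_dir(package = "crew.cluster", which = "cache")

# R_user_dir() does not create the directory, so create it if it is missing.
dir.create(script_directory, recursive = TRUE, showWarnings = FALSE)

# Illustrative usage only, commented out because it needs crew.cluster and
# an SGE cluster. The "name" value is a placeholder.
# controller <- crew.cluster::crew_controller_sge(
#   name = "example",
#   script_directory = script_directory
# )
```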