Closed as not planned
Description
UPDATE: see #146 (comment) for the latest update to the goals of this issue
What would you like to be added:
Official, tested support for MPI jobs. Some things we need to decide:
- Will the user or the JobSet controller be responsible for creating the SSH config (Secrets)?
- Will the user or the JobSet controller be responsible for setting the environment variables that configure slots per host, etc.?
These questions can also be addressed more broadly in the design for configuration defaulting for certain workload types (PyTorch job, TensorFlow job, MPI job, etc.)
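To make the slots-per-host question concrete, here is a minimal sketch of what either the user or the controller would have to produce: an Open MPI style hostfile with one `host slots=N` line per pod. The helper `build_hostfile` and its parameters are hypothetical, and the pod hostname pattern `<jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>` is an assumption used for illustration, not something this issue specifies.

```python
def build_hostfile(jobset_name, replicated_job, num_jobs, pods_per_job,
                   slots_per_host, subdomain=None):
    """Return hostfile text with one 'host slots=N' line per pod.

    Assumption (not from this issue): pods are reachable at
    <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>,
    and the subdomain defaults to the JobSet name.
    """
    subdomain = subdomain or jobset_name
    lines = []
    for job_index in range(num_jobs):
        for pod_index in range(pods_per_job):
            host = f"{jobset_name}-{replicated_job}-{job_index}-{pod_index}.{subdomain}"
            # Open MPI hostfile syntax: hostname followed by slots=<count>.
            lines.append(f"{host} slots={slots_per_host}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical JobSet "mpi-demo" with one worker Job of two pods, four slots each.
    print(build_hostfile("mpi-demo", "workers", 1, 2, 4), end="")
```

Whichever side owns this step, the resulting file would be passed to the launcher (for example via `mpirun --hostfile`), so the decision mostly affects who has to know the pod naming scheme.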
Why is this needed:
MPI is a popular parallel computing paradigm, with implementations such as OpenMPI and IntelMPI, that is commonly used for HPC jobs. It can be modeled as a JobSet (this was prototyped and confirmed during the JobSet research and prototyping phase).