Support for MPI jobs (JobSet extensions) #146

Closed as not planned
@danielvegamyhre

Description

UPDATE: see #146 (comment) for the latest update to the goals of this issue.

What would you like to be added:
Official, tested support for MPI jobs. Some things we need to decide:

  • Will the user or the JobSet controller be responsible for creating the SSH config (Secrets)?
  • Will the user or the JobSet controller be responsible for setting the environment variables that configure slots per host, etc.?

These questions can also be addressed more broadly in the design for configuration defaulting for certain workload types (PyTorch job, TensorFlow job, MPI job, etc.)
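
To make the second question concrete, here is a minimal sketch of the per-host slot configuration that someone (the user or the controller) would have to produce. It renders an Open MPI style hostfile, one `hostname slots=N` line per pod. The function name `build_hostfile`, its parameters, and the pod hostname pattern `<jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>` are assumptions made for illustration, not a decided design for this issue.

```python
from typing import Optional


def build_hostfile(
    jobset_name: str,
    replicated_job: str,
    num_jobs: int,
    pods_per_job: int,
    slots_per_host: int,
    subdomain: Optional[str] = None,
) -> str:
    """Render an Open MPI style hostfile with one line per worker pod.

    Assumption: pods are reachable at
    <jobset>-<replicatedJob>-<jobIndex>-<podIndex>.<subdomain>,
    with the subdomain defaulting to the JobSet name.
    """
    subdomain = subdomain or jobset_name
    lines = []
    for job_idx in range(num_jobs):
        for pod_idx in range(pods_per_job):
            host = f"{jobset_name}-{replicated_job}-{job_idx}-{pod_idx}.{subdomain}"
            # "slots" tells the MPI launcher how many ranks it may place on this host.
            lines.append(f"{host} slots={slots_per_host}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical JobSet "mpi" with one replicated job "worker" of two pods.
    print(build_hostfile("mpi", "worker", num_jobs=1, pods_per_job=2, slots_per_host=4), end="")
```

Whichever side ends up owning this step, the output would still need to reach the launcher pod, for example through a ConfigMap or an environment variable; that choice is exactly what the question above leaves open.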

Why is this needed:
MPI is a popular parallel computing paradigm, with implementations such as OpenMPI and IntelMPI, that is commonly used for HPC jobs. It can be modeled as a JobSet (this was prototyped and confirmed during the JobSet research and prototyping phase).

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.