Description
We are building supercomputing infrastructure for an internal GPU cluster that will scale up to thousands of expensive GPUs.
We are deciding between adopting mpi-operator and Slurm.
Slurm is widely adopted in large-scale HPC computing, so its scalability is well tested.
Are there any known benchmark results for mpi-operator on a cluster with > 3000 GPUs?
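For context, the kind of scale test we have in mind is a NCCL all-reduce benchmark launched as an MPIJob. Below is a minimal sketch, assuming the v2beta1 MPIJob API, 8 GPUs per node, and 384 worker nodes (3072 ranks in total); the image name is a placeholder for a container with nccl-tests built in:

```yaml
apiVersion: kubeflow.org/v2beta1
kind: MPIJob
metadata:
  name: nccl-allreduce-bench
spec:
  slotsPerWorker: 8                    # assumption: 8 GPUs per node
  runPolicy:
    cleanPodPolicy: Running
  mpiReplicaSpecs:
    Launcher:
      replicas: 1
      template:
        spec:
          containers:
            - name: launcher
              image: example.com/nccl-tests:latest   # placeholder image with nccl-tests built in
              command:
                - mpirun
                - --allow-run-as-root
                - -np
                - "3072"                             # 384 workers x 8 GPUs each
                - /opt/nccl-tests/build/all_reduce_perf
                - -b                                 # minimum message size
                - "8"
                - -e                                 # maximum message size
                - 8G
                - -f                                 # size multiplication factor per step
                - "2"
                - -g                                 # GPUs per MPI rank
                - "1"
    Worker:
      replicas: 384                    # assumption: sized to exceed 3000 GPUs total
      template:
        spec:
          containers:
            - name: worker
              image: example.com/nccl-tests:latest   # same placeholder image
              resources:
                limits:
                  nvidia.com/gpu: 8
```

The `-b`/`-e`/`-f` flags sweep message sizes from 8 bytes to 8 GB, doubling at each step, and `-g 1` assigns one GPU per MPI rank. Even rough data on launcher startup or rendezvous overhead at this rank count would be helpful.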