Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
4.1.1
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from source
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status
.
Please describe the system on which you are running
- Operating system/version: CentOS Linux 7.9
- Computer hardware: Intel(R) Xeon(R) CPU
- Network type: 50 GigE
Details of the problem
I am working on tuning Open MPI for a new system type. By default, coll/tuned is selected, and its allreduce performance is so-so:
mpirun --mca btl_vader_single_copy_mechanism none --mca coll_base_verbose 0 --hostfile /shared/hostfile.ompi -n 128 -N 16 --bind-to core ./osu_allreduce
App launch reported: 9 (out of 9) daemons - 112 (out of 128) procs
# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)
4                     599.60
8                     321.18
16                    481.45
32                    483.63
64                    483.59
128                   567.62
256                   472.35
512                   431.96
1024                  609.19
2048                  288.70
4096                  355.52
8192                  425.21
16384                 546.61
32768                 739.76
65536                1501.53
131072               2027.41
262144               1015.34
524288               1328.23
1048576              2101.48
Large-message latency looks OK, but small-message latency is poor: 321 to 609 us for messages of 1 KiB and below.
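Comparing runs like this one is easier if the OSU output is turned into data first. Below is a minimal Python sketch, assuming OSU's two-column "size latency" layout. The function name `parse_osu_latency` is hypothetical. The parser skips `#` header lines and anything that is not exactly two numeric columns, such as mpirun's launch report. Applied to the first rows of the coll/tuned run, it recovers the small-message range quoted above.

```python
def parse_osu_latency(text):
    """Parse OSU latency-test output into {message_size_bytes: avg_latency_us}.

    Header lines (starting with '#') and any line that is not exactly two
    numeric columns, such as mpirun's launch report, are skipped.
    """
    results = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or fields[0].startswith("#"):
            continue
        try:
            results[int(fields[0])] = float(fields[1])
        except ValueError:
            continue  # two columns, but not numeric
    return results


# First rows of the coll/tuned run above.
tuned_output = """\
App launch reported: 9 (out of 9) daemons - 112 (out of 128) procs
# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)
4                     599.60
8                     321.18
16                    481.45
32                    483.63
64                    483.59
128                   567.62
256                   472.35
512                   431.96
1024                  609.19
2048                  288.70
"""

tuned = parse_osu_latency(tuned_output)
small = [lat for size, lat in tuned.items() if size <= 1024]
print(f"coll/tuned, <= 1 KiB: {min(small):.2f} to {max(small):.2f} us")
# prints "coll/tuned, <= 1 KiB: 321.18 to 609.19 us"
```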
When I force coll/han (by setting coll_han_priority to 100), small-message latency improves dramatically. The cost is a huge loss in large-message performance:
mpirun --mca btl_vader_single_copy_mechanism none --mca coll_base_verbose 0 --hostfile /shared/hostfile.ompi -n 128 -N 16 --bind-to core --mca coll_han_priority 100 ./osu_allreduce
App launch reported: 9 (out of 9) daemons - 112 (out of 128) procs
# OSU MPI Allreduce Latency Test v5.7.1
# Size       Avg Latency(us)
4                     111.77
8                     112.46
16                    111.98
32                    233.86
64                    198.94
128                   321.43
256                   286.42
512                   212.69
1024                  305.23
2048                  257.34
4096                  332.50
8192                  317.34
16384                 359.07
32768                 432.56
65536                 729.18
131072               1102.87
262144               1801.27
524288               3301.01
1048576              6245.48
Is this expected? Another MPI on the same system achieves 74 us for small messages (below 1 KiB) and 1400 us for 1 MiB messages.
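The trade-off is easier to see with a per-size comparison. Below is a minimal Python sketch that merges the two tables above and reports which component was faster at each message size. The helper names `best_per_size` and `first_switch` are hypothetical.

In these single runs, coll/han wins up to 128 KiB and coll/tuned wins from 256 KiB upward. This suggests that a size-dependent choice between the two might approach the best of both. Treat the exact crossover as approximate, because the coll/tuned numbers are not monotonic in message size.

```python
# Avg allreduce latency (us) per message size (bytes), copied from the two
# runs above (-n 128 -N 16, --bind-to core).
TUNED = {
    4: 599.60, 8: 321.18, 16: 481.45, 32: 483.63, 64: 483.59,
    128: 567.62, 256: 472.35, 512: 431.96, 1024: 609.19, 2048: 288.70,
    4096: 355.52, 8192: 425.21, 16384: 546.61, 32768: 739.76,
    65536: 1501.53, 131072: 2027.41, 262144: 1015.34, 524288: 1328.23,
    1048576: 2101.48,
}
HAN = {
    4: 111.77, 8: 112.46, 16: 111.98, 32: 233.86, 64: 198.94,
    128: 321.43, 256: 286.42, 512: 212.69, 1024: 305.23, 2048: 257.34,
    4096: 332.50, 8192: 317.34, 16384: 359.07, 32768: 432.56,
    65536: 729.18, 131072: 1102.87, 262144: 1801.27, 524288: 3301.01,
    1048576: 6245.48,
}


def best_per_size(a, b, name_a="tuned", name_b="han"):
    """Return {size: (winner_name, winner_latency)} for sizes present in both runs."""
    best = {}
    for size in sorted(a.keys() & b.keys()):
        if a[size] <= b[size]:
            best[size] = (name_a, a[size])
        else:
            best[size] = (name_b, b[size])
    return best


def first_switch(best, winner):
    """Smallest size from which `winner` is faster at every larger size, else None."""
    sizes = sorted(best)
    for i, size in enumerate(sizes):
        if all(best[s][0] == winner for s in sizes[i:]):
            return size
    return None


best = best_per_size(TUNED, HAN)
for size, (name, lat) in best.items():
    print(f"{size:>8} B  {name:<5}  {lat:8.2f} us")
print("coll/tuned is faster at every size from", first_switch(best, "tuned"), "bytes up")
```

Whether Open MPI 4.1.1 can make that switch automatically depends on the MCA parameters that coll/han and coll/tuned expose. Running `ompi_info --param coll han --level 9` lists the ones available for coll/han.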