Performance regression: 2.0.x branch degradation over 1.10 for MTLs #2644

Closed
@matcabral

This is a placeholder for fixing the performance regressions seen on the 2.0.x branch relative to 1.10 that affect MTLs (tested with OFI and PSM2). The degradation mostly affects latency at small message sizes, with some impact on bandwidth.

Building with:

./configure CFLAGS=-O3 --prefix=<install path> --with-libfabric=no \
    --with-psm2=/usr --disable-oshmem --with-devel-headers --disable-debug \
    --disable-mem-profile --disable-mem-debug

The tests below use the same system setup for both runs; the only change is replacing OMPI 1.10 with 2.0.x.

Two ranks on different nodes running osu_latency over PSM2 (message size in bytes, percent change of 2.0.x relative to 1.10):
1 -10%
2 -10%
4 -12%
8 -12%
16 -10%
32 -11%
64 -8%
128 -10%
256 -12%
512 -10%
1024 -12%
2048 -5%
4096 -18%
8192 0%
16384 -6%
32768 -4%
65536 -5%
131072 -3%
262144 -2%
524288 8%
1048576 0%
2097152 0%
4194304 18%

Two ranks on the same node running osu_latency over PSM2 (message size in bytes, percent change of 2.0.x relative to 1.10):
1 -19%
2 -16%
4 -16%
8 -16%
16 -19%
32 -4%
64 -6%
128 -6%
256 -4%
512 -6%
1024 -14%
2048 -31%
4096 -1%
8192 -8%
16384 -24%
32768 -18%
65536 -12%
131072 -6%
262144 -8%
524288 -5%

Two ranks on different nodes running osu_bw over PSM2 (message size in bytes, percent change of 2.0.x relative to 1.10):
1 -7%
2 -9%
4 -7%
8 -7%
16 -5%
32 -4%
64 -6%
128 -2%
256 -5%
512 -7%
1024 -7%
2048 -5%
4096 -2%
8192 -2%
16384 -1%
32768 2%
65536 0%
131072 1%
262144 0%
524288 0%
1048576 0%
2097152 0%
4194304 0%
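
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how per-size percentages like the ones above could be computed from two saved OSU benchmark outputs. The issue does not show the exact formula or sign convention used, so this sketch assumes that a negative value means the candidate build (2.0.x) performs worse than the baseline (1.10), for both latency (lower is better) and bandwidth (higher is better). The function names `parse_osu` and `percent_change` are hypothetical, and the sample numbers are made up for illustration; they are not measurements from this issue.

```python
def parse_osu(text):
    """Parse OSU micro-benchmark output into {message_size_bytes: value}.

    Lines starting with '#' (the OSU header lines) and lines that do not
    begin with two numeric columns are skipped.
    """
    results = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            results[int(fields[0])] = float(fields[1])
        except ValueError:
            continue
    return results


def percent_change(baseline, candidate, higher_is_better):
    """Return {size: rounded percent change} for sizes present in both runs.

    ASSUMED sign convention (not confirmed by the issue): negative means
    the candidate is worse than the baseline.
    """
    changes = {}
    for size in sorted(baseline.keys() & candidate.keys()):
        base, cand = baseline[size], candidate[size]
        if higher_is_better:      # e.g. osu_bw, MB/s
            delta = (cand - base) / base
        else:                     # e.g. osu_latency, microseconds
            delta = (base - cand) / base
        changes[size] = round(100.0 * delta)
    return changes


if __name__ == "__main__":
    # Made-up sample outputs, for illustration only.
    ompi_110 = "# OSU MPI Latency Test\n# Size    Latency (us)\n1 1.00\n2 1.00\n4 1.25\n"
    ompi_20x = "# OSU MPI Latency Test\n# Size    Latency (us)\n1 1.10\n2 1.00\n4 1.50\n"
    table = percent_change(parse_osu(ompi_110), parse_osu(ompi_20x),
                           higher_is_better=False)
    for size, pct in table.items():
        print(f"{size} {pct}%")
```

Passing `higher_is_better=True` applies the same comparison to osu_bw output. Averaging several runs per build before comparing would reduce noise in the individual percentages.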
