
Problems when running examples hello_c #11063

Open
@shiwch

Description


Thank you for taking the time to submit an issue!

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

v4.1.2

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

from a source/distribution tarball

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

[nscc-gz@centos203 examples]$ lscpu 
Architecture:          aarch64
Byte Order:            Little Endian
CPU(s):                128
On-line CPU(s) list:   0-127
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             2
NUMA node(s):          4
Model:                 0
BogoMIPS:              200.00
L1d cache:             64K
L1i cache:             64K
L2 cache:              512K
L3 cache:              65536K
NUMA node0 CPU(s):     0-31
NUMA node1 CPU(s):     32-63
NUMA node2 CPU(s):     64-95
NUMA node3 CPU(s):     96-127
Flags:                 fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop
Network type:

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

Hi, when I run hello_c, I get the following output:

[nscc-gz@centos203 examples]$ mpirun -np 4  --mca orte_base_help_aggregate 0 ./hello_c
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           centos203
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           centos203
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           centos203
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           centos203
  Local device:         mlx5_0
  Local port:           1
  CPCs attempted:       rdmacm, udcm
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)

and this is the ibstat output:

[nscc-gz@centos203 examples]$ ibstat
CA 'mlx5_0'
        CA type: MT4117
        Number of ports: 1
        Firmware version: 14.20.1820
        Hardware version: 0
        Node GUID: 
        System image GUID: 
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 25
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 
                Port GUID: 
                Link layer: Ethernet
CA 'mlx5_1'
        CA type: MT4117
        Number of ports: 1
        Firmware version: 14.20.1820
        Hardware version: 0
        Node GUID: 
        System image GUID: 
        Port 1:
                State: Down
                Physical state: Disabled
                Rate: 40
                Base lid: 0
                LMC: 0
                SM lid: 0
                Capability mask: 
                Port GUID: 
                Link layer: Ethernet
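
Worth noting for context: ibstat reports "Link layer: Ethernet" on both ports, so these mlx5 ports run RoCE rather than native InfiniBand (which is also why Base lid and SM lid are 0). A couple of sanity checks I ran on my side (device and port names taken from the output above; the sysfs path assumes a kernel recent enough to expose RoCE GID types):

```shell
# Map RDMA devices to their Ethernet netdevs (RoCE ports have no subnet manager).
ibdev2netdev

# Inspect the GID type for mlx5_0 port 1 to see whether RoCE v1 or v2 is in use.
cat /sys/class/infiniband/mlx5_0/ports/1/gid_attrs/types/0
```
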

If I use this command:

[nscc-gz@centos203 examples]$ mpirun --mca btl openib,self,vader --mca btl_openib_cpc_include rdmacm  -np 4  hello_c 
--------------------------------------------------------------------------
No OpenFabrics connection schemes reported that they were able to be
used on a specific port.  As such, the openib BTL (OpenFabrics
support) will be disabled for this port.

  Local host:           centos203
  Local device:         
  Local port:           1
  CPCs attempted:       rdmacm
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:10977] 3 more processes have sent help message help-mpi-btl-openib-cpc-base.txt / no cpcs for port
[centos203:10977] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
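
Since the link layer is Ethernet (RoCE) rather than native InfiniBand, udcm cannot be used, and the openib BTL is deprecated in the 4.1.x series anyway. A few alternative invocations I could try instead (a sketch assuming a standard Open MPI 4.1.x build; the UCX variant only applies if Open MPI was configured with UCX support):

```shell
# Skip the openib BTL entirely and fall back to TCP plus shared memory:
mpirun --mca btl tcp,self,vader -np 4 ./hello_c

# Or disable just openib and let the remaining BTLs be selected automatically:
mpirun --mca btl ^openib -np 4 ./hello_c

# If Open MPI was built with UCX (the recommended path for mlx5 hardware),
# use the UCX PML instead of the openib BTL:
mpirun --mca pml ucx -np 4 ./hello_c
```
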

And if I specify the IB device to exclude:

[nscc-gz@centos203 examples]$ mpirun -np 4 ./hello_c --mca btl_openib_if_exclude mlx5_0
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: centos203
--------------------------------------------------------------------------
Hello, world, I am 0 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 1 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 2 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
Hello, world, I am 3 of 4, (Open MPI v4.1.2, package: Open MPI nscc-gz@centos203 Distribution, ident: 4.1.2, repo rev: v4.1.2, Nov 24, 2021, 112)
[centos203:14896] 3 more processes have sent help message help-mpi-btl-openib.txt / no active ports found
[centos203:14896] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
[nscc-gz@centos203 examples]$ 

The ifconfig output and the IB-port-to-netdev mapping:

[nscc-gz@centos203 examples]$ ibdev2netdev
mlx5_0 port 1 ==> enp1s0f0 (Up)
mlx5_1 port 1 ==> enp1s0f1 (Down)
[nscc-gz@centos203 examples]$ ifconfig
enp125s0f0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 1951751285  bytes 2352472322729 (2.1 TiB)
        RX errors 0  dropped 11718888  overruns 0  frame 0
        TX packets 822856179  bytes 1385364963277 (1.2 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp125s0f1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.16.29.130  netmask 255.255.255.0  broadcast 172.16.29.255
        inet6   prefixlen 64  scopeid 0x20<link>
        ether   txqueuelen 1000  (Ethernet)
        RX packets 19347918  bytes 7289410117 (6.7 GiB)
        RX errors 0  dropped 2958451  overruns 0  frame 0
        TX packets 12963627  bytes 48203399135 (44.8 GiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp125s0f2: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp125s0f3: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether   txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.40.1.203  netmask 255.255.255.0  broadcast 10.40.1.255
        inet6   prefixlen 64  scopeid 0x20<link>
        ether   txqueuelen 1000  (Ethernet)
        RX packets 382158355530  bytes 544487865896139 (495.2 TiB)
        RX errors 208  dropped 3083040  overruns 0  frame 208
        TX packets 379357423669  bytes 545429402206655 (496.0 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

enp1s0f1: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
        ether  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 32048729965  bytes 809795856471103 (736.5 TiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 32048729965  bytes 809795856471103 (736.5 TiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

[nscc-gz@centos203 examples]$ 

Could you tell me how I can use the IB devices correctly? Thanks!
