IMB-RMA failure on POWER #8102

Open
@loveshack

Description

Background information

What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)

4.0.5

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

spack install openmpi@4.0.5 +cuda +cxx +legacylaunchers +lustre fabrics=cma,knem,ucx schedulers=slurm

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: RHEL 7.6 ppc64le
  • Computer hardware: IBM AC922 (like Summit/Sierra)
  • Network type: EDR IB

Details of the problem

IMB-RMA crashes at the start with the segmentation fault shown below. A similar build on x86_64 runs correctly, as does spectrum-mpi 10.3 on this system.

Is this known to work on Summit?

# Truly_passive_put
#     The benchmark measures execution time of MPI_Put for 2 cases:
#     1) The target is waiting in MPI_Barrier call (t_pure value)
#     2) The target performs computation and then enters MPI_Barrier routine (t_ovrl value)
[gpu027:91080:0:91080] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x8)
==== backtrace (tid:  91080) ====
=================================
[gpu027:91080] *** Process received signal ***
[gpu027:91080] Signal: Segmentation fault (11)
[gpu027:91080] Signal code:  (-6)
[gpu027:91080] Failing at address: 0x262292ca000163c8
[gpu027:91080] [ 0] [0x2000000504d8]
[gpu027:91080] [ 1] /users/***/spack/opt/spack/linux-rhel7-power9le/gcc-8.4.0/openmpi-4.0.5-6sqv24vyrwc5nerb7y5fslqnf5jrnjv6/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_lock_atomic+0x94)[0x200014d19fb4]
[gpu027:91080] [ 2] /users/***/spack/opt/spack/linux-rhel7-power9le/gcc-8.4.0/openmpi-4.0.5-6sqv24vyrwc5nerb7y5fslqnf5jrnjv6/lib/libmpi.so.40(MPI_Win_lock+0x138)[0x2000001835a8]
[gpu027:91080] [ 3] IMB-RMA(IMB_rma_single_put+0x17c)[0x100d3d18]
[gpu027:91080] [ 4] IMB-RMA(_ZN11Bmark_descr21IMB_init_buffers_iterEP9comm_infoP13iter_scheduleP5BenchP5cmodeii+0xce0)[0x100a9ac8]
[gpu027:91080] [ 5] IMB-RMA(_ZN17OriginalBenchmarkI14BenchmarkSuiteIL17benchmark_suite_t3EEXadL_Z18IMB_rma_single_putEEE3runERK10scope_item+0x398)[0x100ab1a0]
[gpu027:91080] [ 6] IMB-RMA(main+0x19b0)[0x10060aa4]
[gpu027:91080] [ 7] /lib64/libc.so.6(+0x25200)[0x200000645200]
[gpu027:91080] [ 8] /lib64/libc.so.6(__libc_start_main+0xc4)[0x2000006453f4]
[gpu027:91080] *** End of error message ***
--------------------------------------------------------------------------
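
For reading the call path out of the trace above, the frames can be pulled apart with a short script. This is only a sketch and not part of the original report: the name `parse_frames` and the regular expression are assumptions based solely on the frame layout visible in this log (`[host:pid] [ n] module(symbol+offset)[address]`), and other Open MPI versions may print frames differently.

```python
import re

# Assumed frame layout, taken from the log in this report:
#   [gpu027:91080] [ 1] /path/lib.so(symbol+0x94)[0x200014d19fb4]
# Frames with no module, such as "[ 0] [0x2000000504d8]", also match;
# their module, symbol and offset fields are then None.
FRAME_RE = re.compile(
    r"^\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"
    r"\[\s*(?P<index>\d+)\]\s+"
    r"(?:(?P<module>[^(\[\s]+)\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"
    r"\[(?P<address>0x[0-9a-f]+)\]\s*$"
)


def parse_frames(log_text):
    """Return one dict per backtrace frame; non-frame lines are skipped."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m is None:
            continue
        module = m.group("module")
        frames.append({
            "index": int(m.group("index")),
            # Keep only the file name of the module, not the install prefix.
            "module": module.rsplit("/", 1)[-1] if module else None,
            # An empty symbol (e.g. "libc.so.6(+0x25200)") becomes None.
            "symbol": m.group("symbol") or None,
            "offset": m.group("offset"),
            "address": m.group("address"),
        })
    return frames


# The first four frames, copied verbatim from the trace above.
SAMPLE = """\
[gpu027:91080] *** Process received signal ***
[gpu027:91080] [ 0] [0x2000000504d8]
[gpu027:91080] [ 1] /users/***/spack/opt/spack/linux-rhel7-power9le/gcc-8.4.0/openmpi-4.0.5-6sqv24vyrwc5nerb7y5fslqnf5jrnjv6/lib/openmpi/mca_osc_rdma.so(ompi_osc_rdma_lock_atomic+0x94)[0x200014d19fb4]
[gpu027:91080] [ 2] /users/***/spack/opt/spack/linux-rhel7-power9le/gcc-8.4.0/openmpi-4.0.5-6sqv24vyrwc5nerb7y5fslqnf5jrnjv6/lib/libmpi.so.40(MPI_Win_lock+0x138)[0x2000001835a8]
[gpu027:91080] [ 3] IMB-RMA(IMB_rma_single_put+0x17c)[0x100d3d18]
"""

for frame in parse_frames(SAMPLE):
    print(frame["index"], frame["module"], frame["symbol"], frame["offset"])
```

Read this way, the innermost named frames are `ompi_osc_rdma_lock_atomic` in `mca_osc_rdma.so`, called from `MPI_Win_lock`, called from `IMB_rma_single_put`.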
