
Problem with intra-node communication #6518

Closed
mredenti opened this issue Jan 6, 2025 · 1 comment

Comments

mredenti commented Jan 6, 2025

Version of Singularity:

$ singularity --version
SingularityPRO version 3.11-5.el8

Actual behavior

When running the following Slurm batch script

#!/bin/bash
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=4
#SBATCH --nodes=2

module load openmpi/4.1.6--nvhpc--24.3
mpirun -np 8 singularity exec fall3d_opeacc.sif Fall3d.x 

I get the following error:

[lrdn2911:1723502:0:1723502]      cma_ep.c:88   process_vm_readv(pid=1723503 {0x14745e5ac800,61928}-->{0x150dba573e00,61928}) returned -1: Bad address
[lrdn2912:2435814:0:2435814]      cma_ep.c:88   process_vm_readv(pid=2435813 {0x14d1065ac800,61928}-->{0x15453a573e00,61928}) returned -1: Bad address
[lrdn2911:1723498:0:1723498]      cma_ep.c:88   process_vm_readv(pid=1723500 {0x1505545ac800,61928}-->{0x1490a2573e00,61928}) returned -1: Bad address
[lrdn2912:2435816:0:2435816]      cma_ep.c:88   process_vm_readv(pid=2435815 {0x149e885ac800,61928}-->{0x154f4a573e00,61928}) returned -1: Bad address
==== backtrace (tid:1723498) ====
 0 0x0000000000003803 uct_cma_ep_tx_error()  /build-result/src/hpcx-v2.20-gcc-inbox-redhat8-cuda12-x86_64/ucx-39c8f9b/src/uct/sm/scopy/cma/cma_ep.c:85
...

CMA (Cross-Memory Attach) is enabled in UCX/Open MPI, which uses the process_vm_readv()/process_vm_writev() system calls for zero-copy shared-memory transfers between processes on the same node. These calls fail: as the log shows, process_vm_readv() returns -1 with "Bad address" (EFAULT).
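A minimal diagnostic sketch, assuming the failure is confined to UCX's CMA transport (this is not confirmed by the log). It reports the Yama ptrace_scope setting, since process_vm_readv() requires ptrace-level permission between the communicating processes, and builds the same launch line with UCX_TLS=^cma exported to every rank through Open MPI's -x option so UCX excludes CMA. Singularity passes the host environment into the container by default (unless --cleanenv is used), so the variable should reach the UCX inside the image. The helper names check_ptrace_scope and DRY_RUN are hypothetical and exist only for this sketch.

```bash
#!/bin/bash
# Hypothetical diagnostic helper (not part of the original report).
# Assumption: the error is specific to UCX's CMA transport, so a run with
# CMA excluded shows whether intra-node traffic works through other transports.

# CMA (process_vm_readv/process_vm_writev) needs ptrace-level access between
# ranks; the Yama LSM, when present, can restrict that via ptrace_scope.
check_ptrace_scope() {
    local f=/proc/sys/kernel/yama/ptrace_scope
    if [[ -r "$f" ]]; then
        echo "yama ptrace_scope=$(<"$f")"
    else
        echo "yama ptrace_scope: not available"
    fi
}

check_ptrace_scope

# Same launch as the batch script, but export UCX_TLS=^cma to all ranks
# ("^" tells UCX to exclude the listed transport).
cmd=(mpirun -np 8 -x UCX_TLS=^cma singularity exec fall3d_opeacc.sif Fall3d.x)
echo "${cmd[*]}"

# Only print the command by default; set DRY_RUN=0 to actually launch it
# (requires mpirun and singularity on the PATH).
if [[ "${DRY_RUN:-1}" == 0 ]]; then
    "${cmd[@]}"
fi
```

If the job completes with CMA excluded, that would point at CMA specifically rather than at intra-node communication in general; it is a workaround for narrowing the problem down, not a fix.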

github-actions bot commented Jan 6, 2025

New issues are no longer accepted in this repository. If singularity --version says singularity-ce, submit instead to https://github.com/sylabs/singularity, otherwise submit to https://github.com/apptainer/apptainer.

github-actions bot closed this as completed Jan 6, 2025
github-actions bot locked and limited conversation to collaborators Jan 6, 2025