HYPRE Struct - problems using GPU-aware MPI #1131
Comments
Hello, I have similar issues with Hypre (CG + BoomerAMG, used through PETSc) with GPU-aware MPI. With OpenMPI 4.x (not GPU-aware), all my tests pass. The KSP_DIVERGED failure happens with Hypre BoomerAMG above some number of GPUs and with the CG solver. The issue may be bypassed by switching to the BiCGStab solver. Could you check your OpenMPI version and test with 5.0.5? Thanks
Hi and thanks for your feedback! I've been using OpenMPI 4.1.6, so I'll try a 5.x version and let you know the result. For me, the problem is not solver-dependent and occurs at the first assembly of the matrix. Cheers,
Hi again, I've tested with OpenMPI 5.0.5, but unfortunately I am getting the same segfault:
[acn35:1588861:0:1588861] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x153b88e00004)
Any other ideas are welcome...
Cheers,
Dear HYPRE developers, I would appreciate some additional feedback. I have been trying to adapt one of the example codes 'ex3.c' to reproduce the error occurring on my cluster. The modified source code can be found here: https://github.com/ondrejchrenko/HYPRE_ex3 Could you please let me know:
Cheers,
Dear HYPRE developers,
following up on issue #1126, I've been able to implement HYPRE in my code and run it on multiple GPUs. However, when I try to enable GPU-aware MPI in HYPRE, I get segmentation faults like the following when running the code:
[acn16:283118:0:283118] Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0x15450e000004)
==== backtrace (tid: 283118) ====
0 0x0000000000012d20 __funlockfile() :0
1 0x00000000009a6891 hypre_FinalizeCommunication() /scratch/project/open-29-3/hypre-master_paragpu2/src/struct_mv/struct_communication.c:1216
2 0x00000000009b37de hypre_StructMatrixAssemble() /scratch/project/open-29-3/hypre-master_paragpu2/src/struct_mv/struct_matrix.c:1436
3 0x00000000009968c6 HYPRE_StructMatrixAssemble() /scratch/project/open-29-3/hypre-master_paragpu2/src/struct_mv/HYPRE_struct_matrix.c:323
When HYPRE is not used, my code runs with GPU-aware MPI without problems. Any ideas what could be causing these errors?
Thank you,
Ondrej
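One way to narrow down a problem like this is to confirm that the MPI library the HYPRE build links against itself reports GPU-aware support. The following is only a minimal sketch, assuming Open MPI and CUDA devices (the GPU vendor is not stated in this thread). `MPIX_CUDA_AWARE_SUPPORT` and `MPIX_Query_cuda_support()` come from Open MPI's `mpi-ext.h` extension header; the helper names `query_cuda_aware_mpi` and `describe_cuda_aware_mpi` are hypothetical and are not part of HYPRE or MPI. The check says nothing about HYPRE's own communication buffers, so it can only rule out a mismatched MPI build.

```c
/* Pull in the Open MPI extension header only when it is available, so this
   sketch still compiles on machines without Open MPI headers. */
#if defined(__has_include)
#  if __has_include(<mpi.h>) && __has_include(<mpi-ext.h>)
#    include <mpi.h>
#    include <mpi-ext.h>   /* Open MPI: defines MPIX_CUDA_AWARE_SUPPORT */
#  endif
#endif

/* Hypothetical helper: returns 1 if the MPI library reports CUDA-aware
   support at run time, 0 if it reports none, and -1 if this cannot be
   determined (no Open MPI extension macro is visible). */
int query_cuda_aware_mpi(void)
{
#if defined(MPIX_CUDA_AWARE_SUPPORT)
    return MPIX_Query_cuda_support() ? 1 : 0;
#else
    return -1;
#endif
}

/* Hypothetical helper: maps the status code above to a short message. */
const char *describe_cuda_aware_mpi(int status)
{
    switch (status) {
    case 1:  return "CUDA-aware support reported";
    case 0:  return "no CUDA-aware support reported";
    default: return "unknown";
    }
}
```

Printing `describe_cuda_aware_mpi(query_cuda_aware_mpi())` from the same executable that calls `HYPRE_StructMatrixAssemble()`, built with the same compiler wrapper, would show whether the library actually loaded at run time matches the one the code was compiled against.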