Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
v4.1.6 from https://www.open-mpi.org/software/ompi/v4.1/
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
tar xvf openmpi-4.1.6.tar.gz
cd openmpi-4.1.6 && mkdir build && cd build
./configure --prefix=/special/place/for/install --enable-debug --enable-debug-symbols --with-pmi
make -j 16
make install
export MPI_HOME=/special/place/for/install
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
No
Please describe the system on which you are running
- Operating system/version: SLES 12.3
- Computer hardware: AMD EPYC 7351P 16-Core Processor
- Network type: InfiniBand ports
Details of the problem
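I use the following minimal program. Running it under Valgrind reports uninitialised-value errors inside Open MPI (in the openib BTL and rcache code paths):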
#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    printf("Hello world from processor %s, rank %d out of %d processors\n",
           processor_name, world_rank, world_size);

    MPI_Finalize();
    return 0;
}
Compile:
mpicc -g -o hello_world hello_world.c
Run:
sbatch --partition=test --ntasks-per-node=1 --wrap "srun -u valgrind --track-origins=yes ./hello_world"
Output:
==31946== Memcheck, a memory error detector
==31946== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31946== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==31946== Command: ./hello_world
==31946==
==31946== Conditional jump or move depends on uninitialised value(s)
==31946== at 0xFDA5CAF: init_one_device (btl_openib_component.c:1956)
==31946== by 0xFDA7F24: btl_openib_component_init (btl_openib_component.c:2880)
==31946== by 0x5B9BCFB: mca_btl_base_select (btl_base_select.c:110)
==31946== by 0xF76F53A: mca_bml_r2_component_init (bml_r2_component.c:86)
==31946== by 0x4F227A5: mca_bml_base_init (bml_base_init.c:74)
==31946== by 0x4F8898C: ompi_mpi_init (ompi_mpi_init.c:613)
==31946== by 0x4EE88C1: PMPI_Init (pinit.c:67)
==31946== by 0x4008D2: main (hello_world.c:5)
==31946== Uninitialised value was created by a stack allocation
==31946== at 0xFDB671F: parse_file (btl_openib_ini.c:221)
==31946==
Hello world from processor host18, rank 0 out of 1 processors
==31946== Conditional jump or move depends on uninitialised value(s)
==31946== at 0xFD9D32D: mca_btl_openib_finalize_resources (btl_openib.c:1715)
==31946== by 0xFD9D4F6: mca_btl_openib_finalize (btl_openib.c:1743)
==31946== by 0x5B9B296: mca_btl_base_close (btl_base_frame.c:203)
==31946== by 0x5B7EB9F: mca_base_framework_close (mca_base_framework.c:216)
==31946== by 0x4F22926: mca_bml_base_close (bml_base_frame.c:130)
==31946== by 0x5B7EB9F: mca_base_framework_close (mca_base_framework.c:216)
==31946== by 0x4E9CA4A: ompi_mpi_finalize (ompi_mpi_finalize.c:449)
==31946== by 0x4EDB0EC: PMPI_Finalize (pfinalize.c:54)
==31946== by 0x400931: main (hello_world.c:20)
==31946== Uninitialised value was created by a stack allocation
==31946== at 0xFDB671F: parse_file (btl_openib_ini.c:221)
==31946==
==31946== Conditional jump or move depends on uninitialised value(s)
==31946== at 0x5B32AEF: opal_interval_tree_reader_get_token (opal_interval_tree.c:127)
==31946== by 0x5B34088: opal_interval_tree_traverse (opal_interval_tree.c:734)
==31946== by 0x5BFE5DB: mca_rcache_base_vma_tree_iterate (rcache_base_vma_tree.c:105)
==31946== by 0x5BFE1B9: mca_rcache_base_vma_iterate (rcache_base_vma.c:153)
==31946== by 0xF361305: mca_rcache_grdma_finalize (rcache_grdma_module.c:543)
==31946== by 0x5BFDB6F: mca_rcache_base_module_destroy (rcache_base_create.c:113)
==31946== by 0xFDA2A2C: device_destruct (btl_openib_component.c:993)
==31946== by 0xFD9735E: opal_obj_run_destructors (opal_object.h:483)
==31946== by 0xFD9D3F5: mca_btl_openib_finalize_resources (btl_openib.c:1716)
==31946== by 0xFD9D4F6: mca_btl_openib_finalize (btl_openib.c:1743)
==31946== by 0x5B9B296: mca_btl_base_close (btl_base_frame.c:203)
==31946== by 0x5B7EB9F: mca_base_framework_close (mca_base_framework.c:216)
==31946== Uninitialised value was created by a heap allocation
==31946== at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31946== by 0x5BFDDEB: opal_obj_new (opal_object.h:507)
==31946== by 0x5BFDC93: opal_obj_new_debug (opal_object.h:263)
==31946== by 0x5BFDFB3: mca_rcache_base_vma_module_alloc (rcache_base_vma.c:56)
==31946== by 0xF36011E: mca_rcache_grdma_cache_contructor (rcache_grdma_module.c:88)
==31946== by 0xF3615F6: opal_obj_run_constructors (opal_object.h:461)
==31946== by 0xF361711: opal_obj_new (opal_object.h:515)
==31946== by 0xF36156B: opal_obj_new_debug (opal_object.h:263)
==31946== by 0xF361D03: grdma_init (rcache_grdma_component.c:123)
==31946== by 0x5BFDA4C: mca_rcache_base_module_create (rcache_base_create.c:87)
==31946== by 0xFDA590E: init_one_device (btl_openib_component.c:1877)
==31946== by 0xFDA7F24: btl_openib_component_init (btl_openib_component.c:2880)
==31946==
==31946== Use of uninitialised value of size 8
==31946== at 0x5B31CBC: opal_thread_compare_exchange_strong_32 (thread_usage.h:160)
==31946== by 0x5B32B34: opal_interval_tree_reader_get_token (opal_interval_tree.c:134)
==31946== by 0x5B34088: opal_interval_tree_traverse (opal_interval_tree.c:734)
==31946== by 0x5BFE5DB: mca_rcache_base_vma_tree_iterate (rcache_base_vma_tree.c:105)
==31946== by 0x5BFE1B9: mca_rcache_base_vma_iterate (rcache_base_vma.c:153)
==31946== by 0xF361305: mca_rcache_grdma_finalize (rcache_grdma_module.c:543)
==31946== by 0x5BFDB6F: mca_rcache_base_module_destroy (rcache_base_create.c:113)
==31946== by 0xFDA2A2C: device_destruct (btl_openib_component.c:993)
==31946== by 0xFD9735E: opal_obj_run_destructors (opal_object.h:483)
==31946== by 0xFD9D3F5: mca_btl_openib_finalize_resources (btl_openib.c:1716)
==31946== by 0xFD9D4F6: mca_btl_openib_finalize (btl_openib.c:1743)
==31946== by 0x5B9B296: mca_btl_base_close (btl_base_frame.c:203)
==31946== Uninitialised value was created by a heap allocation
==31946== at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31946== by 0x5BFDDEB: opal_obj_new (opal_object.h:507)
==31946== by 0x5BFDC93: opal_obj_new_debug (opal_object.h:263)
==31946== by 0x5BFDFB3: mca_rcache_base_vma_module_alloc (rcache_base_vma.c:56)
==31946== by 0xF36011E: mca_rcache_grdma_cache_contructor (rcache_grdma_module.c:88)
==31946== by 0xF3615F6: opal_obj_run_constructors (opal_object.h:461)
==31946== by 0xF361711: opal_obj_new (opal_object.h:515)
==31946== by 0xF36156B: opal_obj_new_debug (opal_object.h:263)
==31946== by 0xF361D03: grdma_init (rcache_grdma_component.c:123)
==31946== by 0x5BFDA4C: mca_rcache_base_module_create (rcache_base_create.c:87)
==31946== by 0xFDA590E: init_one_device (btl_openib_component.c:1877)
==31946== by 0xFDA7F24: btl_openib_component_init (btl_openib_component.c:2880)
==31946==
==31946== Use of uninitialised value of size 8
==31946== at 0x5B31CCF: opal_thread_compare_exchange_strong_32 (thread_usage.h:160)
==31946== by 0x5B32B34: opal_interval_tree_reader_get_token (opal_interval_tree.c:134)
==31946== by 0x5B34088: opal_interval_tree_traverse (opal_interval_tree.c:734)
==31946== by 0x5BFE5DB: mca_rcache_base_vma_tree_iterate (rcache_base_vma_tree.c:105)
==31946== by 0x5BFE1B9: mca_rcache_base_vma_iterate (rcache_base_vma.c:153)
==31946== by 0xF361305: mca_rcache_grdma_finalize (rcache_grdma_module.c:543)
==31946== by 0x5BFDB6F: mca_rcache_base_module_destroy (rcache_base_create.c:113)
==31946== by 0xFDA2A2C: device_destruct (btl_openib_component.c:993)
==31946== by 0xFD9735E: opal_obj_run_destructors (opal_object.h:483)
==31946== by 0xFD9D3F5: mca_btl_openib_finalize_resources (btl_openib.c:1716)
==31946== by 0xFD9D4F6: mca_btl_openib_finalize (btl_openib.c:1743)
==31946== by 0x5B9B296: mca_btl_base_close (btl_base_frame.c:203)
==31946== Uninitialised value was created by a heap allocation
==31946== at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31946== by 0x5BFDDEB: opal_obj_new (opal_object.h:507)
==31946== by 0x5BFDC93: opal_obj_new_debug (opal_object.h:263)
==31946== by 0x5BFDFB3: mca_rcache_base_vma_module_alloc (rcache_base_vma.c:56)
==31946== by 0xF36011E: mca_rcache_grdma_cache_contructor (rcache_grdma_module.c:88)
==31946== by 0xF3615F6: opal_obj_run_constructors (opal_object.h:461)
==31946== by 0xF361711: opal_obj_new (opal_object.h:515)
==31946== by 0xF36156B: opal_obj_new_debug (opal_object.h:263)
==31946== by 0xF361D03: grdma_init (rcache_grdma_component.c:123)
==31946== by 0x5BFDA4C: mca_rcache_base_module_create (rcache_base_create.c:87)
==31946== by 0xFDA590E: init_one_device (btl_openib_component.c:1877)
==31946== by 0xFDA7F24: btl_openib_component_init (btl_openib_component.c:2880)
==31946==
==31946== Use of uninitialised value of size 8
==31946== at 0x5B32B5D: opal_interval_tree_reader_return_token (opal_interval_tree.c:142)
==31946== by 0x5B340D7: opal_interval_tree_traverse (opal_interval_tree.c:736)
==31946== by 0x5BFE5DB: mca_rcache_base_vma_tree_iterate (rcache_base_vma_tree.c:105)
==31946== by 0x5BFE1B9: mca_rcache_base_vma_iterate (rcache_base_vma.c:153)
==31946== by 0xF361305: mca_rcache_grdma_finalize (rcache_grdma_module.c:543)
==31946== by 0x5BFDB6F: mca_rcache_base_module_destroy (rcache_base_create.c:113)
==31946== by 0xFDA2A2C: device_destruct (btl_openib_component.c:993)
==31946== by 0xFD9735E: opal_obj_run_destructors (opal_object.h:483)
==31946== by 0xFD9D3F5: mca_btl_openib_finalize_resources (btl_openib.c:1716)
==31946== by 0xFD9D4F6: mca_btl_openib_finalize (btl_openib.c:1743)
==31946== by 0x5B9B296: mca_btl_base_close (btl_base_frame.c:203)
==31946== by 0x5B7EB9F: mca_base_framework_close (mca_base_framework.c:216)
==31946== Uninitialised value was created by a heap allocation
==31946== at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==31946== by 0x5BFDDEB: opal_obj_new (opal_object.h:507)
==31946== by 0x5BFDC93: opal_obj_new_debug (opal_object.h:263)
==31946== by 0x5BFDFB3: mca_rcache_base_vma_module_alloc (rcache_base_vma.c:56)
==31946== by 0xF36011E: mca_rcache_grdma_cache_contructor (rcache_grdma_module.c:88)
==31946== by 0xF3615F6: opal_obj_run_constructors (opal_object.h:461)
==31946== by 0xF361711: opal_obj_new (opal_object.h:515)
==31946== by 0xF36156B: opal_obj_new_debug (opal_object.h:263)
==31946== by 0xF361D03: grdma_init (rcache_grdma_component.c:123)
==31946== by 0x5BFDA4C: mca_rcache_base_module_create (rcache_base_create.c:87)
==31946== by 0xFDA590E: init_one_device (btl_openib_component.c:1877)
==31946== by 0xFDA7F24: btl_openib_component_init (btl_openib_component.c:2880)
==31946==
==31946==
==31946== HEAP SUMMARY:
==31946== in use at exit: 568,113 bytes in 6,896 blocks
==31946== total heap usage: 46,678 allocs, 39,782 frees, 846,341,962 bytes allocated
==31946==
==31946== LEAK SUMMARY:
==31946== definitely lost: 28,700 bytes in 66 blocks
==31946== indirectly lost: 10,455 bytes in 23 blocks
==31946== possibly lost: 1,768 bytes in 2 blocks
==31946== still reachable: 527,190 bytes in 6,805 blocks
==31946== suppressed: 0 bytes in 0 blocks
==31946== Rerun with --leak-check=full to see details of leaked memory
==31946==
==31946== For counts of detected and suppressed errors, rerun with: -v
==31946== ERROR SUMMARY: 6 errors from 6 contexts (suppressed: 0 from 0)
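For readers less familiar with this class of report: the first two errors say that a value created by a stack allocation in parse_file is later used in a conditional. A minimal sketch of that general pattern follows. It is a hypothetical illustration only, not the actual code in btl_openib_ini.c; the struct, its fields, and the functions parse_line and should_skip are invented for the example. A parser fills only the keys it finds, so any field it skips keeps whatever the caller left there, and zeroing the struct first is one way to make the later test well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for per-device settings read from an INI file. */
typedef struct {
    int  mtu;
    bool mtu_set;
    bool ignore_device;
    bool ignore_device_set;
} device_values_t;

/* Hypothetical parser: writes only the fields whose key appears in `line`.
 * Fields for absent keys are left exactly as the caller passed them in. */
void parse_line(const char *line, device_values_t *out)
{
    assert(line != NULL && out != NULL);
    if (strncmp(line, "mtu=", 4) == 0) {
        out->mtu = (int) strtol(line + 4, NULL, 10);
        out->mtu_set = true;
    } else if (strncmp(line, "ignore_device=", 14) == 0) {
        out->ignore_device = strtol(line + 14, NULL, 10) != 0;
        out->ignore_device_set = true;
    }
}

/* Returns 1 if the (hypothetical) device should be skipped, 0 otherwise. */
int should_skip(const char *line)
{
    device_values_t values;

    /* Without this memset, the condition below could read indeterminate
     * stack memory whenever `line` does not set ignore_device, which is
     * the kind of access Memcheck flags as "Conditional jump or move
     * depends on uninitialised value(s)". */
    memset(&values, 0, sizeof(values));

    parse_line(line, &values);
    return values.ignore_device_set && values.ignore_device;
}
```

With the memset in place, should_skip("mtu=4096") evaluates only zeroed flags and returns 0, while should_skip("ignore_device=1") returns 1. Whether the real parse_file follows this pattern is an assumption that would need to be checked against the source at the line numbers in the traces above.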