
Corrupted data using parallel hdf5 #12718

Closed
@tpadioleau

Description

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

v5.0.3

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Using spack 0.22.1

Please describe the system on which you are running

  • Operating system/version: Ubuntu 20.04
  • Computer hardware: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz

Details of the problem

I am using parallel HDF5 to write a 2D distributed array. When I pass a Cartesian communicator to HDF5 and run with 3 processes, the dataset in the HDF5 file is sometimes corrupted. Attached (hdf5_reproducer.tar.gz) is a small reproducer in C (< 100 LOC), together with an HDF5 file produced by running it and the output of the ompi_info command.

Without understanding the underlying cause, I also noticed several situations in which I never seem to get corrupted data:

  • requiring MPI_THREAD_MULTIPLE during MPI initialization,
  • passing a non-cartesian communicator,
  • using another MPI implementation, such as MPICH.

Thank you,
Thomas
