MPI I/O Crashes and Corruption in HDF5 #6285

Closed
@ax3l

Description

Background information

What version of Open MPI are you using?

3.1.3

Update: affects 2.X, 3.X and 4.0.0

Describe how Open MPI was installed

From source via Spack, or as modules built from source on various HPC systems.

Please describe the system on which you are running

  • Operating system/version: Debian 9.6 and derivatives such as Ubuntu
  • Computer hardware: Laptops and HPC
  • Network type: local and remote (Ethernet and IB)

Details of the problem

We are reporting two parallel HDF5 issues that either crash on write or corrupt the written data. The issues occur only with Open MPI, not with MPICH 3.3, which we used for comparison.

We (@ax3l, @psychocoderHPC) are not the upstream authors of HDF5, but wanted to inform and connect you so that you are aware of these issues, as they might be rooted somewhere in the Open MPI I/O layer.
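For context, the failing path is collective dataset writes through HDF5's MPI-IO file driver. Below is a minimal sketch of that pattern, not the actual reproducers linked below; the file name, dataset layout, and sizes are illustrative only:

```c
/* Minimal sketch of a collective parallel HDF5 write (illustrative,
 * not the linked reproducers). Build with h5pcc, or mpicc + -lhdf5. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Open the file with the MPI-IO driver; this routes the actual
     * writes through Open MPI's I/O layer (e.g. OMPIO or ROMIO). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("test.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One element per rank in a shared 1D dataset. */
    hsize_t dims[1] = { (hsize_t)size };
    hid_t filespace = H5Screate_simple(1, dims, NULL);
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_INT, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own slot in the file space. */
    hsize_t offset[1] = { (hsize_t)rank };
    hsize_t count[1]  = { 1 };
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, offset, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(1, count, NULL);

    /* Collective transfer: all ranks participate in the write. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);

    int value = rank;
    H5Dwrite(dset, H5T_NATIVE_INT, memspace, filespace, dxpl, &value);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}
```

Running a program of this shape under mpirun with several ranks exercises the Open MPI I/O component that we suspect is involved; the same pattern completed cleanly for us under MPICH 3.3.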

Links to the HDF5 parallel I/O write issues with reproducers:
