OpenMPI: Update to 3.0.5+|3.1.5+|4.0.1+ or Use ROMIO for IO #446
Labels
affects latest release
backend: ADIOS1
backend: HDF5
bug
third party
third party libraries that are shipped and/or linked
Comments
ax3l
added
bug
affects latest release
third party
third party libraries that are shipped and/or linked
backend: HDF5
backend: ADIOS1
labels
Jan 21, 2019
ax3l
changed the title
OpenMPI: Use ROMIO for IO
OpenMPI: Update to Latest or Use ROMIO for IO
Jul 25, 2019
ax3l
changed the title
OpenMPI: Update to Latest or Use ROMIO for IO
OpenMPI: Update to 4.0.1 or Use ROMIO for IO
Sep 27, 2019
ax3l
changed the title
OpenMPI: Update to 4.0.1 or Use ROMIO for IO
OpenMPI: Update to 3.0.4+|3.1.4+|4.0.1+ or Use ROMIO for IO
Sep 27, 2019
ax3l
changed the title
OpenMPI: Update to 3.0.4+|3.1.4+|4.0.1+ or Use ROMIO for IO
OpenMPI: Update to 3.0.5+|3.1.5+|4.0.1+ or Use ROMIO for IO
Nov 26, 2019
This was referenced Jan 25, 2020
ax3l
added a commit
to ax3l/openPMD-api
that referenced
this issue
Sep 22, 2021
Document OpenMPI MPI-I/O backend control. We have documented this long in openPMD#446.
franzpoeschel
pushed a commit
that referenced
this issue
Sep 24, 2021
Document OpenMPI MPI-I/O backend control. We have documented this long in #446.
ax3l
added a commit
to ax3l/openPMD-api
that referenced
this issue
Nov 3, 2021
Document OpenMPI MPI-I/O backend control. We have documented this long in openPMD#446.
ax3l
added a commit
that referenced
this issue
Nov 4, 2021
* Read: time/dt also in long double (#1096) * Python: time/dt round-trip Test writing and reading time and dt on an iteration via properties. * Fix: Iteration read of long double time Support reading of `dt` and `time` attributes if they are of type `long double`. (openPMD standard: all `floatX` supported) * Executables: CXX_STANDARD/EXTENSIONS (#1102) Set `CXX_EXTENSIONS OFF` and `CXX_STANDARD_REQUIRED ON` for created executables. This mitigates issues with NVCC 11.0 and C++17 builds seen as added `-std=gnu++17` flags that lead to ``` nvcc fatal : Value 'gnu++17' is not defined for option 'std' ``` when using `nvcc` as CXX compiler directly. * Doc: More Locations -DPython_EXECUTABLE (#1104) Mention the `-DPython_EXECUTABLE` twice more in build examples. * NVCC + C++17 (#1103) * NVCC + C++17 Work-around a build issue with NVCC in C++17 builds. ``` include/openPMD/backend/Attributable.hpp(437): error #289: no instance of constructor "openPMD::Attribute::Attribute" matches the argument list argument types are: (std::__cxx11::string) detected during instantiation of "__nv_bool openPMD::AttributableInterface::setAttribute(const std::__cxx11::string &, T) [with T=std::__cxx11::string]" ``` from ``` inline bool AttributableInterface::setAttribute( std::string const & key, char const value[] ) { return this->setAttribute(key, std::string(value)); } ``` Seen with: - NVCC 11.0.2 + GCC 8.3.0 - NVCC 11.0.2 + GCC 7.5.0 * NVCC 11.0.2 C++17 work-around: Add Comment * Lazy parsing: Make findable in docs and use in openpmd-ls (#1111) * Use deferred iteration parsing in openpmd-ls * Make lazy/deferred parsing searchable * Add a way to search for usesteps key * HDF5: Document HDF5_USE_FILE_LOCKING (#1106) Document a HDF5 read work-around that we currently need on OLCF Jupyter (https://jupyter.olcf.ornl.gov), due to a mounting issue of GPFS in the Jupyter serice (OLCFHELP-3685). 
From the HDF5 1.10.1 Release Notes: ``` Other New Features and Enhancements =================================== Library ------- - Added a mechanism for disabling the SWMR file locking scheme. The file locking calls used in HDF5 1.10.0 (including patch1) will fail when the underlying file system does not support file locking or where locks have been disabled. To disable all file locking operations, an environment variable named HDF5_USE_FILE_LOCKING can be set to the five-character string 'FALSE'. This does not fundamentally change HDF5 library operation (aside from initial file open/create, SWMR is lock-free), but users will have to be more careful about opening files to avoid problematic access patterns (i.e.: multiple writers) that the file locking was designed to prevent. Additionally, the error message that is emitted when file lock operations set errno to ENOSYS (typical when file locking has been disabled) has been updated to describe the problem and potential resolution better. (DER, 2016/10/26, HDFFV-9918) ``` This also exists as a compilation option for HDF5 in CMake, where it defaults to ``TRUE`` by default, which is also what distributions/ package managers ship. Disabling from Bash: ```bash export HDF5_USE_FILE_LOCKING=FALSE ``` Disabling from Python: ```py import os os.environ['HDF5_USE_FILE_LOCKING'] = "FALSE" ``` * Avoid object slicing when deriving from Series class (#1107) * Make Series class final * Use private constructor to avoid object slicing * Doc: OMPI_MCA_io Control (#1114) Document OpenMPI MPI-I/O backend control. We have documented this long in #446. * openPMD.hpp: Include auxiliary StringManip (#1124) Include this, handy functions. * CXX Std: Remember <variant> Impl. (#1128) We use `<variant>` or `<mpark/variant.hpp>` in our public API interface for datatypes, depending on the C++ standard. This pull request makes sure that the same implementation is used in downstream code, even if the C++ standard is switched. 
This avoids ABI issues when, e.g., using a C++14 built openPMD-api in a C++17 downstream code. * Spack: No More `load -r` (#1125) The `-r` argument was removed from `spack load` and is now implied. * Fix AppVeyor: Python Executable (#1127) * GH Action: Add MSVC & ClangCL on Win * Fix AppVeyor: Python Executable * Avoid mismatching system Python and Conda Python * Conda: Fix Numpy * CMake: Skip Pipe Test Written in a too special way, we cannot assume SH is always present * Test 8b (Bench Read Parallel): Support Variable encoding, Fix Bugs (#1131) * added support to read variable encoding, plus fixed some bugs * fixed style * Update examples/8b_benchmark_read_parallel.cpp remove commented out code Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Update examples/8b_benchmark_read_parallel.cpp Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * removed commented line * updated 8b env option Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * HDF5 I/O optimizations (#1129) * Include HDF5 optimization options * Fix code style check * Fix validations and 
include checks * Fix style check * Remove unecessary strict check * Update documentation with HDF5 tuning options * Update contributions * Fix Guards for H5Pset_all_coll_metadata* * MPI Guard: H5Pset_all_coll_metadata* * Remove duplicated variable Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Include known issues section for HDF5 (#1132) * Update known issues with HDF5 and collective metadata operations * Fix rst link and tiny typo * Add targeted bugfix releases. Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja> * Include check for paged allocation (#1133) * Include check for paged allocation * Update ParallelHDF5IOHandler.cpp * libfabric 1.6+: Document SST Work-Arounds (#1134) * libfabric 1.6+: Document SST Work-Arounds Document work-arounds for libfabric 1.6+ on Cray systems when using data staging / streaming with ADIOS2 SST. Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com> * Fix: Read Inconsistent Zero Pads (#1118) * [Draft] Fix: Read Inconsistent Zero Pads Some codes mess up the zero-padding in `fileBased` encoding, e.g., when specifying padding to 5 digits but creating >100'000 output steps. 
Files like those cannot yet be parsed and fell back to no padding, which fails to open the file: ``` openpmd_00000.h5 openpmd_02000.h5 openpmd_101000.h5 openpmd_01000.h5 openpmd_100000.h5 openpmd_104000.h5 ``` Error: ``` RuntimeError: [HDF5] Failed to open HDF5 file diags/diag1/openpmd_0.h5 ``` * Revert previous changes except for test Parse iteration numbers that are longer than their padding Read inconsistent zero padding * Overflow Padding: Read Test * Warn if the prefix does end in a digit * Fix: Don't let oversize numbers accidentally bump the padding * Update test * Issue warnings on misleading patterns also when writing * Minor Style Update Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com> * Release: 0.14.3 Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com> Co-authored-by: guj <guj@users.noreply.github.com> Co-authored-by: Jean Luca Bez <jeanlucabez@gmail.com> Co-authored-by: Jean Luca Bez <jlbez@lbl.gov>
A note on using openPMD-api, especially the parallel HDF5 backend, with OpenMPI:
OpenMPI's default for its IO backend is OMPIO, starting with 2.x.
The issues below are fixed in OpenMPI versions:
v2.0: affected, not fixed (end-of-life)
v2.x: affected, not fixed (end-of-life)
v3.0.4 or newer
v3.1.4 or newer
v4.0.1 or newer
Unfortunately, the OMPIO backend contains severe bugs leading to data corruption and sporadic crashes as of the latest releases (affected: 2.X to 3.1.3 and 4.0.0). So far we have seen these issues with parallel HDF5, but other MPI-IO-parallel methods such as ADIOS use the same MPI-IO API and are potentially affected as well. Please see open-mpi/ompi#6285 for details.
As a work-around for all systems that rely on OpenMPI (and its derivatives, such as BullMPI), disable the "OMPIO" default IO backend and fall back to the existing ROMIO backend for MPI-I/O until fixed versions are available.
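The version ranges in this issue can be turned into a small check. The following is only a sketch: the function names `parse_version` and `ompio_needs_workaround` are made up for illustration, the thresholds are taken from the issue title (3.0.5+, 3.1.5+, 4.0.1+; the body lists 3.0.4 and 3.1.4, so adjust `FIXED_IN` if you trust those), and release series newer than 4.0 are assumed to contain the fix.

```python
import re

# Minimum fixed patch release per OpenMPI release series, taken from the
# issue title (3.0.5+ | 3.1.5+ | 4.0.1+). The issue body lists 3.0.4 and
# 3.1.4 instead; change the values here if those are the ones you trust.
FIXED_IN = {(3, 0): 5, (3, 1): 5, (4, 0): 1}


def parse_version(text):
    """Extract (major, minor, patch) from a string such as 'Open MPI v3.1.3'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def ompio_needs_workaround(version):
    """Return True if this OpenMPI version should fall back to ROMIO."""
    major, minor, patch = version
    if major < 2:
        # OMPIO only became the default IO backend with 2.x.
        return False
    if major == 2:
        # The 2.x series is end-of-life and was never fixed.
        return True
    fixed_patch = FIXED_IN.get((major, minor))
    if fixed_patch is None:
        # Assumption: series not listed here are newer and contain the fix.
        return False
    return patch < fixed_patch
```

For example, `ompio_needs_workaround(parse_version("Open MPI v3.1.3"))` reports that the work-around is needed, while version 4.0.1 passes the check.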
Available runtime switches:
export OMPI_MCA_io=^ompio
mpirun ...
or
mpirun --mca io ^ompio ...
Other MPI implementations such as MPICH, and MPICH-based flavors such as IntelMPI, use ROMIO by default (they develop ROMIO) and are not affected.
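For Python scripts, the `OMPI_MCA_io` switch can also be set from inside the process, as long as this happens before MPI is initialized. The helper below is a hedged sketch, not part of openPMD-api: `select_romio` is a hypothetical name, and it only writes the variable if the user has not already chosen an IO component.

```python
import os


def select_romio(env=None):
    """Exclude OMPIO via OMPI_MCA_io unless the variable is already set.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    Returns the value that is in effect afterwards.
    """
    env = os.environ if env is None else env
    # "^ompio" tells OpenMPI to exclude the OMPIO component, so that it
    # falls back to ROMIO for MPI-I/O.
    env.setdefault("OMPI_MCA_io", "^ompio")
    return env["OMPI_MCA_io"]
```

Call `select_romio()` at the very top of the script, before `from mpi4py import MPI` or any other import that initializes MPI, since OpenMPI reads its MCA parameters during initialization. With MPICH-based implementations the variable is simply ignored.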