HDF5: Document HDF5_USE_FILE_LOCKING #1106

Merged
merged 1 commit into from
Sep 22, 2021
HDF5: Document HDF5_USE_FILE_LOCKING
Document an HDF5 read work-around that we currently need on OLCF
Jupyter (https://jupyter.olcf.ornl.gov), due to a mounting issue
of GPFS in the Jupyter service (OLCFHELP-3685).

From the HDF5 1.10.1 Release Notes:
```
Other New Features and Enhancements
===================================

    Library
    -------
    - Added a mechanism for disabling the SWMR file locking scheme.

      The file locking calls used in HDF5 1.10.0 (including patch1)
      will fail when the underlying file system does not support file
      locking or where locks have been disabled. To disable all file
      locking operations, an environment variable named
      HDF5_USE_FILE_LOCKING can be set to the five-character string
      'FALSE'. This does not fundamentally change HDF5 library
      operation (aside from initial file open/create, SWMR is lock-free),
      but users will have to be more careful about opening files
      to avoid problematic access patterns (i.e.: multiple writers)
      that the file locking was designed to prevent.

      Additionally, the error message that is emitted when file lock
      operations set errno to ENOSYS (typical when file locking has been
      disabled) has been updated to describe the problem and potential
      resolution better.

      (DER, 2016/10/26, HDFFV-9918)
```

This also exists as a compilation option for HDF5 in CMake, where it
defaults to ``TRUE``, which is also what distributions and package
managers ship.

Disabling from Bash:
```bash
export HDF5_USE_FILE_LOCKING=FALSE
```

Disabling from Python:
```py
import os
# To be safe, set this before importing HDF5-based modules (e.g., h5py, openpmd_api).
os.environ['HDF5_USE_FILE_LOCKING'] = "FALSE"
```
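
Scoping the variable to a single child process:
The two snippets above change the variable for the whole shell or Python session. The following is a minimal sketch of limiting it to one command instead. The helper name `env_without_hdf5_locking` is hypothetical and is not part of HDF5 or openPMD-api; the usage comment assumes the HDF5 command-line tool `h5ls` is installed and on `PATH`.

```python
import os


def env_without_hdf5_locking(base=None):
    """Return a copy of an environment mapping with HDF5 file locking disabled.

    Hypothetical helper for illustration only. The input mapping
    (os.environ by default) is left unmodified.
    """
    env = dict(os.environ if base is None else base)
    env["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return env


# Usage (assumes h5ls is available and some_file.h5 exists):
#   import subprocess
#   subprocess.run(["h5ls", "some_file.h5"],
#                  env=env_without_hdf5_locking(), check=True)
```

Because only the child process sees the modified environment, other HDF5 files opened from the same session keep the default locking behavior.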
ax3l committed Sep 16, 2021
commit d5a14ae55dca561d2d5b7b76221708f13c16798a
18 changes: 13 additions & 5 deletions docs/source/backends/hdf5.rst
@@ -21,14 +21,15 @@ Backend-Specific Controls

The following environment variables control HDF5 I/O behavior at runtime.

===================================== ========= ====================================================================================
environment variable                  default   description
===================================== ========= ====================================================================================
===================================== ========= ===========================================================================================================
Environment variable                  Default   Description
===================================== ========= ===========================================================================================================
``OPENPMD_HDF5_INDEPENDENT``          ``ON``    Sets the MPI-parallel transfer mode to collective (``OFF``) or independent (``ON``).
``OPENPMD_HDF5_ALIGNMENT``            ``1``     Tuning parameter for parallel I/O, choose an alignment which is a multiple of the disk block size.
``OPENPMD_HDF5_CHUNKS``               ``auto``  Defaults for ``H5Pset_chunk``: ``"auto"`` (heuristic) or ``"none"`` (no chunking).
``H5_COLL_API_SANITY_CHECK``          unset     Set to ``1`` to perform an ``MPI_Barrier`` inside each meta-data operation.
===================================== ========= ====================================================================================
``H5_COLL_API_SANITY_CHECK``          unset     Debug: Set to ``1`` to perform an ``MPI_Barrier`` inside each meta-data operation.
``HDF5_USE_FILE_LOCKING``             ``TRUE``  Work-around: Set to ``FALSE`` in case you are on an HPC or network file system that hangs when opening files for reading.
===================================== ========= ===========================================================================================================

``OPENPMD_HDF5_INDEPENDENT``: by default, we implement MPI-parallel data ``storeChunk`` (write) and ``loadChunk`` (read) calls as `non-collective MPI operations <https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node87.htm#Node87>`_.
Attribute writes are always collective in parallel HDF5.
@@ -48,6 +49,13 @@ Chunking generally improves performance and only needs to be disabled in corner-
Debugging a parallel program with that option enabled can help to spot bugs such as collective MPI-calls that are not called by all participating MPI ranks.
Do not use in production; this will slow down parallel I/O operations.

``HDF5_USE_FILE_LOCKING``: this is an HDF5 1.10.1+ control option for disabling HDF5-internal file locking operations (see `HDF5 1.10.1 release notes <https://support.hdfgroup.org/ftp/HDF5/releases/ReleaseFiles/hdf5-1.10.1-RELEASE.txt>`__).
This mechanism is mainly used to ensure that a file that is still being written to cannot (yet) be opened by either a reader or another writer.
On some HPC and Jupyter systems, parallel/network file systems like GPFS are mounted in a way that interferes with this HDF5-internal access consistency check.
As a result, read-only operations like ``h5ls some_file.h5`` or opening an openPMD ``Series`` can hang indefinitely.
If you are sure that the file was written completely and is closed by the writer, e.g., because a simulation that created HDF5 outputs has finished, then you can set this environment variable to ``FALSE`` to work around the problem.
You should also report this problem to your system support, so they can fix the file system mount options or disable locking by default in the provided HDF5 installation.


Selected References
-------------------