
[WIP] Parallel HDF5: 4MB Alignment & Buffer #898

Open
wants to merge 3 commits into
base: dev

Member

@ax3l ax3l commented Jan 13, 2021

FS blocksize:

stat -fc %s .

Tried those options on Cori (Scratch and CFS): 8_benchmark case with -w, KNL partition, WarpX-like MPI-rank placement.
modules: ... darshan/3.1.7 gcc/8.3.0 cray-mpich/7.7.10 cray-hdf5-parallel/1.10.5.2 ...

Scratch: 1MB recommended blocksize (confusingly, stat -fc %s <dir> reports 4KiB)
CFS: 16 MB blocksize (with 4MiB subblocks)

Support quote:

blocksize is a quirky parameter for parallel file systems because between your compute node and the actual block devices are a bunch of network and RAID layers that have their own magic sizes. Some arcane knowledge is required

Sets medium striping.
Note: for proper ADIOS2 timings, keep the small default striping (it creates subfiles that should not be heavily striped); for proper HDF5 timings, enable striping (single output file that should be heavily striped).

For HDF5, we can also try T3PIO MPI_Info hints again.

cori.sbatch.txt

Cori: Darshan Logs

# MPICH statistics collection
export MPICH_MPIIO_STATS=1
export MPICH_MPIIO_HINTS_DISPLAY=1
export MPICH_MPIIO_TIMERS=1

# Darshan extended trace (dxt) logs
export DARSHAN_DISABLE_SHARED_REDUCTION=1
export DXT_ENABLE_IO_TRACE=4

# work-around needed
export LD_PRELOAD=/global/common/cori_cle7/software/darshan/3.1.7/lib/libdarshan.so

# srun ... (launch the benchmark here)

# disable work-around
unset LD_PRELOAD


-auto const strByte = auxiliary::getEnvString( "OPENPMD_HDF5_ALIGNMENT", "1" );
+auto const strByte = auxiliary::getEnvString( "OPENPMD_HDF5_ALIGNMENT", "4194304" );
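
A minimal sketch of how the string read from `OPENPMD_HDF5_ALIGNMENT` could be validated before it reaches HDF5. The helper name `parseAlignmentBytes` is hypothetical and not part of openPMD-api; the sketch assumes C++17 and that the value is a plain decimal byte count.

```cpp
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not openPMD-api): parse a decimal byte count such as
// "4194304". Returns std::nullopt for empty, non-numeric, zero, or
// overflowing input so the caller can fall back to a default.
inline std::optional<std::uint64_t> parseAlignmentBytes(std::string_view text)
{
    std::uint64_t value = 0;
    auto const *first = text.data();
    auto const *last = text.data() + text.size();
    auto const result = std::from_chars(first, last, value);
    // Reject parse errors, trailing characters (e.g. "4MB"), and zero.
    if (result.ec != std::errc{} || result.ptr != last || value == 0)
        return std::nullopt;
    return value;
}
```

A caller could then write `parseAlignmentBytes(strByte).value_or(1)` to keep the unaligned behavior when the variable holds something unusable.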
Member Author

@ax3l ax3l Jan 15, 2021


Sample & bin directory du -hs with:

OPENPMD_HDF5_ALIGNMENT    size
1                         7.8M + 1.4G
4194304                   7.8M + 1.4G

on my laptop (4KiB blocksize).

ls output (less reliable) also ok for parallel files (not padded to multiples of 4MiB). So either this is cleverly compacted or has no influence...

double policy = 0.0;
status = H5Pget_cache(m_fileAccessProperty, &metaCacheElements, &rawCacheElements, &rawCacheSize, &policy);
VERIFY(status >= 0, "[HDF5] Internal error: Failed to set H5Pget_cache");
rawCacheSize = bytes * 4; // default: 1 MiB per dataset
Member Author


This should be guarded so we don't accidentally set a cache of only a few bytes if a user provides 1.
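
One possible shape for such a guard, as a sketch only: never let the raw chunk cache drop below the 1 MiB default mentioned in the diff comment. The names `defaultRawCacheBytes` and `guardedRawCacheSize` are hypothetical, and the factor of 4 is taken from the diff above.

```cpp
#include <algorithm>
#include <cstddef>

// Default raw chunk cache size per dataset (1 MiB), as stated in the diff comment.
constexpr std::size_t defaultRawCacheBytes = 1024u * 1024u;

// Hypothetical guard: scale the cache with the alignment (factor 4 as in the
// diff), but never go below the default.
constexpr std::size_t guardedRawCacheSize(std::size_t alignmentBytes)
{
    return std::max(defaultRawCacheBytes, alignmentBytes * 4u);
}
```

With this, an alignment of 1 keeps the 1 MiB default, while an alignment of 4 MiB yields a 16 MiB cache.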

@ax3l ax3l mentioned this pull request Jan 29, 2021
Member Author

ax3l commented Jun 11, 2021

We can run these tests again after #916 is merged; we may see some improvement when striping is combined with chunked datasets.


@ax3l ax3l force-pushed the topic-4MBalignmentAndBuf branch 2 times, most recently from e7c4377 to 9163165 Compare June 24, 2021 06:35
@@ -25,7 +25,7 @@ The following environment variables control HDF5 I/O behavior at runtime.
 environment variable                  default   description
 ===================================== ========= ====================================================================================
 ``OPENPMD_HDF5_INDEPENDENT``          ``ON``    Sets the MPI-parallel transfer mode to collective (``OFF``) or independent (``ON``).
-``OPENPMD_HDF5_ALIGNMENT``            ``1``     Tuning parameter for parallel I/O, choose an alignment which is a multiple of the disk block size.
+``OPENPMD_HDF5_ALIGNMENT``            ``ABC``   Tuning parameter for parallel I/O, choose an alignment which is a multiple of the disk block size.
Member Author


TODO: update with the final default value.

@ax3l ax3l force-pushed the topic-4MBalignmentAndBuf branch from 9163165 to a34d531 Compare June 24, 2021 06:51
@ax3l ax3l force-pushed the topic-4MBalignmentAndBuf branch from a34d531 to 617c465 Compare June 24, 2021 07:05
Member Author

ax3l commented Jun 24, 2021

Next measurements we should try on Cori (Suren Byna):

Option 1

  • Set the alignment to 8 MB (in the H5Pset_alignment() call, threshold of 0 and alignment of 8MB)
  • Set striping on the directory where the data is being written.
  • Stripe count: 40
  • Stripe size: 8 MB

Just in case, here’s the command to set the stripe on a directory.

lfs setstripe --stripe_count 40 --stripe_size 8m ./benchmarks

Option 2

  • Alignment of 16 MB
  • Stripe count : 40
  • Stripe size: 16 MB
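
To make the role of the two `H5Pset_alignment()` parameters in these options concrete, here is a simplified model of the documented rule: objects of at least `threshold` bytes are placed at addresses that are multiples of `alignment`, and a threshold of 0 aligns everything. The function `placeObject` is a hypothetical illustration and not HDF5's actual allocator.

```cpp
#include <cstdint>

// Simplified model, NOT HDF5's allocator: return the file address at which an
// object of `size` bytes would start if the next free address is `freeAddr`.
// Objects with size >= threshold are moved up to the next multiple of
// `alignment`; smaller objects stay where they are.
constexpr std::uint64_t placeObject(
    std::uint64_t freeAddr,
    std::uint64_t size,
    std::uint64_t threshold,
    std::uint64_t alignment)
{
    if (alignment <= 1 || size < threshold)
        return freeAddr;
    std::uint64_t const remainder = freeAddr % alignment;
    return remainder == 0 ? freeAddr : freeAddr + (alignment - remainder);
}
```

In this model, with the alignment equal to the stripe size (8 MB or 16 MB above) and a threshold of 0, every object starts on a stripe boundary, which is presumably the intent of pairing the two settings.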

Jan/Feb tests

We tried various sizes in Jan/February with the job script linked above in the PR description. We saw no improvement on Cori at the time.

Since then, we implemented chunking (#406) and changed the benchmark from 4D to 3D (#1010).
Also, we have new parallel benchmarks now (8a, 8b).

Comment on lines +116 to +118
// align all (no threshold) if only alignment is set
if( m_alignment > 1 && m_threshold == 1 )
    m_threshold = 0;
Member Author

@ax3l ax3l Jun 25, 2021


This needs to be moved after the whole if( config.contains( "hdf5" ) ) block; otherwise the env var OPENPMD_HDF5_ALIGNMENT will not imply m_threshold = 0 for values > 1.
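
A sketch of the intended ordering, under stated assumptions: the struct `AlignmentSettings` and the function `resolve` are hypothetical stand-ins for the backend state, the optional arguments stand in for values from the "hdf5" config block, and it is assumed here that config values override the environment variable. The threshold rule from the diff runs once, after all sources have been read.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the relevant backend state; member names follow the diff.
struct AlignmentSettings
{
    std::uint64_t m_alignment = 1;
    std::uint64_t m_threshold = 1;
};

// Read the environment value first, apply optional config overrides, and only
// then derive the threshold, so the rule covers both sources.
inline AlignmentSettings resolve(
    std::uint64_t envAlignment,
    std::optional<std::uint64_t> configAlignment,
    std::optional<std::uint64_t> configThreshold)
{
    AlignmentSettings s;
    s.m_alignment = envAlignment;
    if (configAlignment)
        s.m_alignment = *configAlignment;
    if (configThreshold)
        s.m_threshold = *configThreshold;
    // align all (no threshold) if only alignment is set
    if (s.m_alignment > 1 && s.m_threshold == 1)
        s.m_threshold = 0;
    return s;
}
```

Note that this sketch cannot tell an explicitly configured threshold of 1 apart from the default of 1; both are treated as "not set".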
