Include DLIO benchmark #124
arcturus5340 wants to merge 39 commits into hpc-io:master from arcturus5340:master
Commits:

- `1e02fc8` Add h5bench_dlio extension (arcturus5340)
- `89c62c2` Add multithreading support (arcturus5340)
- `8bfec7c` Committing clang-format changes (github-actions[bot])
- `5b702e1` Add Subfiling VFD support (arcturus5340)
- `b2d829b` Committing clang-format changes (github-actions[bot])
- `4cf969c` Update the method of performance calculation (arcturus5340)
- `40f127a` Add process logging and improve the display of benchmark results (arcturus5340)
- `c42040a` Update the documentation (arcturus5340)
- `e3c23f7` Committing clang-format changes (github-actions[bot])
- `55a8b55` Correct where the csv file is created (arcturus5340)
- `c2ef677` Merge remote-tracking branch 'origin/master' (arcturus5340)
- `9d80ffc` Committing clang-format changes (github-actions[bot])
- `923af94` Attempt number 1 to fix CI/CD (arcturus5340)
- `78f4e9a` Fix CI/CD (Attempt number 2) (arcturus5340)
- `6394c0c` Fix CI/CD (Attempt number 3) (arcturus5340)
- `6c9124b` Fix CI/CD (Attempt number 4) (arcturus5340)
- `3b2675b` Delete the Async VOL references in the code (arcturus5340)
- `15e697b` Fix CI/CD (Attempt number 5) (arcturus5340)
- `11c7930` Fix CI/CD (Attempt number 6) (arcturus5340)
- `3a2e7d7` Fix CI/CD (Attempt number 7) (arcturus5340)
- `8fc0dd2` Fix CI/CD (Attempt number 8) (arcturus5340)
- `cb3994a` Add the readthedocs documentation page (arcturus5340)
- `92c3c57` Update the way of counting the number of batches per rank (arcturus5340)
- `96facfc` Add the ability to output information about all ranks (arcturus5340)
- `61f9a80` Committing clang-format changes (github-actions[bot])
- `6d48da1` Update the documentation (arcturus5340)
- `63af2cd` Add total-training-steps parameter (arcturus5340)
- `ac6fd8f` Committing clang-format changes (github-actions[bot])
- `f1d67d4` Merge branch 'master' into master (jeanbez)
- `c061bcf` Fix multiple issues identified in code review (arcturus5340)
- `c9fbed9` Update the sample configuration file to reduce the runtime of the git… (arcturus5340)
- `0b1b0cc` Update the sample configuration by adding a section with data generation (arcturus5340)
- `2cf3cd3` Update h5bench-hdf5-1.10.8.yml (jeanbez)
- `c4b9534` Update h5bench-hdf5-1.10.4.yml (jeanbez)
- `4e7c11a` Update h5bench-hdf5-1.10.7.yml (jeanbez)
- `9fd3ae5` Update h5bench-hdf5-1.10.8.yml (jeanbez)
- `56c6d0f` Update the readthedocs page (arcturus5340)
- `33281ad` Update the README file (arcturus5340)
- `a69307c` Update the documentation (arcturus5340)
# DLIO benchmark
The benchmark is designed to measure the performance of training and evaluating deep learning models on data stored as HDF5 files. Based on I/O patterns collected and analyzed from [DLIO Benchmark](https://github.com/argonne-lcf/dlio_benchmark), it simulates the training and evaluation of deep learning models that use the PyTorch and TensorFlow frameworks, while gathering information about system performance. Most importantly, this extension lets users test AI workloads without installing machine learning libraries, reducing complexity and improving the usability of the benchmark. Another advantage is that, in our experiments, the extension ran faster than DLIO Benchmark, which we attribute to the lower overhead of the C application in our extension compared with the Python application in the original benchmark. While the quicker runtime is convenient for faster testing, it also suggests that the benchmark may not fully capture the complexity of real AI workloads, such as the high volume of metadata operations introduced by Python-based libraries. The I/O pattern produced by this extension is based on the implementation of [DLIO benchmark version 1.1](https://github.com/argonne-lcf/dlio_benchmark/releases/tag/v1.1). Changes to the main DLIO Benchmark configurations after version 1.1 are not reflected in this h5bench pattern. To reproduce them, DLIO Benchmark behavior can be studied using various I/O analysis tools; we recommend the [Log VFD](https://docs.hdfgroup.org/hdf5/v1_14/group___f_a_p_l.html#ga4e03be2fe83ed02b32266a6c81427beb).
## Configuration

As with the other extensions, the following parameters should be specified in the configuration section of the JSON file to configure the benchmark:
| Parameter | Description | Type | Default |
|---|---|---|---|
| generate-data | Enable generation of benchmarking data | bool | false |
| train | Enable model training simulation | bool | false |
| evaluation | Enable model evaluation simulation | bool | false |
| record-length | Record size of a single sample in bytes | int | 67108864 |
| num-files-train | The number of files used to train the model | int | 32 |
| num-files-eval | The number of files used to evaluate the model | int | 8 |
| num-samples-per-file | The number of samples in each file | int | 4 |
| data-folder | Name of the directory storing the benchmark data | string | ./data |
| file-prefix | Prefix in the name of files containing training and evaluation data | string | img |
| chunking | Enable chunking | bool | false |
| chunk-size | Chunk size | int | 1024 |
| keep-files | Keep (do not delete) the data after the benchmark is finished | bool | false |
| compression | Enable compression | bool | false |
| compression-level | Compression level from 1 to 9 | int | 4 |
| batch-size | Training batch size | int | 7 |
| batch-size-eval | Evaluation batch size | int | 2 |
| shuffle | Enable sample shuffling | bool | false |
| preprocess-time | Preprocessing time after reading each sample in seconds | float | 0.0 |
| preprocess-time-stdev | Standard deviation in preprocessing time in seconds | float | 0.0 |
| epochs | The number of epochs | int | 5 |
| total-training-steps | Maximum number of steps per training per epoch | int | -1 |
| computation-time | Computation time after reading each batch in seconds | float | 0.323 |
| computation-time-stdev | Standard deviation in computation time in seconds | float | 0.0 |
| random-seed | Random seed to be used | int | 42 |
| eval-time | Evaluation time after reading each batch in seconds | float | 0.323 |
| eval-time-stdev | Standard deviation in evaluation time in seconds | float | 0.0 |
| epochs-between-evals | The number of epochs between evaluations | int | 1 |
| train-data-folder | Name of the directory containing the training data | string | train |
| valid-data-folder | Name of the directory containing the validation data | string | valid |
| records-dataset-name | Name of the dataset with records | string | records |
| labels-dataset-name | Name of the dataset with labels | string | labels |
| seed-change-epoch | Enable seed changes every epoch | bool | false |
| read-threads | The number of workers used to read the data | int | 4 |
| collective-meta | Enable collective HDF5 metadata operations | bool | false |
| collective-data | Enable collective HDF5 data operations | bool | false |
| subfiling | Enable HDF5 Subfiling Virtual File Driver | bool | false |
| output-csv-name | Name of the output CSV file | string | output |
| output-ranks-data | Enable statistics output for each rank | bool | false |
Note that each parameter has a default value that applies if the parameter is not specified in the configuration file. Thus, by default the benchmark will not run at all, because the `generate-data`, `train`, and `evaluation` parameters are all false. A sample configuration file can be found in the `samples/` directory.
## Understanding the output

The sample output of the benchmark is as follows:
```
=================== Performance Results ==================
Total number of ranks: 8
The number of read threads per rank: 0
Total training set size: 7.000 GB
Training set size per rank: 896.000 MB
Total training emulated compute time: 3.229 s
Training metadata time: 2.808 s
Training raw read time: 30.905 s
Training average raw read rate: 145.141 MB/s
Observed training completion time: 37.432 s
Observed average training rate: 131.044 MB/s
Training average throughput: 1.871 samples/s
Training throughput standard deviation: 0.037 samples/s
Training average IO: 119.729 MB/s
Training IO standard deviation: 2.379 MB/s
Total evaluation set size: 7.000 GB
Evaluation set size per rank: 896.000 MB
Total evaluation emulated compute time: 3.206 s
Evaluation metadata time: 2.805 s
Evaluation raw read time: 31.699 s
Evaluation average raw read rate: 141.906 MB/s
Observed evaluation completion time: 38.424 s
Observed average evaluation rate: 127.595 MB/s
Evaluation average throughput avg: 1.826 samples/s
Evaluation throughput standard deviation: 0.090 samples/s
Evaluation average IO: 116.883 MB/s
Evaluation IO standard deviation: 5.735 MB/s
===========================================================
```
Let's take a closer look at it. First, the output reports the number of MPI ranks and the number of read threads per rank used in the simulation. The same set of metrics then describes both the training and the evaluation performance, so to reduce redundancy let us consider only the first half of the results, concerning the training process. The total training set size is the size of all HDF5 files used for training; accordingly, the training set size per rank shows how much of the load is carried by one MPI rank. The total training emulated compute time is the total time spent on compute emulation across all epochs. Training metadata time and training raw read time measure, respectively, the execution time of the `H5Fopen`, `H5Dget_space`, `H5Screate_simple`, and `H5Sclose` calls and of the `H5Dread` calls; note that the two are not interleaved. The training average raw read rate is the training set size per rank divided by the training raw read time. The observed training completion time includes all the time spent on the training process, among other things resource allocation and computation simulation. The observed average training rate equals the training set size per rank divided by the difference between the observed training completion time and the total training emulated compute time, thus showing the data read rate without the emulation costs. The training average throughput and training throughput standard deviation indicate how many samples from the training dataset are processed per second. Training average IO and training IO standard deviation translate these values into bytes per second by multiplying by the size of one sample.
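As a concrete check of these definitions, the derived rates in the sample output above can be recomputed from the reported sizes and times. This is a sketch, assuming the run used the default of 5 epochs, so each rank reads its 896 MB share once per epoch over the whole run:

```python
# Recompute the derived training metrics from the sample output above.
# Assumption: 5 epochs (the default), so the bytes read per rank over the
# whole run are the per-epoch set size times the number of epochs.
epochs = 5
set_size_per_rank_mb = 896.000   # "Training set size per rank"
raw_read_time_s = 30.905         # "Training raw read time" (all epochs)
completion_time_s = 37.432       # "Observed training completion time"
compute_time_s = 3.229           # "Total training emulated compute time"

total_read_mb = set_size_per_rank_mb * epochs

# Raw read rate: bytes actually read divided by time spent in H5Dread.
raw_read_rate = total_read_mb / raw_read_time_s                       # ~145 MB/s

# Observed rate: the same bytes divided by wall time minus emulated compute.
observed_rate = total_read_mb / (completion_time_s - compute_time_s)  # ~131 MB/s

print(f"raw read rate: {raw_read_rate:.1f} MB/s")
print(f"observed rate: {observed_rate:.1f} MB/s")
```

Both values land within rounding distance of the reported 145.141 MB/s and 131.044 MB/s.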
## Future work

There are plans to add more configuration options to the extension in the future to increase its flexibility:
- Add settings for the Subfiling VFD. Currently, the default settings are used.
- Add more features from [DLIO Benchmark](https://github.com/argonne-lcf/dlio_benchmark), such as resizable records.
- Analyze and add support for other ML frameworks and data loaders, for example DALI.
- Add support for prefetching.
- Expand the ability to randomly shuffle samples. At the moment, it is not possible to shuffle only the samples within each file without changing the order of the files used for training.
- Add more compression filters and thus support different compression algorithms for HDF5 data.
- Add support for drop_last customization. Currently, by default, all batches left over after distribution across MPI ranks are not processed.
- Replace the use of `fork()` with `MPI_Comm_spawn()` when creating new processes, as using `fork()` with MPI may be unsafe.
- Test support for the Cache VOL connector.
- Add support for checkpointing by saving the model to an HDF5 file.
The PR also adds a sample CSV report produced by the benchmark:

```
metric, value, unit
operation, dlio,
ranks, 16,
read threads, 0,
subfiling, NO,
chunking, NO,
collective meta, YES,
collective data, YES,
train total size, 63.000, GB
train size per rank, 3.938, GB
train emulated compute time per epoch, "2.907, 2.907, 2.907, 2.907, 2.907", "s, s, s, s, s"
train emulated compute time, 14.535, s
train metadata time per epoch, "0.559, 0.502, 0.434, 0.406, 0.482", "s, s, s, s, s"
train metadata time, 2.383, s
train raw read time per epoch, "4.205, 3.835, 3.767, 3.758, 3.830", "s, s, s, s, s"
train total raw read time, 19.395, s
train raw read rate per epoch, "958.749, 1.026, 1.045, 1.047, 1.027", "MB/s, GB/s, GB/s, GB/s, GB/s"
train avg raw read rate, 1.017, GB/s
train observed time per epoch, "8.709, 8.125, 7.789, 7.824, 8.775", "s, s, s, s, s"
train observed time, 41.223, s
train observed rate per epoch, "694.948, 772.636, 825.826, 820.059, 687.088", "MB/s, MB/s, MB/s, MB/s, MB/s"
train avg observed rate, 760.111, MB/s
train throughput samples per second per epoch, "7.234, 7.753, 8.088, 8.052, 7.179", "samples/s, samples/s, samples/s, samples/s, samples/s"
train throughput avg samples per second, 7.661, samples/s
train throughput stdev samples per second, 0.389, samples/s
train io avg, 490.330, MB/s
train io stdev, 24.925, MB/s
eval total size, 16.000, GB
eval size per rank, 1.000, GB
eval emulated compute time per epoch, "2.584, 2.584, 2.584, 2.584, 2.584", "s, s, s, s, s"
eval emulated compute time, 12.920, s
eval metadata time per epoch, "0.214, 0.151, 0.162, 0.141, 0.181", "s, s, s, s, s"
eval metadata time, 0.848, s
eval raw read time per epoch, "0.925, 0.913, 0.875, 0.853, 0.824", "s, s, s, s, s"
eval total raw read time, 4.390, s
eval raw read rate per epoch, "1.080, 1.095, 1.143, 1.171, 1.213", "GB/s, GB/s, GB/s, GB/s, GB/s"
eval avg raw read rate, 1.141, GB/s
eval observed time per epoch, "4.120, 3.904, 3.881, 3.867, 3.940", "s, s, s, s, s"
eval observed time, 19.712, s
eval observed rate per epoch, "666.574, 775.895, 789.646, 797.822, 755.279", "MB/s, MB/s, MB/s, MB/s, MB/s"
eval avg observed rate, 757.043, MB/s
eval throughput samples per second per epoch, "3.883, 4.099, 4.123, 4.137, 4.061", "samples/s, samples/s, samples/s, samples/s, samples/s"
eval throughput avg samples per second, 4.061, samples/s
eval throughput stdev samples per second, 0.092, samples/s
eval io avg, 259.877, MB/s
eval io stdev, 5.907, MB/s
```
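Note that the per-epoch rows above pack several values, and their matching units, into quoted fields. A small sketch of how such a row can be unpacked with Python's standard `csv` module (the row text is taken from the sample report; everything else is illustrative):

```python
import csv
import io

# One per-epoch row from the report: a metric name, a quoted list of
# per-epoch values, and a quoted list of matching units.
sample = ('train emulated compute time per epoch, '
          '"2.907, 2.907, 2.907, 2.907, 2.907", "s, s, s, s, s"')

# skipinitialspace=True makes the parser treat the space after each
# top-level comma as padding, so the quoted fields parse cleanly.
metric, values_field, units_field = next(
    csv.reader(io.StringIO(sample), skipinitialspace=True))

values = [float(v) for v in values_field.split(", ")]
units = units_field.split(", ")

print(metric)   # train emulated compute time per epoch
print(values)   # [2.907, 2.907, 2.907, 2.907, 2.907]
print(units)    # ['s', 's', 's', 's', 's']
```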