This repo contains artifacts related to the paper "Systemic Assessment of Node Failures in HPC Production Platforms". For further details, please refer to the IPDPS 2021 paper.
Sample HPC logs related to this study have been released here.
If you find this work relevant, please cite it as follows:
@inproceedings{DBLP:conf/ipps/Das0R21,
  author    = {Anwesha Das and
               Frank Mueller and
               Barry Rountree},
  title     = {Systemic Assessment of Node Failures in {HPC} Production Platforms},
  booktitle = {International Parallel and Distributed Processing Symposium,
               {IPDPS}},
  pages     = {267--276},
  publisher = {{IEEE}},
  year      = {2021}
}