HPC ReproHack special edition original proposal

Anna Krystalli edited this page Nov 29, 2021 · 2 revisions

Powerful computational tools and methods are becoming ubiquitous in academic research. However, with this increase in computational power and complexity comes an increased responsibility to ensure the robustness and reliability of research outputs. Reproducibility, the ability to reproduce reported results from their underlying data and analytical code, is emerging as the minimum requirement for assessing such robustness. To promote reproducibility and provide an opportunity for researchers to engage with it in practice, we launched the ReproHack initiative: one-day reproducibility hackathons where researchers attempt to reproduce other people's work from published code and data, normally on their own laptops. The events are also an opportunity for researchers to help others learn from their work by submitting their papers, code and data for reproduction and review. A number of these events have now been run and have been well received so far, successfully beginning to engage the research community with the realities of what it takes to make work reproducible. However, the current format, while appropriate for the time budget and level of interest of most researchers, excludes examination of the reproducibility of more computationally intensive research.

As such, we propose to develop a Computationally Intensive or HPC ReproHack special edition. It remains to be discussed how much the format would need to be adapted and what computational resources should be made available to participants (i.e. on-premises hardware or cloud sponsorship). This would be dictated by the ultimate goal of the event: should we aim simply to reproduce the research, or could we even attempt to speed the code up? The latter would add further interest and learning opportunities, in which NAG could be an invaluable partner. The format would need modifying in that case, as speeding up code requires deeper engagement with the materials, which is unlikely to be feasible in a single day. Two models that could be explored are:

  • The GPU Hackathon, organised in collaboration with NVIDIA by ex-Sheffield RSE colleague, now NVIDIA GPU advocate, Mozhgan Kabiri Chimeh. The event lasted 5 days, participants were co-located for the duration, and teams worked alongside mentors with GPU-programming expertise to accelerate their scientific codes using GPUs.

  • The 10-year reproducibility challenge, in which researchers attempt to re-run code associated with a paper published before 2010 using modern hardware and software. They work with their own code in their own time, recording their experiences and submitting them as a report for publication in a special issue of ReScience C. The challenge will culminate in a workshop where the results will be presented.

There is already interest in such an event: https://twitter.com/ARC_DU/status/1219623154282307587?s=20

General ReproHack benefits:

Benefits to participants:

  • Experience of practical reproducibility, and the opportunity to explore the strategies, tools and pitfalls of reproducibility that they can apply to their own work.
  • Experience and inspiration from working with other people’s code and data.
  • An appreciation that reproducibility is non-trivial, but that opening up their work for more people to engage with is the best way to help improve it.
  • An appreciation that, at its core, reproduction is social: it benefits the whole research community and is therefore a community effort.

Benefits to authors:

  • Receive useful feedback on the reproducibility of their work, and even an opportunity to correct aspects of it should efforts to reproduce it fail.
  • Receive appreciation for their efforts in making their work reproducible, which can encourage them to continue.
  • An opportunity to engage others with their research.