
Performance benchmarks on GPU #249

Open
andigu opened this issue Aug 26, 2024 · 2 comments

Comments


andigu commented Aug 26, 2024

The paper presents an interesting approach with promising results; however, the performance section would benefit from greater specificity regarding the hardware used for the experiments. For instance, it is unclear whether the results were obtained using a particular GPU model, and details such as memory capacity and other relevant hardware specifications are not provided. This information is crucial for reproducibility and for understanding the context of the reported performance.

Additionally, it would be helpful to elaborate on the memory limitations encountered, particularly why the memory saturates at 100 samples. Given that modern GPUs, such as the NVIDIA A100, are equipped with up to 80GB of memory, it seems plausible that they could handle significantly more than 100 images of size 128x128. Discussing this discrepancy could provide valuable insights into the potential limitations of the method or implementation choices that may have impacted memory usage.

@ConnorStoneAstro
Member

This is a great point. We mentioned in the figure caption that we used a V100 GPU, but some discussion of the hardware capabilities would be helpful.

Indeed, 100 images at 128x128 is not very hard to store, even on a V100. However, we did the sampling at 4x upsampling, which multiplies the memory load by a factor of 16. At 64-bit floating point, this means each operation uses almost 2GB of memory. Between PyTorch overhead and intermediate calculations (e.g., storing FFT-transformed images for convolution), we end up using nearly all of the V100's memory. With an A100 we could easily go past this threshold; however, we chose not to use A100s because our goal was to use the performance tests as a quick discussion of how GPUs work: their performance graphs stay flat until they hit saturation, which is a bit different from a lot of people's experience with performant code and changes how one should use them.
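As a back-of-envelope check (the copy counts here are illustrative assumptions, not taken from the paper's code), the memory scaling can be sketched like this:

```python
# Back-of-envelope memory estimate for the upsampled image batch.
# All buffer counts below are assumptions for illustration, not the
# actual allocation pattern of the benchmarked code.
n_images = 100
side = 128           # native image resolution
upsample = 4         # 4x upsampling per axis -> 16x more pixels
bytes_per_float64 = 8

pixels = n_images * (side * upsample) ** 2
raw_gib = pixels * bytes_per_float64 / 1024**3
print(f"raw upsampled batch: {raw_gib:.2f} GiB")

# FFT-based convolution typically holds several complex-valued copies
# at once (complex128 = 16 bytes/value): assume the image FFT, kernel
# FFT, their product, and the inverse transform live simultaneously.
fft_copies = 4
working_gib = fft_copies * pixels * 16 / 1024**3
print(f"approx. FFT working set: {working_gib:.2f} GiB")
```

With these assumed buffer counts, a single batch of 100 upsampled 512x512 images is only ~0.2 GiB raw, but the complex FFT working set approaches 2 GiB per operation; add autograd intermediates and framework overhead, and a 16 GiB V100 saturates quickly.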

I'll work on an updated version of the text and let you know when it is ready!

@ConnorStoneAstro
Member

Hi @andigu, we have added some discussion of the GPU performance and how the memory scaling works. This is great material to add, since we hoped this section could be a starting point for new users learning to use GPUs efficiently for scientific computing. We now explain how the runtime remains constant until the GPU saturates, after which it enters a linear regime. We have re-generated the paper on the JOSS review page; please let me know if you think any further discussion is needed!
