NAS Parallel Benchmarks for GPUs

This repository provides GPU-parallel implementations of the NAS Parallel Benchmarks (NPB), written with different parallel APIs and derived from a C/C++ version (NPB-CPP). The parallel versions support GPUs from NVIDIA, AMD, and Intel. You can also contribute to this project by opening issues and pull requests. 😄

🔉News: CUDA versions for pseudo-applications added and IS improved. 📅11/Feb/2021

🔉News: Parametrization support for configuring number of threads per block and CUDA parallelism optimizations. 📅25/Jul/2021

🔉News: Paper published in the journal Software: Practice and Experience (SPE). 📅29/Nov/2021

🔉News: A new GPU parallel implementation is now available using the GSParLib API. 📅15/Aug/2024

🔉News: A new GPU parallel implementation is now available using HIP (supporting GPUs from NVIDIA, AMD, and Intel). 📅30/Jan/2025

How to cite our work 👍

DOI - Araujo, G.; Griebler, D.; Rockenbach, D. A.; Danelutto, M.; Fernandes, L. G.; NAS Parallel Benchmarks with CUDA and beyond, Software: Practice and Experience (SPE), 2021.

DOI - Araujo, G.; Griebler, D.; Danelutto, M.; Fernandes, L. G.; Efficient NAS Benchmark Kernels with CUDA. 28th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP), Västerås, 2020.

The NPB with GPUs

The parallel GPU versions were implemented from the serial version of NPB-CPP.

==================================================================

NAS Parallel Benchmarks for GPUs code contributors are:

Dalvan Griebler: dalvan.griebler@pucrs.br

Gabriell Araujo: gabriell.araujo@edu.pucrs.br

==================================================================

Each directory is independent and contains its own implemented version:

Five kernels

IS - Integer Sort
EP - Embarrassingly Parallel
CG - Conjugate Gradient
MG - Multi-Grid
FT - discrete 3D fast Fourier Transform

Three pseudo-applications

SP - Scalar Penta-diagonal solver
BT - Block Tri-diagonal solver
LU - Lower-Upper Gauss-Seidel solver

Software Requirements

Warning: our tests were run with GCC, NVCC, HIPCC, and ChipStar.

How to Compile

Enter the directory of the target API (e.g., CUDA, HIP, or GSPar)
Update the file config/make.def with the path to the GPU compiler
If you choose HIP, also export the vendor of the GPU you want to use in config/make.def (e.g., nvidia, amd, or intel):
```
export HIP_PLATFORM=amd
```

Use the following command to compile an NPB program:

make _BENCHMARK CLASS=_VERSION

_BENCHMARKs are:

 CG, EP, FT, IS, MG, BT, LU, and SP

_VERSIONs are:

 + Class S: small for quick test purposes

 + Class W: workstation size (a 1990s workstation; now likely too small)

 + Classes A, B, C: standard test problems; ~4X size increase going from one class to the next

 + Classes D, E, F: large test problems; ~16X size increase from each of the previous Classes

Command example to compile:

make ep CLASS=B

Command example to run:

bin/ep.B
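
The compile and run commands above can be chained for several benchmarks. The following is a minimal sketch, assuming it is run from inside one API directory (e.g., CUDA) after config/make.def has been set up. The helpers `run` and `npb_build_and_run` and the `DRY_RUN` variable are hypothetical names used only for this illustration; they are not part of NPB-GPU. With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper script, not part of NPB-GPU.
# DRY_RUN=1 prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# npb_build_and_run BENCHMARK CLASS
# Compiles one benchmark and, if that succeeds, runs the resulting binary.
npb_build_and_run() {
    run make "$1" CLASS="$2" && run "bin/$1.$2"
}

# Example: three kernels in class S (small, for quick tests).
for bench in ep cg is; do
    npb_build_and_run "$bench" S || exit 1
done
```

Setting `DRY_RUN=0` executes the same commands instead of printing them, stopping at the first failure.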

Activating the additional timers

NPB-GPU has additional timers for profiling purposes. To activate these timers, create a dummy file named 'timer.flag' in the main directory of the NPB version (e.g., CUDA/timer.flag).
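
As a small sketch of the step above, the flag file can be created with `touch`. The function name `enable_timers` is a hypothetical helper for this illustration, and the example call assumes the current directory is the repository root.

```shell
# enable_timers DIR
# Creates the empty flag file inside the given API directory.
# Hypothetical helper for this illustration, not part of NPB-GPU.
enable_timers() {
    touch "$1/timer.flag"
}

# Example, run from the repository root:
#   enable_timers CUDA
```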

Configuring the number of threads per block

NPB-GPU allows configuring the number of threads per block of each GPU kernel in the benchmarks. The user can specify the number of threads per block by editing the file gpu.config in the config/ directory. If no file is provided, all GPU kernels are executed using the warp size of the GPU as the number of threads per block.

Syntax of the gpu.config file:

<benchmark-name>_THREADS_PER_BLOCK_<gpu-kernel-name> = <integer-value>

Example configuration for the CG benchmark:

CG_THREADS_PER_BLOCK_ON_KERNEL_ONE = 32
CG_THREADS_PER_BLOCK_ON_KERNEL_TWO = 128
CG_THREADS_PER_BLOCK_ON_KERNEL_THREE = 64
CG_THREADS_PER_BLOCK_ON_KERNEL_FOUR = 256
CG_THREADS_PER_BLOCK_ON_KERNEL_FIVE = 32
CG_THREADS_PER_BLOCK_ON_KERNEL_SIX = 64
CG_THREADS_PER_BLOCK_ON_KERNEL_SEVEN = 128
CG_THREADS_PER_BLOCK_ON_KERNEL_EIGHT = 64
CG_THREADS_PER_BLOCK_ON_KERNEL_NINE = 512
CG_THREADS_PER_BLOCK_ON_KERNEL_TEN = 512
CG_THREADS_PER_BLOCK_ON_KERNEL_ELEVEN = 1024
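
When trying out block sizes, it can be convenient to start from one uniform value and then tune individual kernels. The following is a sketch that prints such a configuration, assuming the eleven CG kernel names shown in the example above. The function name `cg_uniform_config` is hypothetical and not part of NPB-GPU.

```shell
# cg_uniform_config VALUE
# Prints one gpu.config line per CG kernel listed above, all set to VALUE.
cg_uniform_config() {
    for kernel in ONE TWO THREE FOUR FIVE SIX SEVEN EIGHT NINE TEN ELEVEN; do
        echo "CG_THREADS_PER_BLOCK_ON_KERNEL_$kernel = $1"
    done
}

# Example, run from inside an API directory such as CUDA
# (this overwrites any existing config/gpu.config):
#   cg_uniform_config 128 > config/gpu.config
```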

NPB-GPU also allows selecting the GPU device by adding the following line to the gpu.config file:

GPU_DEVICE = <integer-value>
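
The device entry can be combined with threads-per-block entries in the same file. The following is a sketch that sets the device while keeping the other lines. The function name `set_gpu_device` is hypothetical, and the assumption that device indices start at 0 follows the usual CUDA/HIP convention rather than anything stated here.

```shell
# set_gpu_device FILE INDEX
# Removes any existing GPU_DEVICE line from FILE and appends the new
# setting, creating FILE if it does not exist yet.
# Hypothetical helper for this illustration, not part of NPB-GPU.
set_gpu_device() {
    file="$1"
    tmp="$file.tmp"
    if [ -f "$file" ]; then
        # Keep every line except a previous GPU_DEVICE setting.
        grep -v '^GPU_DEVICE' "$file" > "$tmp" || true
    else
        : > "$tmp"
    fi
    echo "GPU_DEVICE = $2" >> "$tmp"
    mv "$tmp" "$file"
}

# Example, run from inside an API directory such as CUDA:
#   set_gpu_device config/gpu.config 1
```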
