layout | title | nav_order |
---|---|---|
default | V.Explore GPGPU-SIM and GEMM | 7 |
{: .outline}
In this part, you will tune a GEMM kernel and learn the basics of using GPGPU-SIM.
GPGPU-SIM is a simulator for CUDA programs. It is somewhat dated compared with gem5, but it is still widely used and accepted in academic research.
You can prepare the build environment in either of the two ways below: install the dependencies and the CUDA toolkit natively, or use the prebuilt Docker image.
sudo apt-get install -y wget build-essential xutils-dev bison zlib1g-dev flex \
libglu1-mesa-dev git g++ libssl-dev libxml2-dev libboost-all-dev vim \
python-setuptools python-dev python-pip
pip3 install pyyaml plotly psutil
wget http://developer.download.nvidia.com/compute/cuda/11.0.1/local_installers/cuda_11.0.1_450.36.06_linux.run
sh cuda_11.0.1_450.36.06_linux.run --silent --toolkit
rm cuda_11.0.1_450.36.06_linux.run
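After a native install, the GPGPU-SIM `setup_environment` script expects `CUDA_INSTALL_PATH` to point at the toolkit. The lines below are a minimal sketch that assumes the installer used its default location, `/usr/local/cuda-11.0`; adjust the path if your toolkit lives elsewhere.

```shell
# Assumption: the runfile installed the toolkit to /usr/local/cuda-11.0.
# An already exported CUDA_INSTALL_PATH is kept as-is.
export CUDA_INSTALL_PATH="${CUDA_INSTALL_PATH:-/usr/local/cuda-11.0}"

# Put nvcc and the other CUDA tools on the search path.
export PATH="$CUDA_INSTALL_PATH/bin:$PATH"
```

Adding these two lines to your shell profile saves you from retyping them in every new terminal.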
Alternatively, pull the Docker image:
docker pull accelsim/ubuntu-18.04_cuda-11
To get GPGPU-SIM, clone the repository:
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
To build GPGPU-SIM:
# at <gpgpu-sim dir>
source setup_environment
make -j
All of the following steps are required to run a program in the simulator.
{: .highlight}
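You should add the `-lcudart` flag when you compile with nvcc, so that the binary links against the shared CUDA runtime library, which GPGPU-SIM replaces with its own at run time.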
nvcc -lcudart <source-file> -o <binary-file>
# at <gpgpu-sim dir>
. setup_environment
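Once the environment is set up, it can be worth checking that the compiled binary really resolves `libcudart.so` from the GPGPU-SIM tree rather than from the CUDA toolkit. The helper below is a sketch: `cudart_from_sim` is a hypothetical name, and it assumes `setup_environment` has exported `GPGPUSIM_ROOT`.

```shell
# Hypothetical helper: reads `ldd` output on stdin and succeeds only if
# the libcudart line points somewhere under the given simulator root.
cudart_from_sim() {
    sim_root=$1
    # An empty root would match every line, so reject it explicitly.
    [ -n "$sim_root" ] || return 2
    grep 'libcudart\.so' | grep -qF "$sim_root"
}
```

A typical call would be `ldd <binary-file> | cudart_from_sim "$GPGPUSIM_ROOT" && echo "using GPGPU-SIM runtime"`. If the check fails, the program will run on the real CUDA runtime instead of the simulator.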
First, choose a configuration from `<gpgpu-sim dir>/configs/tested-cfgs`.
Copy all the files under `<gpgpu-sim dir>/configs/tested-cfgs/<selected configs>` to the directory that contains the binary file. Then change to that directory and run the binary as usual. GPGPU-SIM reads its configuration from the current working directory.
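The copy-and-run steps above can be sketched as a small shell function. `stage_and_run` and the default log name `sim.log` are hypothetical names chosen for illustration; the function only copies the chosen config directory next to the binary and runs the binary from there, saving the output.

```shell
# Hypothetical helper: stage a GPGPU-SIM config next to a binary and run it.
# Usage: stage_and_run <config dir> <binary> [log file]
stage_and_run() {
    cfg_dir=$1
    binary=$2
    log=${3:-sim.log}
    bin_dir=$(dirname "$binary")

    # Copy every file of the selected config into the binary's directory.
    cp -r "$cfg_dir"/. "$bin_dir"/ || return 1

    # Run from the binary's directory so the config files are found,
    # and keep the simulator output for later inspection.
    ( cd "$bin_dir" && ./"$(basename "$binary")" ) > "$log" 2>&1
}
```

For example, assuming your checkout contains a config named `SM7_QV100`, you could call `stage_and_run "$GPGPUSIM_ROOT/configs/tested-cfgs/SM7_QV100" ./gemm gemm.log`. Keeping the log in a file makes it easier to compare runs later.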
General Matrix Multiply (GEMM) is a common operation in linear algebra, machine learning, statistics, and many other domains. It offers an interesting trade-off space because there are many ways to break up the computation, including blocking, inner products, outer products, and systolic array techniques.
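To make the blocking idea concrete, here is a sketch of the arithmetic. It assumes $A$ is $M \times K$, $B$ is $K \times N$, and square tiles of side $T$ that divide $M$, $N$, and $K$ evenly; the symbols $T$, $I$, $J$, and $P$ are illustrative and not taken from the template code.

$$
C_{ij} = \sum_{k=1}^{K} A_{ik} B_{kj}, \qquad 1 \le i \le M,\ 1 \le j \le N
$$

$$
C_{IJ} = \sum_{P=1}^{K/T} A_{IP} B_{PJ}, \qquad A_{IP},\, B_{PJ},\, C_{IJ} \in \mathbb{R}^{T \times T}
$$

Both forms perform about $2MNK$ floating-point operations. If every output element is computed independently from main memory, the computation also reads about $2MNK$ matrix elements. With $T \times T$ tiles held in fast on-chip memory, each loaded element is reused $T$ times, so the number of main-memory reads drops to about $2MNK/T$.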
In this part of the lab, we provide a CUDA GEMM template. Your tasks are as follows:
- Simulate the GEMM template code in GPGPU-SIM and identify its weaknesses.
{: .challenge}
You may change the code in any way you like, except for the basic test framework, to improve the performance of the GEMM.
{: .highlight}
Hint
You can simulate the modified code in GPGPU-SIM to verify the performance improvement.
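One way to compare a baseline run with a modified run is to pull the same counter out of both simulation logs. The sketch below assumes each run's output was saved to a file and that the log contains lines of the form `gpu_tot_sim_cycle = <number>`, as GPGPU-SIM typically prints. `sim_stat` and `cycle_speedup` are hypothetical helper names, and you should check the counter names against your own log.

```shell
# Print the last reported value of a counter from a saved simulation log.
sim_stat() {
    name=$1
    log=$2
    # Lines look like "gpu_tot_sim_cycle = 12345"; keep the last match.
    awk -F'=' -v key="$name" '
        { k = $1; gsub(/[ \t]/, "", k) }
        k == key { v = $2; gsub(/[ \t]/, "", v); last = v }
        END { if (last != "") print last; else exit 1 }
    ' "$log"
}

# Ratio of simulated cycles: baseline log divided by optimized log.
cycle_speedup() {
    base=$(sim_stat gpu_tot_sim_cycle "$1") || return 1
    opt=$(sim_stat gpu_tot_sim_cycle "$2") || return 1
    awk -v b="$base" -v o="$opt" 'BEGIN { printf "%.2f\n", b / o }'
}
```

For instance, `cycle_speedup base.log opt.log` prints how many times fewer simulated cycles the modified code needed. The same `sim_stat` call works for any other counter that appears in the log in `name = value` form.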
{: .question}
a. Which parameters (metrics) do you think should be used to evaluate GEMM performance? Why? (Hint: look through the simulation output.)