This is a hands-on CUDA programming tutorial that teaches the basic concepts and common operations of CUDA parallel computing through exercises. The exercises cover fundamental operations such as vector addition, matrix operations, convolution, and parallel reduction, building an understanding of GPU parallel acceleration through practice.
First, make sure your computer or server has an available NVIDIA GPU, then download and install the CUDA Toolkit and a compatible driver from the official NVIDIA website. For installation instructions, see the CUDA Quick Start Guide.
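After installation, one way to confirm that the CUDA runtime can actually see a GPU is a small program built with `nvcc`. The sketch below is an assumption-labeled illustration, not part of this repository: the helper names `countCudaDevices` and `printCudaDevices` are hypothetical, while `cudaGetDeviceCount`, `cudaGetDeviceProperties`, and `cudaGetErrorString` are standard CUDA Runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns the number of CUDA-capable devices visible to
// this process, or 0 if the runtime reports an error (e.g. no driver or GPU).
int countCudaDevices() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 0;
    }
    return count;
}

// Hypothetical helper: prints the name and compute capability of each device.
void printCudaDevices() {
    int count = countCudaDevices();
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            std::printf("Device %d: %s (compute capability %d.%d)\n",
                        i, prop.name, prop.major, prop.minor);
        }
    }
}
```

If `countCudaDevices()` returns 0, the driver or toolkit installation is the first thing to check before starting the exercises.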
Clone this repository:
git clone https://github.com/youxam/cuda-practice-tutorial.git
cd cuda-practice-tutorial
Generate a working directory:
python3 generate.py [path]
For example:
python3 generate.py ~/cuda-practice-projects
The generated directory contains 9 exercises, each with its own README.md that includes a complete tutorial, problem description, and explanations. You can also read them online.
- Problem 1: Vector Addition
- Problem 2: SAXPY
- Problem 3: 1D Stencil
- Problem 4: Matrix Transposition
- Problem 5: Parallel Reduction Sum
- Problem 6: 2D Convolution
- Problem 7: Tiled Matrix Multiplication
- Problem 8: Histogram
- Problem 9: K-means Clustering
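To give a feel for the kind of code the exercises involve, here is a minimal vector-addition sketch in the spirit of Problem 1. It is not taken from the repository's `student.cu` or `answer.cu`; the kernel name `vecAdd`, the host helper `vectorAddHost`, and the block size of 256 are assumptions chosen for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each thread computes one element: c[i] = a[i] + b[i].
// The bounds check handles n that is not a multiple of the block size.
__global__ void vecAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies inputs to the GPU, launches vecAdd,
// and copies the result back. Returns false on size mismatch or CUDA error.
bool vectorAddHost(const std::vector<float>& a, const std::vector<float>& b,
                   std::vector<float>& c) {
    if (a.size() != b.size()) return false;
    const int n = static_cast<int>(a.size());
    c.assign(a.size(), 0.0f);
    if (n == 0) return true;  // nothing to launch
    const size_t bytes = a.size() * sizeof(float);

    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaError_t err = cudaMalloc(&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;  // ceil(n / threads)
        vecAdd<<<blocks, threads>>>(dA, dB, dC, n);
        err = cudaGetLastError();  // catches launch-configuration errors
    }
    // cudaMemcpy waits for the kernel to finish before copying back.
    if (err == cudaSuccess) err = cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cudaFree(dA);  // cudaFree(nullptr) is a no-op
    cudaFree(dB);
    cudaFree(dC);
    return err == cudaSuccess;
}
```

Rounding the block count up guarantees every element gets a thread, and the `if (i < n)` guard keeps the surplus threads in the last block from writing out of bounds.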
You should complete each problem by following these steps:
- Read the problem description and requirements to understand the functionality you need to implement.
- Based on the requirements, implement the `// TODO` sections of the `student.cu` file.
- Use `make list` to view the list of test cases. Use `make run TC=<test_case_prefix>` to compile and run a specific test case. Use `make test` to compile and run all test cases.
- If you encounter difficulties, refer to the `answer.cu` file for a reference implementation to understand the approach and how it is implemented.
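The exact layout of `student.cu` differs from problem to problem. As a hypothetical illustration of the implement-then-verify loop, the sketch below shows a `// TODO` stub for SAXPY (Problem 2), one possible completion using a grid-stride loop, and a comparison against a CPU reference, similar in spirit to what a test case might check. The names `saxpy`, `saxpyCpu`, and `checkSaxpy`, the `<<<32, 256>>>` launch, and the tolerance are all assumptions; the repository's own kernels and test harness may look different.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical skeleton, as an exercise file might present it:
//
//   __global__ void saxpy(int n, float a, const float* x, float* y) {
//       // TODO: compute y[i] = a * x[i] + y[i] for every i < n
//   }
//
// One possible completion: a grid-stride loop, so any grid size covers all n.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        y[i] = a * x[i] + y[i];
    }
}

// CPU reference used to check the GPU result.
void saxpyCpu(float a, const std::vector<float>& x, std::vector<float>& y) {
    for (size_t i = 0; i < x.size(); ++i) y[i] = a * x[i] + y[i];
}

// Hypothetical checker: runs saxpy on the GPU and compares against the CPU
// reference. Returns false on size mismatch, CUDA error, or numeric mismatch.
bool checkSaxpy(float a, const std::vector<float>& x, const std::vector<float>& y0) {
    if (x.size() != y0.size()) return false;
    if (x.empty()) return true;  // nothing to compare
    const int n = static_cast<int>(x.size());
    const size_t bytes = x.size() * sizeof(float);
    std::vector<float> expected = y0, actual(x.size());
    saxpyCpu(a, x, expected);

    float *dX = nullptr, *dY = nullptr;
    cudaError_t err = cudaMalloc(&dX, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dY, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dY, y0.data(), bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        saxpy<<<32, 256>>>(n, a, dX, dY);  // fixed grid; the loop handles any n
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(actual.data(), dY, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dX);
    cudaFree(dY);
    if (err != cudaSuccess) return false;

    for (size_t i = 0; i < actual.size(); ++i) {
        // Relative tolerance: the GPU may fuse a * x + y into one FMA, so
        // results can differ from the CPU in the last bits.
        float tol = 1e-5f * std::fmax(1.0f, std::fabs(expected[i]));
        if (std::fabs(actual[i] - expected[i]) > tol) return false;
    }
    return true;
}
```

Checking a kernel against a plain CPU loop like this is a quick way to localize a bug before running the full `make test` suite.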
If you need to configure a language server (LSP), you can refer to the `.clangd` file in this project.