This CUDA tutorial aims to build a comprehensive understanding of the CUDA programming model (SIMT threading, kernel structure, and the GPU memory hierarchy), along with practical knowledge of memory optimization techniques such as global memory coalescing and avoiding shared memory bank conflicts.
Chapter 1: CPU vs GPU Architecture and Performance
- 1. Introduction
- 2. Efficiency of GPUs over CPUs
- 3. Structural Differences: CPU vs GPU
- 4. Streaming Multiprocessors (SMs) in GPUs
- 5. Thread Hierarchy Organization in GPUs
- 6. Two-Level Parallelism and Latency Hiding in GPUs
- 7. Synchronization in GPU Threads
- 8. Work Division: CPUs vs GPUs
- 9. SIMT vs SIMD
- 10. Limitations of GPUs
References
- NVIDIA CUDA Programming Guide
- CS 4230/6230 (Parallel and High-Performance Computing) lectures by Professor P. Sadayappan, The University of Utah
By Omid Asudeh