GPU Performance Optimization for Image Processing Algorithms

Overview

This project implements two fundamental image processing algorithms, Gaussian Blur and Sobel Edge Detection, on both the CPU (sequential C++) and the GPU (CUDA). It benchmarks and visualizes the performance impact of GPU optimizations, including separable convolution and shared memory tiling.

Gaussian Blur: Used for noise reduction and image smoothing.
Sobel Edge Detection: Used for detecting edges and boundaries in images.

The project provides:

CPU and multiple GPU implementations (naive and optimized)
Automated benchmarking and CSV result output
Performance visualization scripts and plots

Features

CPU Baseline: Sequential C++ implementations for both algorithms.
Naive GPU: Direct CUDA kernels with global memory access.
Optimized GPU:
- Gaussian Blur: Separable convolution (reduces per-pixel work from k² to 2k multiply-adds, i.e. from O(k²) to O(k)).
- Sobel Edge: Shared memory tiling to minimize redundant global memory reads.
Automated Benchmarking: Batch tests across kernel sizes and block dimensions.
CSV Output: Results saved for further analysis.
Plot Generation: Python script to visualize timing and speedup.
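
The operation counts behind the separable-convolution claim can be checked with a few lines of Python. This is an illustrative sketch, not code from the project; the function names direct_macs and separable_macs are hypothetical.

```python
def direct_macs(k: int) -> int:
    """Multiply-adds per output pixel for a direct k x k convolution."""
    return k * k


def separable_macs(k: int) -> int:
    """Multiply-adds per output pixel for one horizontal plus one vertical 1D pass."""
    return 2 * k


# Kernel sizes mentioned in this README.
for k in (5, 7, 15):
    ratio = direct_macs(k) / separable_macs(k)
    print(f"k={k}: direct={direct_macs(k)}, separable={separable_macs(k)}, ratio={ratio:.1f}x")
```

For k = 15 this gives 225 versus 30 multiply-adds per pixel, a 7.5x arithmetic saving. Measured speedups also depend on memory traffic, so they need not match this ratio.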

Project Structure

project-root/
│
├── build_and_run.ps1 # Script to build and run the program
├── generate-plots.py # Script to visualize performance data
│
├── data/ # CSV output files
├── images/ # Processed image outputs
├── plots/ # Generated performance plots
├── build/ # Build folder
├── CMakeLists.txt
├── filters_cpu.cpp
├── filters_cpu.h
├── filters_gpu.cu
├── filters_gpu.h
├── image_io.cpp
├── image_io.h
├── main.cpp
├── instructions.md
└── README.md

Setup Instructions

Setting Up CUDA Environment

Create a CUDA 12.4 Runtime project named gpu_image_processing.
Add the project files listed above to it.

Installing OpenCV on Windows (for C++/CMake)

1. Download OpenCV

Go to the official OpenCV Releases page.
Download the latest Windows pack (opencv-4.x.x-windows.exe).

Example: opencv-4.12.0-windows.exe

2. Extract the Archive

Run the .exe. It is a self-extracting archive: it only unpacks the files and does not install anything.
Choose a location, e.g. C:\opencv.

After extraction, you’ll have a folder like:

C:\opencv\build
├── x64
│   └── vc16
│       ├── bin # DLLs
│       ├── lib # Libraries (.lib)
│       └── ...
└── include # Headers

3. Set Environment Variables

Open System Properties → Advanced → Environment Variables.
Add a new system variable:
```
OPENCV_DIR = C:\opencv\build
```
Edit your Path variable and add C:\opencv\build\x64\vc16\bin so that Windows can find opencv_worldXXX.dll at runtime. The vcNN folder name depends on the OpenCV release; use the one present in your extracted copy.
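
A small script can sanity-check the two variables above. The sketch below is not part of the project: the helper name opencv_env_problems is hypothetical, and it only compares the variable strings, without looking at the files on disk.

```python
import ntpath


def opencv_env_problems(env):
    """Return a list of problems found in env; an empty list means it looks consistent.

    Hypothetical helper: it only compares strings and never touches the filesystem.
    """
    root = env.get("OPENCV_DIR")
    if not root:
        return ["OPENCV_DIR is not set"]

    def norm(p):
        # Windows paths are case-insensitive; normalise separators and case.
        return ntpath.normcase(ntpath.normpath(p))

    path_value = env.get("Path") or env.get("PATH") or ""
    path_dirs = [norm(p) for p in path_value.split(";") if p.strip()]
    prefix = norm(ntpath.join(root, "x64")) + "\\"
    # Accept any compiler folder (vc15, vc16, ...) as long as its bin is on Path.
    if not any(d.startswith(prefix) and d.endswith("\\bin") for d in path_dirs):
        return ["Path has no <OPENCV_DIR>\\x64\\vcNN\\bin entry"]
    return []
```

On a real machine it could be called as opencv_env_problems(os.environ) after importing os. Open a new terminal first so that freshly edited variables are visible.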

Building and Running

Run the script
Right-click on the build_and_run.ps1 file in File Explorer and select Run with PowerShell to start the program.
Input prompts
- Enter the image file name to process.
- Select the processing device:
  - cpu
  - gpu
  - all
Results
- A printout of results will be displayed in the terminal.
- Processed data files will be saved in the data/ folder as CSV files.
- Processed images will be saved in the images/ folder.

Visualizing Performance

To visualize performance results:

Run the script:
```
python generate-plots.py
```
Generated plots will be saved in the plots/ folder.
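
Speedup in the plots is the CPU time divided by the GPU time for the same configuration. The sketch below shows that calculation on CSV text. It is not taken from generate-plots.py: the column names (variant, config, time_ms) are assumptions about the file layout, and the sample rows are placeholders, not measurements from this project.

```python
import csv
import io

# Placeholder rows in an assumed layout; NOT measured results.
SAMPLE = """variant,config,time_ms
cpu,k=5,100.0
gpu_naive,k=5,0.5
gpu_separable,k=5,0.25
"""


def speedups(csv_text, baseline="cpu"):
    """Map (variant, config) to baseline_time / variant_time for each non-baseline row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    base = {r["config"]: float(r["time_ms"]) for r in rows if r["variant"] == baseline}
    return {
        (r["variant"], r["config"]): base[r["config"]] / float(r["time_ms"])
        for r in rows
        if r["variant"] != baseline
    }


for key, value in speedups(SAMPLE).items():
    print(key, f"{value:.0f}x")
```

Adjust the column names to match the headers actually written to the data/ folder.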

Results Summary

Gaussian Blur

Direct GPU (2D convolution): Up to 600x speedup over CPU for moderate kernel sizes (5x5, 7x7).
Separable GPU: Consistently low execution times and 2000x–3000x speedup for large kernels (15x15).
The direct approach loses efficiency for large kernels: each output pixel reads k² input values, so the kernel becomes limited by memory bandwidth.
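
The reason the separable version can replace the direct one is that a 2D Gaussian kernel is the outer product of a 1D Gaussian with itself, so one horizontal pass followed by one vertical pass gives the same result. The pure-Python sketch below checks this on a tiny synthetic image. It is a reference illustration, not the project's CUDA code; the function names, the sigma value, and the clamped border handling are assumptions.

```python
import math


def gaussian_kernel_1d(k, sigma):
    """Normalized 1D Gaussian with odd length k."""
    r = k // 2
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(w)
    return [v / total for v in w]


def clamp(i, n):
    """Clamp index i into [0, n - 1] (border pixels are replicated)."""
    return max(0, min(n - 1, i))


def blur_direct(img, g):
    """Single 2D pass using the k x k kernel g[dy] * g[dx]."""
    h, w, r = len(img), len(img[0]), len(g) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    acc += g[dy + r] * g[dx + r] * img[clamp(y + dy, h)][clamp(x + dx, w)]
            out[y][x] = acc
    return out


def blur_separable(img, g):
    """Horizontal 1D pass, then vertical 1D pass, with the same 1D kernel."""
    h, w, r = len(img), len(img[0]), len(g) // 2
    tmp = [[sum(g[d + r] * img[y][clamp(x + d, w)] for d in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    return [[sum(g[d + r] * tmp[clamp(y + d, h)][x] for d in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


# Tiny synthetic image; the values are arbitrary test data.
image = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(6)]
kernel = gaussian_kernel_1d(5, 1.0)
a, b = blur_direct(image, kernel), blur_separable(image, kernel)
max_diff = max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
print("results agree to within rounding:", max_diff < 1e-9)
```

The two functions differ only in how many multiplications they perform per pixel; their outputs agree up to floating-point rounding.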

Sobel Edge Detection

Global memory GPU: Up to 600x speedup for balanced block sizes.
Shared memory GPU: Further reduces execution time for small/skewed blocks, but can lose efficiency for large blocks due to occupancy and bank conflicts.
Block shape and memory access patterns significantly affect performance.
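
For reference, the Sobel operator convolves the image with two 3x3 kernels, one per gradient direction, and combines the responses. The sketch below is a plain Python reference, not the project's kernel; combining the gradients with the Euclidean magnitude and leaving the one-pixel border at zero are assumptions, since the README does not state how the project handles either.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal and vertical gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Gradient magnitude for interior pixels; the 1-pixel border stays 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += GX[j][i] * p
                    gy += GY[j][i] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# Vertical step edge: left half 0, right half 10.
step = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(step)
print(edges[2])
```

Every output pixel reads a 3x3 neighbourhood, which is why neighbouring GPU threads end up requesting the same input pixels from global memory.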

Analysis & Insights

Separable convolution for Gaussian blur is critical for large kernels: it cuts per-pixel work from k² to 2k multiply-adds, i.e. from O(k²) to O(k).
Shared memory tiling in Sobel edge detection minimizes redundant global memory reads, but optimal block size and shape are crucial to avoid bank conflicts and occupancy loss.
GPU acceleration provides dramatic speedups for both algorithms, but careful memory and thread management is required for best results.
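
The effect of block shape on redundant reads can be estimated with simple arithmetic. With a 3x3 stencil, a naive kernel issues 9 global loads per pixel, while a tiled kernel loads each block's tile plus a one-pixel halo once. The sketch below counts those loads; it is a simplified model with hypothetical function names that ignores caching, occupancy, and bank conflicts.

```python
def loads_per_pixel_naive(radius=1):
    """Global loads per output pixel when every thread reads its full stencil."""
    return (2 * radius + 1) ** 2


def loads_per_pixel_tiled(bx, by, radius=1):
    """Global loads per output pixel when a bx x by block stages tile + halo once."""
    tile_pixels = (bx + 2 * radius) * (by + 2 * radius)
    return tile_pixels / (bx * by)


# Block shapes chosen for illustration only.
for bx, by in ((8, 8), (16, 16), (32, 32), (256, 1)):
    print(f"{bx}x{by}: naive={loads_per_pixel_naive()}, tiled={loads_per_pixel_tiled(bx, by):.2f}")
```

Under this model a 16x16 block needs about 1.27 loads per pixel, while a skewed 256x1 block needs about 3.02 because the halo is large relative to the tile. Both are well below the 9 loads of the naive kernel.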

Limitations & Future Work

Hardware: Tested only on an NVIDIA A10 GPU (Ampere, compute capability 8.6).
Scope: Focused on correct parallel implementation and tuning; did not explore multi-GPU or other architectures.
Complexity: CUDA programming challenges (e.g., shared memory, synchronization) limited further optimizations.

Future directions:

Test on other GPU architectures (e.g., Hopper, Ada Lovelace).
Implement higher-order or multidirectional Sobel filters.
Integrate CUDA kernels into deep learning pipelines (e.g., PyTorch custom ops).
Explore warp-level primitives and advanced memory prefetching.

References

Gonzalez, R. C., Woods, R. E. (2009). Digital Image Processing.
NVIDIA CUDA Best Practices Guide (2025).
Fisher, R. (2003). Gaussian Smoothing.
MoldStud (2025). Sobel Operator.
Harris, M. (2013). Optimizing Parallel Reduction in CUDA.
Li & Pang (2025). Medical Imaging Applications.
Podlozhnyuk, A. (2012). Image Convolution with CUDA.
Koutsantonis, D. (2021). CUDA Memory Optimization.
Additional references in project report.

Acknowledgements

This project was completed as part of DPS921 at Seneca Polytechnic.

For detailed methodology, results, and analysis, see the full project report (dps921-final-project-report-Nadiia-Geras.pdf).
