The MNIST classification problem is a fundamental machine learning task that involves recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.

Neural Network Acceleration on GPUs

Project Overview

This project focuses on accelerating a neural network implementation for the MNIST classification task using GPU programming with CUDA. We begin with a sequential CPU implementation (V1) and progressively optimize it across the GPU versions (V2-V5) to maximize performance. The key goal is to gain hands-on experience in parallel computing, high-performance computing (HPC), and CUDA optimizations.

Repository Structure

├── src
│   ├── V1  # Baseline sequential implementation
│   ├── V2  # Naive GPU implementation
│   ├── V3  # Optimized GPU implementation with performance improvements
│   ├── V4  # Optimized GPU implementation utilizing tensor cores
│   └── V5  # Optimized GPU implementation using OpenACC
├── data    # Contains the MNIST dataset
├── report  # Project report
├── slides  # Presentation slides
└── README.md  # Project documentation and instructions

Prerequisites

  • NVIDIA GPU with CUDA support
  • CUDA Toolkit installed
  • nvcc compiler available
  • make utility installed

Compilation and Execution

Compilation

Navigate to the src directory and run:

make

This will compile the project and generate the executable build/nn.exe.

Running the Program

To execute the program, run:

make run

This will execute the compiled neural network and, if available, move the generated profiling data into place.

Profiling and Speedup Execution

To run the profiling version:

make prof-run
make nsight-analyze
make speedup

This generates profiling data for performance analysis.

Cleaning Build Files

To remove all compiled files and reset the build directory:

make clean

Code Structure

  • main.cu: Entry point for the neural network execution.
  • neural_net.cu: Core implementation of the neural network.
  • utils.cu: Utility functions for matrix operations and timers.
  • mnist.cu: MNIST dataset handling functions.
  • nn.h: Header file defining neural network parameters.
  • utils.h: Header file defining helper functions for matrix operations and timing.
  • speedup_analysis.c: Compares runtimes across all versions and reports the resulting speedups.

Optimization Strategy

Each version of the project applies different optimization techniques:

V1 (Baseline CPU Implementation)

  • Sequential execution on CPU.
  • No parallelism or GPU acceleration.

V2 (Naive GPU Implementation)

  • Converts matrix operations to CUDA kernels.
  • Parallel execution but lacks optimizations.
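
A minimal sketch of what "converting matrix operations to CUDA kernels" looks like in a naive version (this is illustrative, not the repo's actual kernel): each thread computes exactly one element of C = A * B, reading A and B straight from global memory.

```cuda
// Naive CUDA matrix multiply: one thread per output element,
// no shared memory, no tiling — the V2-style starting point.
__global__ void matmul_naive(const float *A, const float *B, float *C,
                             int M, int N, int K) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < K; k++)
            acc += A[row * K + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

// Launch: a 16x16 thread block, with enough blocks to cover all of C.
// dim3 block(16, 16);
// dim3 grid((N + 15) / 16, (M + 15) / 16);
// matmul_naive<<<grid, block>>>(dA, dB, dC, M, N, K);
```

Every element of A is re-read from global memory N times, which is the main inefficiency V3 addresses.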

V3 (Optimized GPU Implementation)

  • Optimized kernel launch configuration.
  • Improved occupancy and memory usage.
  • Reduced host-device communication overhead.
  • Efficient use of the memory hierarchy.
  • CUDA streams for overlapping transfers and compute.
  • Pinned (page-locked) host memory.
  • Weight initialization moved to the GPU (kernel side).
  • Multiple small kernels fused into larger ones.
  • Shared memory for data reuse within thread blocks.
  • Optimized compiler flags.
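
As a sketch of two of the techniques above (pinned memory and CUDA streams; function and kernel names here are hypothetical), alternating work across streams lets host-to-device copies for one batch overlap with compute for the previous batch:

```cuda
#include <cuda_runtime.h>

// Illustrative V3-style batching loop: pinned (page-locked) host memory
// enables truly asynchronous copies, and two streams let transfer and
// compute overlap across consecutive batches.
void process_batches(float *d_buf, size_t bytes, int num_batches) {
    float *h_pinned;
    cudaMallocHost(&h_pinned, bytes);   // pinned allocation
    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    for (int i = 0; i < num_batches; i++) {
        cudaStream_t st = s[i % 2];     // alternate streams per batch
        cudaMemcpyAsync(d_buf, h_pinned, bytes,
                        cudaMemcpyHostToDevice, st);
        // forward_kernel<<<grid, block, 0, st>>>(d_buf, ...);  // hypothetical
    }
    cudaDeviceSynchronize();
    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFreeHost(h_pinned);
}
```

With ordinary pageable memory, cudaMemcpyAsync silently degrades to a synchronous copy, which is why pinned memory and streams are listed together.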

V4 (Tensor Core Optimization)

  • Utilizes Tensor Cores for matrix multiplications.
  • Further speedup through specialized CUDA libraries.
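
One way this can be done (a sketch only; the repo may instead use a library such as cuBLAS) is the WMMA API, where a single warp computes a 16x16 output tile on the Tensor Cores using half-precision inputs and float accumulation:

```cuda
#include <mma.h>
using namespace nvcuda;

// Illustrative WMMA tile multiply: one warp accumulates a 16x16 tile of
// C from 16x16x16 Tensor Core fragments (half inputs, float accumulate).
// Here A is 16xK row-major and B is Kx16 row-major.
__global__ void wmma_tile(const half *A, const half *B, float *C, int K) {
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> c;
    wmma::fill_fragment(c, 0.0f);
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a, A + k, K);        // lda = K
        wmma::load_matrix_sync(b, B + k * 16, 16);  // ldb = 16
        wmma::mma_sync(c, a, b, c);                 // Tensor Core MMA
    }
    wmma::store_matrix_sync(C, c, 16, wmma::mem_row_major);
}
```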

V5 (OpenACC Implementation)

  • Directive-based parallelism.
  • Quick porting, hardware abstraction.

Authors

  • Umer Farooq
  • Muhammad Irtaza Khan

GitHub Repository

https://github.com/Umer-Farooq-CS/MNIST-Classification.git
