A simple implementation of a Neural Network library written in C, accelerated with NVIDIA's CUDA for matrix operations.
This library focuses on demonstrating the performance benefits of using CUDA for parallelizing matrix operations.
The library provides fundamental components for building and training simple, fully-connected neural networks.
- Core Components: Basic building blocks like matrices, layers, and activation functions.
- CUDA Acceleration: Key matrix operations are offloaded to the GPU using custom CUDA kernels for significant speed-up.
- Simple API: A straightforward interface for creating, training, and evaluating neural networks.
- Example Implementation: Includes an example of a network that learns the XOR function.
.
├── CMakeLists.txt
├── README.md
└── src
├── cuda_kernels.cu # CUDA kernels for GPU-based matrix operations
├── cuda_kernels.h # Headers for the CUDA kernels
├── main.c # Benchmark program
├── matrix.c # matrix functions
├── matrix.h # Headers for matrix functions
├── nn.c # Core neural network logic
├── nn.h # Headers for the neural network
└── xor.c # Example: Training a network to solve XOR
- A C Compiler (e.g., GCC, Clang)
- CMake (version 3.18 or higher)
- NVIDIA CUDA Toolkit (for GPU acceleration)
-
Clone the repository:
git clone <repository-url> cd nn.c-uda
-
Create a build directory:
mkdir build cd build -
Run CMake and build the project:
cmake .. cmake --build .
After building, two executables will be available in the build directory (or a subdirectory like build/Debug on Windows).
-
Run the XOR Example: This program trains a small network to solve the XOR problem.
./nn_c_uda_xor
-
Run the Performance Benchmark: This program trains a larger network on a big batch of random data, first using the GPU and then using only the CPU, to compare the time taken.
./nn_c_uda_benchmark