A low-level neural network implementation in C, parallelized with POSIX threads.
This project implements from scratch:
- Single Layer Perceptron (SLP)
- Multi-Layer Perceptron (MLP)
- Explicit backpropagation
- Custom BLAS-like linear algebra core
- Parallelization using POSIX threads (pthreads)
The goal is not to build a framework but to understand how neural networks operate at a low level: memory layout, numerical computation, and parallel execution.
- Row-major matrix representation
- Manual memory control
- Vector operations
- Linear Algebra core
- Unit testing & Valgrind validation
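A minimal sketch of the row-major layout idea. The names `Matrix`, `mat_at`, and `mat_alloc` here are illustrative, not the project's actual `matrix.h` API:

```c
#include <stdlib.h>

/* Illustrative row-major matrix: element (i, j) lives at data[i * cols + j],
 * so walking a row touches contiguous memory (cache friendly). */
typedef struct {
    size_t rows, cols;
    double *data;   /* one contiguous block of rows * cols doubles */
} Matrix;

static Matrix mat_alloc(size_t rows, size_t cols) {
    Matrix m = { rows, cols, malloc(rows * cols * sizeof(double)) };
    return m;
}

/* Row-major indexing: offset = i * cols + j. */
static inline double *mat_at(Matrix *m, size_t i, size_t j) {
    return &m->data[i * m->cols + j];
}

static void mat_free(Matrix *m) { free(m->data); m->data = NULL; }
```

Keeping the whole matrix in one allocation (rather than an array of row pointers) simplifies manual memory management and makes Valgrind checks straightforward.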
- Row partitioning strategy using pthreads
- Parallel matrix-vector multiplication & matrix-matrix multiplication
- Parallel batch operations
- Persistent Thread Pool (Minimalist BLAS-style)
- Speedup benchmarking tool (sequential vs. parallel)
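The row-partitioning idea can be sketched as follows, here with the naive create/join approach that the persistent thread pool later replaces. All names (`MatvecTask`, `matvec_parallel`) are hypothetical, not the project's API:

```c
#include <pthread.h>
#include <stddef.h>

/* Each worker computes y[i] for its own contiguous slice of rows,
 * so no two threads ever write the same output element (no locking needed). */
typedef struct {
    const double *A, *x;
    double *y;
    size_t cols, row_start, row_end;   /* half-open range [row_start, row_end) */
} MatvecTask;

static void *matvec_worker(void *arg) {
    MatvecTask *t = arg;
    for (size_t i = t->row_start; i < t->row_end; i++) {
        double acc = 0.0;
        for (size_t j = 0; j < t->cols; j++)
            acc += t->A[i * t->cols + j] * t->x[j];   /* row-major access */
        t->y[i] = acc;
    }
    return NULL;
}

/* y = A * x with rows split evenly across nthreads workers. */
static void matvec_parallel(const double *A, const double *x, double *y,
                            size_t rows, size_t cols, size_t nthreads) {
    pthread_t tids[nthreads];
    MatvecTask tasks[nthreads];
    size_t chunk = (rows + nthreads - 1) / nthreads;   /* ceil division */
    for (size_t k = 0; k < nthreads; k++) {
        size_t start = k * chunk;
        size_t end = (start + chunk < rows) ? start + chunk : rows;
        tasks[k] = (MatvecTask){ A, x, y, cols, start, end };
        pthread_create(&tids[k], NULL, matvec_worker, &tasks[k]);
    }
    for (size_t k = 0; k < nthreads; k++)
        pthread_join(tids[k], NULL);
}
```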
- Activation functions & Derivatives
- Loss functions
- Weight Initialization (Xavier/He)
- 4.1 Forward: y = activation(Wx + b)
- 4.2 Explicit Backward: dW = dL/dy * x^T, db = dL/dy
- 4.3 Training Loop: Forward -> Loss -> Backward -> Update
- 4.4 Validation: convergence on linearly separable data
- 5.1 Structure: Multi-layer model representation
- 5.2 Layer Caching: Storing Z and A states for backprop
- 5.3 General Backpropagation: Iterative chain rule implementation
- 5.4 Numerical Gradient Checking: Comparative validation
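Numerical gradient checking (5.4) compares the analytic gradient from backprop against a finite-difference estimate. A minimal one-parameter sketch with illustrative names:

```c
#include <math.h>

/* Central-difference estimate: (f(w + eps) - f(w - eps)) / (2 * eps).
 * Its error is O(eps^2), so for smooth f it should agree with the
 * analytic gradient to many digits when eps is around 1e-5. */
static double numeric_grad(double (*f)(double), double w, double eps) {
    return (f(w + eps) - f(w - eps)) / (2.0 * eps);
}

/* Toy objective f(w) = w^2 with known analytic gradient 2w. */
static double square(double w) { return w * w; }
```

In the real check the same comparison is run per weight of the network, with a relative-error tolerance rather than an absolute one.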
- 6.1 Vectorized input (Batch x Features)
- 6.2 Batched Matmul optimization
- 6.3 Aggregated gradient calculation
- 7.1 Thread-level batch partitioning
- 7.2 Gradient reduction buffers and synchronization
- Persistent Thread Pool implementation (eliminates pthread overhead)
- Worker synchronization via Condition Variables
- 9.1 Cache-aware optimization (Loop Tiling / Blocking)
- 9.2 Memory alignment & Branch avoidance
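Loop tiling (9.1) can be sketched like this for a square row-major matmul; `matmul_tiled` and the tile size are illustrative choices, not the project's tuned values:

```c
#include <stddef.h>
#include <string.h>

#define TILE 32  /* block edge; tune so three TILE x TILE tiles fit in L1 */

/* C = A * B for n x n row-major matrices, blocked so that each
 * TILE x TILE working set of A, B, and C stays hot in cache instead of
 * being evicted between passes over the inner dimension. */
static void matmul_tiled(const double *A, const double *B, double *C, size_t n) {
    memset(C, 0, n * n * sizeof(double));
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t kk = 0; kk < n; kk += TILE)
            for (size_t jj = 0; jj < n; jj += TILE)
                for (size_t i = ii; i < ii + TILE && i < n; i++)
                    for (size_t k = kk; k < kk + TILE && k < n; k++) {
                        double a = A[i * n + k];   /* hoisted scalar */
                        for (size_t j = jj; j < jj + TILE && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

The i-k-j inner ordering also keeps the innermost loop streaming over contiguous rows of B and C, which pairs well with the row-major layout above.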
- 10.1 Data loader (CSV/Binary)
- 10.2 Integration tests (MLP training)
- 10.3 CLI for hyperparameter tuning
- 10.4 Shape assertions & Debug mode
- Advanced Optimizers (Adam, Momentum)
- Regularization (L2, Dropout)
- Model serialization (Save/Load binaries)
- C
- Row-major memory layout
- Manual memory management
- Explicit parallelism (pthreads)
- No hidden abstractions
This project prioritizes clarity of execution over abstraction.
Neural-Networks-in-C
├── assets/
├── include/
│ ├── matrix.h
│ ├── parallel.h
│ ├── linalg.h
│ ├── runtime.h
│ ├── thread_pool.h
│ └── nn_infra.h
├── src/
│ ├── parallel/
│ ├── matrix.c
│ ├── linalg.c
│ ├── runtime.c
│ ├── thread_pool.c
│ └── nn_infra.c
├── tests/
│ └── test_*.c
├── build/ # compiled binaries
├── main.c # soon
├── run_valgrind.sh
└── Makefile

The project uses a persistent Thread Pool (BLAS-style) to manage parallel tasks. Unlike a naive approach where threads are created and joined on every operation, this implementation spawns workers once at startup (runtime_init) and signals work using condition variables. This eliminates the significant overhead of repeated pthread_create/pthread_join calls, making small-to-medium matrix operations much more efficient.
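A minimal sketch of this persistent-pool pattern. All names (`Pool`, `pool_run`, `NWORKERS`, ...) are illustrative, not the project's actual `runtime.h`/`thread_pool.h` API:

```c
#include <pthread.h>
#include <stdbool.h>

#define NWORKERS 4

typedef struct Pool Pool;

typedef struct { Pool *pool; int id; } WorkerArg;

struct Pool {
    pthread_mutex_t lock;
    pthread_cond_t work_ready;   /* main -> workers: a job was posted     */
    pthread_cond_t work_done;    /* workers -> main: all workers finished */
    unsigned generation;         /* bumped once per posted job            */
    int pending;                 /* workers still running the current job */
    bool shutdown;
    void (*job)(int id, void *ctx);
    void *ctx;
    pthread_t tids[NWORKERS];
    WorkerArg args[NWORKERS];
};

static void *pool_worker(void *arg) {
    WorkerArg *wa = arg;
    Pool *p = wa->pool;
    unsigned seen = 0;
    pthread_mutex_lock(&p->lock);
    for (;;) {
        /* Sleep until a new generation is posted or shutdown is requested. */
        while (!p->shutdown && p->generation == seen)
            pthread_cond_wait(&p->work_ready, &p->lock);
        if (p->shutdown)
            break;
        seen = p->generation;
        void (*job)(int, void *) = p->job;   /* copy under the lock */
        void *ctx = p->ctx;
        pthread_mutex_unlock(&p->lock);
        job(wa->id, ctx);                    /* run the job outside the lock */
        pthread_mutex_lock(&p->lock);
        if (--p->pending == 0)
            pthread_cond_signal(&p->work_done);
    }
    pthread_mutex_unlock(&p->lock);
    return NULL;
}

/* Spawn workers exactly once (the runtime_init idea). */
static void pool_init(Pool *p) {
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->work_ready, NULL);
    pthread_cond_init(&p->work_done, NULL);
    p->generation = 0;
    p->pending = 0;
    p->shutdown = false;
    for (int i = 0; i < NWORKERS; i++) {
        p->args[i] = (WorkerArg){ p, i };
        pthread_create(&p->tids[i], NULL, pool_worker, &p->args[i]);
    }
}

/* Post one job to every worker and block until all of them have finished.
 * No threads are created or joined here -- just condition-variable signaling. */
static void pool_run(Pool *p, void (*job)(int, void *), void *ctx) {
    pthread_mutex_lock(&p->lock);
    p->job = job;
    p->ctx = ctx;
    p->pending = NWORKERS;
    p->generation++;
    pthread_cond_broadcast(&p->work_ready);
    while (p->pending > 0)
        pthread_cond_wait(&p->work_done, &p->lock);
    pthread_mutex_unlock(&p->lock);
}

static void pool_destroy(Pool *p) {
    pthread_mutex_lock(&p->lock);
    p->shutdown = true;
    pthread_cond_broadcast(&p->work_ready);
    pthread_mutex_unlock(&p->lock);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(p->tids[i], NULL);
}

/* Example job: each worker records its id in a shared results array. */
static void mark_done(int id, void *ctx) { ((int *)ctx)[id] = id + 1; }
```

The generation counter guards against spurious wakeups and lets workers distinguish "a new job was posted" from "I already ran this one."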
```
make
```

Compile all tests:

```
make test
```

Run:

```
./build/test_matmul
./build/test_matvec
./build/test_performance
```

To use this feature, you need to install Valgrind:

```
sudo apt-get install valgrind
```

Give the script permission to run:

```
chmod +x run_valgrind.sh
```

Then run:

```
./run_valgrind.sh test_runtime
```