A pure NumPy implementation of a feedforward neural network with support for multiple activation functions, designed as an educational tool for exploring deep learning fundamentals.
- Pure NumPy Implementation: Built from scratch without high-level ML frameworks
- Multiple Activation Functions: Supports ReLU, Sigmoid, Tanh, and Softmax
- Flexible Architecture: Configurable number of layers and neurons
- MNIST Ready: Includes data preprocessing and visualization utilities
- Training Metrics: Tracks loss and accuracy during training
- Mini-batch Training: Efficient batch processing with shuffling
Quick start:

```python
# Load and preprocess MNIST data
X_train, Y_train, X_test, Y_test = load_mnist_data()

# Create network: 784 inputs → 128 hidden (ReLU) → 64 hidden (ReLU) → 10 outputs
network = FFNN(
    layer_sizes=[784, 128, 64, 10],
    activations=["relu", "relu"]  # Hidden layer activations only
)

# Train the network
history = network.train(
    X_train, Y_train, X_test, Y_test,
    learning_rate=0.01,
    num_epochs=50,
    batch_size=32
)

# Visualize training progress
plot_training_history(history)
```

Install the dependencies with:

```
pip install numpy matplotlib tensorflow seaborn
```

FFNN is the main neural network class. Its constructor takes the following parameters:
- layer_sizes: List of integers defining network architecture (e.g., [784, 128, 10])
- activations: List of activation functions for hidden layers (output uses softmax)
The train() method trains the network using mini-batch gradient descent with the following parameters (a sketch of the inner loop follows the list):
- learning_rate: Step size for parameter updates (typically 0.001-0.1)
- num_epochs: Number of complete passes through training data
- batch_size: Number of samples per mini-batch (default: 32)
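The shuffling and batching can be pictured as follows. This is a minimal sketch of one training epoch, assuming the column-major data layout used throughout; the backward_propagation and update_parameters method names are illustrative assumptions, not necessarily the real API (only forward_propagation appears elsewhere in this README).

```python
import numpy as np

def run_epoch(network, X, Y, learning_rate=0.01, batch_size=32):
    """Hypothetical sketch of one epoch of mini-batch gradient descent.

    X has shape (n_features, n_samples); Y has shape (n_classes, n_samples).
    """
    n_samples = X.shape[1]
    permutation = np.random.permutation(n_samples)   # shuffle sample indices
    X_shuffled, Y_shuffled = X[:, permutation], Y[:, permutation]

    for start in range(0, n_samples, batch_size):
        X_batch = X_shuffled[:, start:start + batch_size]
        Y_batch = Y_shuffled[:, start:start + batch_size]

        # One forward/backward pass and one parameter update per mini-batch.
        # backward_propagation/update_parameters are assumed names for the
        # internal steps; the actual implementation may differ.
        predictions, cache = network.forward_propagation(X_batch)
        gradients = network.backward_propagation(Y_batch, cache)
        network.update_parameters(gradients, learning_rate)
```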
Supported activation functions:

| Function | Use Case | Formula |
|---|---|---|
| ReLU | Hidden layers | max(0, x) |
| Sigmoid | Hidden layers | 1/(1 + e^(-x)) |
| Tanh | Hidden layers | tanh(x) |
| Softmax | Output layer (automatic) | e^x / Σe^x |
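For reference, the formulas in the table map to NumPy roughly as follows. This is a sketch, not necessarily the exact code in ffnn.py; the max-subtraction in softmax is a common numerical-stability refinement.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def tanh(Z):
    return np.tanh(Z)

def softmax(Z):
    # Subtract the column-wise max before exponentiating to avoid overflow.
    Z_shifted = Z - np.max(Z, axis=0, keepdims=True)
    exp_Z = np.exp(Z_shifted)
    return exp_Z / np.sum(exp_Z, axis=0, keepdims=True)
```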
Weights use He initialization, which keeps gradients well scaled for ReLU layers:

```python
W = np.random.randn(n_out, n_in) * np.sqrt(2.0 / n_in)
```

The loss is cross-entropy with a small epsilon for numerical stability:

```
loss = -Σ(y_true * log(y_pred + ε)) / m
```

The backward pass implements backpropagation with activation-specific derivatives:
- ReLU: dZ = dA * (Z > 0)
- Sigmoid: dZ = dA * A * (1 - A)
- Tanh: dZ = dA * (1 - A²)
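Putting the loss gradient and these derivatives together, backpropagation through a single hidden layer might look like the sketch below. Variable and function names are illustrative; the actual signatures in ffnn.py may differ.

```python
import numpy as np

def backward_layer(dA, Z, A, A_prev, W, activation):
    """Sketch of the backward step for one layer.

    dA:     gradient of the loss w.r.t. this layer's activations, shape (n_out, m)
    Z, A:   pre-activations and activations cached from the forward pass
    A_prev: activations of the previous layer, shape (n_in, m)
    """
    m = A_prev.shape[1]
    if activation == "relu":
        dZ = dA * (Z > 0)
    elif activation == "sigmoid":
        dZ = dA * A * (1 - A)
    else:  # tanh
        dZ = dA * (1 - A ** 2)

    dW = dZ @ A_prev.T / m                      # gradient for the weights
    db = np.sum(dZ, axis=1, keepdims=True) / m  # gradient for the biases
    dA_prev = W.T @ dZ                          # gradient passed to the previous layer
    return dA_prev, dW, db
```

For the softmax output layer combined with cross-entropy loss, the gradient with respect to the output pre-activations simplifies to (Y_pred - Y_true), which is why only the hidden-layer derivatives are listed above.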
A few more usage examples:

```python
# Simple 3-layer network
network = FFNN([784, 64, 10], ["relu"])
history = network.train(
    X_train, Y_train, X_test, Y_test,
    learning_rate=0.01,
    num_epochs=30
)
```

```python
# Deeper network with mixed activations
network = FFNN(
    layer_sizes=[784, 256, 128, 64, 10],
    activations=["relu", "tanh", "relu"]
)
```

```python
# Forward pass for predictions
predictions, _ = network.forward_propagation(X_test)
predicted_classes = np.argmax(predictions, axis=0)
```

Typical results on MNIST (784→128→64→10 architecture):
- Training Accuracy: ~98-99%
- Test Accuracy: ~96-97%
- Training Time: ~2-3 minutes (50 epochs)
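To turn the prediction example above into a test-accuracy number like the one quoted here, the predicted classes can be compared against the one-hot labels (this assumes the data format described below):

```python
# Y_test is one-hot with shape (n_classes, n_samples), so argmax over axis 0
# recovers the integer labels.
true_classes = np.argmax(Y_test, axis=0)
test_accuracy = np.mean(predicted_classes == true_classes)
print(f"Test accuracy: {test_accuracy:.2%}")
```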
The network expects data in the following shapes:
- X: shape (n_features, n_samples), with features in rows and samples in columns
- Y: shape (n_classes, n_samples), one-hot encoded labels
The load_mnist_data() function automatically handles:
- Reshaping images from 28×28 to 784×1 vectors
- Normalization to [0,1] range
- One-hot encoding of labels
- Proper matrix transposition
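Since tensorflow is listed among the dependencies, the loader presumably fetches MNIST through keras.datasets. A minimal sketch of such a loader, under that assumption (not necessarily identical to the real load_mnist_data()):

```python
import numpy as np
from tensorflow.keras.datasets import mnist

def load_mnist_data_sketch():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    # Flatten 28×28 images to 784-dimensional vectors and normalize to [0, 1].
    X_train = x_train.reshape(x_train.shape[0], -1).astype(np.float32) / 255.0
    X_test = x_test.reshape(x_test.shape[0], -1).astype(np.float32) / 255.0

    # One-hot encode the labels (shape becomes (n_samples, 10)).
    Y_train = np.eye(10)[y_train]
    Y_test = np.eye(10)[y_test]

    # Transpose so that samples are columns: (n_features, n_samples).
    return X_train.T, Y_train.T, X_test.T, Y_test.T
```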
The plot_training_history() function creates dual plots showing:
- Training Loss: Cross-entropy loss over epochs
- Accuracy Comparison: Training vs test accuracy curves
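A minimal version of such a dual plot might look like the sketch below, assuming history is a dictionary with "loss", "train_accuracy", and "test_accuracy" lists per epoch (the exact keys used by ffnn.py may differ):

```python
import matplotlib.pyplot as plt

def plot_training_history_sketch(history):
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(12, 4))

    ax_loss.plot(history["loss"])
    ax_loss.set_xlabel("Epoch")
    ax_loss.set_ylabel("Cross-entropy loss")
    ax_loss.set_title("Training Loss")

    ax_acc.plot(history["train_accuracy"], label="Train")
    ax_acc.plot(history["test_accuracy"], label="Test")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.set_title("Accuracy Comparison")
    ax_acc.legend()

    plt.tight_layout()
    plt.show()
```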
Project layout:

```
ffnn.py                      # Main implementation
├── FFNN                     # Neural network class
├── load_mnist_data()        # Data preprocessing
└── plot_training_history()  # Visualization utilities
```
This implementation is ideal for understanding:
- Forward Propagation: How data flows through network layers
- Backpropagation: Gradient computation and error propagation
- Activation Functions: Impact of different non-linearities
- Weight Initialization: Importance of proper parameter setup
- Mini-batch Training: Efficient stochastic gradient descent
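For example, the forward pass reduces to a matrix multiply plus bias followed by a non-linearity at every layer. The sketch below shows that structure for the column-major data layout used here; it illustrates the idea rather than the exact code in ffnn.py.

```python
import numpy as np

def forward_pass_sketch(X, weights, biases, activations):
    """Run X (shape (n_features, m)) through the layers.

    weights[l] has shape (n_out, n_in), biases[l] has shape (n_out, 1),
    and activations[l] is a callable such as relu or softmax.
    """
    A = X
    for W, b, g in zip(weights, biases, activations):
        Z = W @ A + b   # affine transform; the bias broadcasts across samples
        A = g(Z)        # element-wise non-linearity
    return A
```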
Known limitations:
- Performance: Slower than optimized frameworks (TensorFlow, PyTorch)
- GPU Support: CPU-only implementation
- Advanced Features: No regularization, dropout, or adaptive optimizers
- Memory: Stores all intermediate values for gradient computation
This is an educational implementation. For production use, consider:
- Adding regularization techniques (L1/L2, dropout)
- Implementing adaptive optimizers (Adam, RMSprop)
- Adding batch normalization
- GPU acceleration with CuPy or similar
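As an illustration of the first suggestion, L2 regularization only requires adding a weight-decay term to each weight gradient before the update. A hedged sketch (lambda_l2 and the function name are hypothetical, not part of the current API):

```python
def sgd_step_with_l2(W, dW, learning_rate, lambda_l2, m):
    """One gradient-descent update with an L2 (weight-decay) penalty.

    Adding (lambda_l2 / m) * W to the gradient corresponds to adding
    (lambda_l2 / (2 * m)) * sum(W ** 2) to the loss for this layer.
    """
    dW_regularized = dW + (lambda_l2 / m) * W
    return W - learning_rate * dW_regularized
```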
Open source - feel free to use for educational purposes.
Built for learning deep learning fundamentals through hands-on implementation.