This repository provides an implementation of a Conv2D (2D convolutional layer) from scratch using NumPy. It is designed to be beginner-friendly, making it easy for newcomers to deep learning to understand the underlying concepts of convolutional neural networks. By leveraging the power of NumPy, this implementation offers an accessible entry point for those interested in studying and experimenting with convolutional layers.
- 2D convolutional layer implementation
- Support for both single-channel and multi-channel images/feature maps
- Customizable filter size, stride, and padding
- Efficient computation using NumPy
- Python 3.11.2

Install requirements:

`pip install -r requirements.txt`
```python
def conv2d(
    image: np.ndarray,
    in_channels: int,
    out_channels: int,
    kernel_size,
    stride=1,
    padding=0,
) -> np.ndarray:
    """
    Perform a 2D convolution operation.

    Args:
        image (np.ndarray): Input image.
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (int or tuple[int, int]): Size of the convolutional kernel.
        stride (int, optional): Stride value for the convolution operation. Default is 1.
        padding (int, optional): Padding value for the input image. Default is 0.

    Returns:
        np.ndarray: Resulting output of the convolution operation.

    Raises:
        TypeError: If `image` is not of type `numpy.ndarray`.
        TypeError: If `in_channels` is not of type `int`.
        TypeError: If `out_channels` is not of type `int`.
        ValueError: If `kernel_size` is invalid.
    """
```
Terms | Explanations | Variables |
---|---|---|
input | An image of size (height, width, channels) represents a single instance of an image. It can be thought of as a stack of `channels` 2D matrices, each of size (height, width). | `in_channels` = channels |
padding | Technique of adding extra border elements to the input data before applying a convolution operation. It helps preserve spatial dimensions and prevents the output from being smaller than the input. | `padding` |
kernel | A kernel, in the form of a 2D matrix of weights, is a small filter typically sized (3, 3), (5, 5), or (7, 7). It plays a crucial role in the convolutional layer by learning and extracting features from the input data. The kernel is convolved over the input with a specified stride, and at each position the convolution operation is performed. The number of kernel matrices is equal to the number of output channels. | `kernel_size`, `stride` |
convolution | The main operation in a 2D convolution, although it is technically cross-correlation. Mathematically, convolution and cross-correlation are similar operations, as both perform an element-wise dot product (multiplication and summation) between a kernel and a receptive field of the input. | |
bias | A 1D vector of size `out_channels` representing the bias terms. The intermediate outputs of the convolution operation on each channel of the input are summed together, and a bias is added to introduce an offset or shift in the output (feature maps). | |
output | Also called feature maps, this is the result obtained after applying convolution operations to the input data. Each feature map represents a specific learned feature or pattern detected by the convolutional layer. | `out_channels` |
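To make the terms above concrete, here is a minimal single-channel, single-kernel sketch of the sliding-window operation in NumPy (illustrative only, not the repository's implementation):

```python
import numpy as np

# Single-channel, single-kernel sketch of the sliding-window operation
# (technically cross-correlation, as noted above).
image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 input
kernel = np.ones((3, 3)) / 9.0                     # 3x3 averaging kernel
bias = 0.5
stride, padding = 1, 0                             # padding=0, so no border is added here

out_h = (image.shape[0] - kernel.shape[0] + 2 * padding) // stride + 1
out_w = (image.shape[1] - kernel.shape[1] + 2 * padding) // stride + 1
output = np.zeros((out_h, out_w))

for i in range(out_h):
    for j in range(out_w):
        # Receptive field: the patch of the input currently under the kernel.
        patch = image[i * stride:i * stride + kernel.shape[0],
                      j * stride:j * stride + kernel.shape[1]]
        # Element-wise multiply-and-sum, then add the bias.
        output[i, j] = np.sum(patch * kernel) + bias

print(output.shape)  # (3, 3)
```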
```python
# Output height and width are calculated as
output_height = (input_height - kernel_height + 2 * padding) // stride + 1
output_width = (input_width - kernel_width + 2 * padding) // stride + 1
```
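For example, wrapping the formula in a small helper (hypothetical, not part of the repository) makes it easy to check a few cases:

```python
def conv_output_size(input_size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    # Spatial output size of a convolution along one dimension.
    return (input_size - kernel_size + 2 * padding) // stride + 1

print(conv_output_size(32, 3))             # 30: an unpadded 3x3 conv shrinks each side by 2
print(conv_output_size(32, 3, padding=1))  # 32: padding=1 preserves the spatial size
print(conv_output_size(32, 3, stride=2))   # 15: stride=2 roughly halves the spatial size
```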
Input layer is the red part of the input. Arguments: `kernel_size` = 3, `stride` = 1, `padding` = 0.
2 conv2d layers | 4 conv2d layers |
---|---|
kernel_size = 3, stride = 1, padding = 0 | kernel_size = 3, stride = 1, padding = 1 |
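Both configurations can be checked with the output-size formula above; assuming, for illustration, a 28x28 input:

```python
# 2 conv2d layers with kernel_size=3, stride=1, padding=0 (left column).
size = 28
for _ in range(2):
    size = (size - 3 + 2 * 0) // 1 + 1
print(size)  # 24: each unpadded 3x3 convolution shrinks the spatial size by 2

# 4 conv2d layers with kernel_size=3, stride=1, padding=1 (right column).
size = 28
for _ in range(4):
    size = (size - 3 + 2 * 1) // 1 + 1
print(size)  # 28: padding=1 keeps the spatial size constant across all layers
```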