duongnphong/Conv2D-NumPy


NumPy-Conv2D

This repository implements a Conv2D (2D convolutional) layer from scratch using NumPy. It is designed to be beginner-friendly, so that newcomers to deep learning can follow the concepts behind convolutional neural networks and experiment with convolutional layers directly.

Features

  • 2D convolutional layer implementation
  • Support for both single-channel and multi-channel images/feature maps
  • Customizable filter size, stride, and padding
  • Efficient computation using NumPy

Dependencies and Installation

  • Python 3.11.2

  • Install requirements: pip install -r requirements.txt

Implementation

import numpy as np


def conv2d(
    image: np.ndarray,
    in_channels: int,
    out_channels: int,
    kernel_size,
    stride=1,
    padding=0,
) -> np.ndarray:
    """
    Perform a 2D convolution operation.

    Args:
        image (np.ndarray): Input image.
        in_channels (int): Number of input channels.
        out_channels (int): Number of output channels.
        kernel_size (int or tuple[int, int]): Size of the convolutional kernel.
        stride (int, optional): Stride value for the convolution operation. Default is 1.
        padding (int, optional): Padding value for the input image. Default is 0.

    Returns:
        np.ndarray: Resulting output of the convolution operation.

    Raises:
        TypeError: If `image` is not of type `numpy.ndarray`.
        TypeError: If `in_channels` is not of type `int`.
        TypeError: If `out_channels` is not of type `int`.
        ValueError: If `kernel_size` is invalid.

    """

Understanding 2D Convolution

Terminology

| Term | Explanation | Variables |
|---|---|---|
| input | An image of size (height, width, channels), representing a single image instance. It can be thought of as a stack of `channels` 2D matrices, each of size (height, width). | in_channels=channels |
| padding | The technique of adding extra border elements to the input before applying a convolution. It helps preserve spatial dimensions and, with enough padding, keeps the output from being smaller than the input. | padding |
| kernel | A small filter in the form of a 2D matrix of weights, typically of size (3, 3), (5, 5), or (7, 7). It learns and extracts features from the input data. The kernel slides over the input with a specified stride, and the convolution operation is performed at each position. The number of kernels equals the number of output channels. | kernel_size, stride |
| convolution | The main operation of the layer, although what is computed is technically cross-correlation. The two operations are closely related: both take an element-wise product and sum (a dot product) between a kernel and a receptive field of the input; true convolution additionally flips the kernel first. | |
| bias | A 1D vector of size out_channels holding the bias terms. The intermediate outputs of the convolution over each input channel are summed together, and the bias is added to introduce an offset (shift) in the output feature maps. | |
| output | Also called feature maps: the result obtained after applying the convolution operations to the input. Each feature map represents a specific learned feature or pattern detected by the convolutional layer. | out_channels |
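
The difference between cross-correlation and true convolution can be checked on a single receptive field. The sketch below uses made-up 2x2 values; `cross_correlate` and `convolve` are hypothetical helper names, not functions from this repository.

```python
import numpy as np


def cross_correlate(patch, kernel):
    # Element-wise product and sum, with the kernel used as-is.
    return float(np.sum(patch * kernel))


def convolve(patch, kernel):
    # True convolution flips the kernel along both axes first.
    return float(np.sum(patch * np.flip(kernel)))


patch = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])

print(cross_correlate(patch, kernel))  # -3.0
print(convolve(patch, kernel))         # 3.0
```

For a kernel that is unchanged by the flip, the two results coincide. Because the weights are learned, a layer can use either operation, which is why deep learning code commonly implements cross-correlation and still calls it convolution.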

Output Sizing Calculation

# Output height and width (floor division keeps the sizes integral)
output_height = (input_height - kernel_height + 2 * padding) // stride + 1
output_width = (input_width - kernel_width + 2 * padding) // stride + 1
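
As a worked example of the sizing formula, the sketch below wraps it in a small helper and evaluates it for a few settings. The helper name `conv_output_size` and the input size of 28 are assumptions chosen for illustration.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Hypothetical helper: output length along one spatial dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(28, 3, stride=1, padding=0))  # 26
print(conv_output_size(28, 3, stride=1, padding=1))  # 28
print(conv_output_size(28, 3, stride=2, padding=0))  # 13
```

With kernel_size = 3 and stride = 1, padding = 0 shrinks each dimension by 2 per layer, while padding = 1 leaves the size unchanged.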

Examples of an intermediate output

The input layer is the red channel of the input image. Arguments: kernel_size = 3, stride = 1, padding = 0.

[Figure: intermediate output of the convolution (image alt text: conv1d)]

Results of 2D Convolution

| 2 conv2d layers | 4 conv2d layers |
|---|---|
| kernel_size = 3, stride = 1, padding = 0 | kernel_size = 3, stride = 1, padding = 1 |
| [Image: low] | [Image: high] |

More on this topic