This repository contains the solution for the Image Restoration task presented at the IOAI TST in Kazakhstan. The goal is to restore original images from their filtered versions, where a specific 2x2 pixel-wise filter has been applied. The solution utilizes a custom U-Net architecture with residual blocks and Squeeze-and-Excitation (SE) modules, and incorporates information about the applied filter directly into the network.
- Problem Description
- Solution Overview
- Dataset
- Model Architecture
- Loss Function
- Training
- Inference
- Results
- Dependencies
- Usage
## Problem Description

The challenge involves restoring images that have been corrupted by one of several predefined 2x2 pixel-wise filters. Each filter selectively retains a specific color channel (Red, Green, or Blue) for each of the four pixels within a 2x2 block. The task is to predict the original, unfiltered image.
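The corruption described above can be simulated with a few lines of NumPy. The sketch below is an illustrative reimplementation, not the competition code: it assumes a filter is given as a 2x2 array of channel indices (0 = R, 1 = G, 2 = B), tiled over the image so each pixel keeps exactly one channel.

```python
import numpy as np

def apply_fast_filter(img, pattern):
    """Apply a 2x2 pixel-wise channel filter to an H x W x 3 image.

    `pattern` is a 2x2 array of channel indices; pixel (y, x) keeps only
    channel pattern[y % 2, x % 2] and has its other channels zeroed.
    Illustrative sketch, not the original competition implementation.
    """
    h, w, _ = img.shape
    # Channel index kept at every pixel, tiled from the 2x2 pattern.
    keep = np.tile(np.asarray(pattern), (h // 2 + 1, w // 2 + 1))[:h, :w]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    out[ys, xs, keep] = img[ys, xs, keep]
    return out
```

For example, the pattern `[[0, 1], [2, 0]]` keeps red at the top-left of every 2x2 block, green at the top-right, and blue at the bottom-left.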
## Solution Overview

The solution consists of the following key components:

- Custom 2x2 Filter Application (`apply_fast_filter`): A function to simulate the filtering process based on a given pattern.
- Filter Detection (`detect_filter`): A method to identify which of the 12 possible filters was applied to a given image. This is crucial for guiding the restoration process.
- Custom Dataset (`FilteredRestoreDataset`): A PyTorch `Dataset` that generates pairs of filtered and original images for training, along with a one-hot encoded representation of the applied filter.
- Custom U-Net Model (`CustomUNet`): A U-Net based convolutional neural network designed for image-to-image translation tasks. The model is augmented to take the filter information as an additional input, allowing it to adapt its restoration to the specific degradation.
- Combined Loss Function (`CombinedLossNoPretrained`): A custom loss function that combines Mean Squared Error (MSE) and Structural Similarity Index Measure (SSIM) to optimize for both pixel-wise accuracy and perceptual quality.
- Training Loop: A standard PyTorch training loop for optimizing the model.
- Inference Pipeline: A process for loading filtered test images, detecting the applied filter, and using the trained model to restore them.
- Submission Generation: Formatting the restored images into the required submission format.
## Dataset

The dataset consists of `real_images` (original images) and `filtered_images` (images processed by one of the 12 filters). The `FilteredRestoreDataset` dynamically applies each of the 12 filters to the original images to create a diverse training set.

Dataset structure:

    /kaggle/input/tst-day-1/
    ├── train/
    │   └── train/
    │       ├── real_images/
    │       │   ├── img_0001.png
    │       │   └── ...
    │       └── filtered_images/    (not used directly in training; filters are applied on the fly)
    ├── test/
    │   └── test/
    │       └── filtered_images/
    │           ├── img_test_0001.png
    │           └── ...
    └── sample_submission.csv
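The on-the-fly pairing could be sketched as the following PyTorch `Dataset`. This is a minimal illustration of the idea, not the original class: it assumes `images` is a list of H x W x 3 `uint8` arrays and `patterns` a list of 2x2 channel-index filters, and yields `(filtered, original, filter_one_hot)` triples.

```python
import numpy as np
import torch
from torch.utils.data import Dataset

def _apply_pattern(img, pattern):
    """Zero all but one channel per pixel, per the tiled 2x2 pattern (sketch)."""
    h, w, _ = img.shape
    keep = np.tile(np.asarray(pattern), (h // 2 + 1, w // 2 + 1))[:h, :w]
    out = np.zeros_like(img)
    ys, xs = np.mgrid[0:h, 0:w]
    out[ys, xs, keep] = img[ys, xs, keep]
    return out

class FilteredRestoreDataset(Dataset):
    """Each original image appears once per filter, so the dataset has
    len(images) * len(patterns) samples (illustrative reimplementation)."""

    def __init__(self, images, patterns):
        self.images = images
        self.patterns = patterns

    def __len__(self):
        return len(self.images) * len(self.patterns)

    def __getitem__(self, idx):
        img_idx, filt_idx = divmod(idx, len(self.patterns))
        original = self.images[img_idx]
        filtered = _apply_pattern(original, self.patterns[filt_idx])
        one_hot = torch.zeros(len(self.patterns))
        one_hot[filt_idx] = 1.0
        # HWC uint8 -> CHW float in [0, 1]
        to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).float() / 255.0
        return to_tensor(filtered), to_tensor(original), one_hot
```

Wrapping this in a `DataLoader` with `batch_size=32` then matches the training setup described below.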
## Model Architecture

The core of the solution is a `CustomUNet` model.

- Encoder-Decoder Structure: Follows the typical U-Net architecture with downsampling (encoder) and upsampling (decoder) paths, connected by skip connections.
- Residual Blocks: Each convolutional block in the encoder and decoder is a `ResidualBlock`, which helps train deeper networks by alleviating the vanishing-gradient problem.
- Squeeze-and-Excitation (SE) Blocks: Each `ResidualBlock` incorporates an `SEBlock`, which adaptively recalibrates channel-wise feature responses by explicitly modeling interdependencies between channels.
- Filter Integration: A crucial aspect is the integration of the filter information. The `filter_id` (indicating which of the 12 filters was applied) is one-hot encoded and passed through a linear layer (`filter_fc`). The resulting feature vector is then expanded and concatenated with the feature maps at each skip connection in the decoder, allowing the model to condition its restoration on the specific filter.
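The two distinctive pieces, SE recalibration and filter conditioning, might look like the following sketch. Class names, the embedding size, and the reduction ratio are assumptions; only `filter_fc` is named in the description above.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: scale each channel by a learned gate (sketch)."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze: global average pool -> (B, C)
        return x * w.view(b, c, 1, 1)    # excite: per-channel rescaling

class FilterConditioning(nn.Module):
    """Project the one-hot filter id and broadcast it over a feature map so it
    can be concatenated at a skip connection (illustrative sketch)."""

    def __init__(self, num_filters=12, embed_dim=16):
        super().__init__()
        self.filter_fc = nn.Linear(num_filters, embed_dim)

    def forward(self, feat, filter_onehot):
        b, _, h, w = feat.shape
        emb = self.filter_fc(filter_onehot)               # (B, embed_dim)
        emb = emb.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast spatially
        return torch.cat([feat, emb], dim=1)              # concat on channels
```

Concatenating the broadcast filter embedding, rather than adding it, lets every decoder stage see the raw filter code alongside the image features.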
## Loss Function

The `CombinedLossNoPretrained` loss is used for training:

$$L = \alpha \cdot L_{MSE} + \beta \cdot L_{SSIM}$$

where:

- $L_{MSE}$ is the Mean Squared Error between the predicted and original images.
- $L_{SSIM}$ is the SSIM-based term (the Structural Similarity Index Measure, used as $1 - \mathrm{SSIM}$ so that higher similarity yields a lower loss).
- $\alpha$ and $\beta$ are the weights of the MSE and SSIM components, respectively (set to 0.7 and 0.3 in this solution).

This combined loss encourages both pixel-accurate restoration and perceptually pleasing results.
## Training

The model is trained using the Adam optimizer.

Training details:

- Epochs: 20
- Optimizer: Adam
- Learning rate: $1 \times 10^{-3}$
- Batch size: 32
- Image size: resized to $128 \times 128$ for training and inference
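Put together, the loop might look as follows. This is a minimal sketch using the hyperparameters above; it assumes the loader yields `(filtered, original, filter_onehot)` batches and that the model is called as `model(filtered, filter_onehot)`, and it substitutes plain MSE for the combined loss to stay self-contained.

```python
import torch

def train(model, loader, epochs=20, lr=1e-3, device="cpu"):
    """Minimal training loop matching the stated hyperparameters (sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()  # stand-in for the combined MSE + SSIM loss
    model.to(device).train()
    for epoch in range(epochs):
        total = 0.0
        for filtered, original, onehot in loader:
            filtered, original, onehot = (
                t.to(device) for t in (filtered, original, onehot))
            opt.zero_grad()
            loss = loss_fn(model(filtered, onehot), original)
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / max(len(loader), 1):.4f}")
```

Swapping `loss_fn` for the combined MSE + SSIM objective leaves the rest of the loop unchanged.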
## Inference

The inference process involves:

- Loading a filtered image from the test set.
- Converting the image to a NumPy array to facilitate filter detection.
- Detecting the filter: the `detect_filter` function attempts to identify the specific 2x2 filter applied to the image by applying each known pattern and checking for equality with the input.
- Converting the filtered image to a PyTorch tensor and creating a one-hot encoded vector for the detected filter.
- Passing the filtered image tensor and the one-hot filter vector to the trained `CustomUNet` model, which outputs the restored image.
- Converting the restored image back to a PIL Image and saving it.
- Flattening the pixel values of the restored image and inserting them into the `sample_submission.csv` template.
- Finally, swapping the first and last $128 \times 128$ pixel blocks in the flattened output, a specific requirement of the competition's submission format.
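The detection step exploits the fact that re-applying the correct filter to an already-filtered image is a no-op: the zeroed channels stay zero and the kept channel is kept again. The sketch below is an illustrative reimplementation of that idea, not the original `detect_filter`.

```python
import numpy as np

def detect_filter(filtered_img, patterns):
    """Return the index of the 2x2 pattern that leaves `filtered_img`
    unchanged when re-applied, or None if no pattern matches (sketch)."""
    h, w, _ = filtered_img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for idx, pattern in enumerate(patterns):
        keep = np.tile(np.asarray(pattern), (h // 2 + 1, w // 2 + 1))[:h, :w]
        candidate = np.zeros_like(filtered_img)
        candidate[ys, xs, keep] = filtered_img[ys, xs, keep]
        if np.array_equal(candidate, filtered_img):
            return idx
    return None
```

On natural images with few exactly-zero channel values, only the true filter survives the equality check, so the detection is effectively unambiguous.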
## Results

The solution achieved a Peak Signal-to-Noise Ratio (PSNR) of 24.3 on the competition's evaluation metric.
## Dependencies

The following Python libraries are required:

- `os` (standard library)
- `cv2` (OpenCV for Python)
- `numpy`
- `pandas`
- `pathlib` (standard library)
- `PIL` (Pillow)
- `matplotlib`
- `torch`
- `torchvision`
- `scikit-learn` (for `train_test_split`)
- `torchmetrics` (for SSIM calculation)

The third-party packages can be installed via pip:

    pip install opencv-python numpy pandas Pillow matplotlib torch torchvision scikit-learn torchmetrics
## Usage

- Clone the repository (if applicable):

      git clone <repository_url>
      cd <repository_name>

- Place the dataset: ensure your dataset is structured as described in the Dataset section, and update the paths in the code (`/kaggle/input/tst-day-1/train/train/real_images` and `/kaggle/input/tst-day-1/test/test/filtered_images`) to your local paths if running outside a Kaggle environment.
- Run the script:

      python your_solution_script.py

  This will train the model and generate `submission.csv` (and `submission_swapped.csv`) in the current directory.