SRDC is a powerful, automated Python pipeline designed to convert pairs of Low-Resolution (LR) and High-Resolution (HR) videos into high-quality, aligned image datasets. These datasets are essential for training and evaluating Super-Resolution (SR), de-noising, and other image restoration models.
The pipeline intelligently handles complex real-world video challenges, such as mismatched aspect ratios (widescreen vs. fullscreen), color space differences (SDR/HDR), and temporal drift, to produce clean, content-matched, and spatially aligned image pairs.
*(A Low-Resolution image vs. an aligned High-Resolution image)*

- High-Quality Frame Extraction: Uses FFmpeg with advanced filters to ensure maximum color fidelity, preventing common issues like blocky color edges (chroma upsampling) and washed-out colors from HDR sources (tone mapping).
- Widescreen / Pan-and-Scan Correction: A critical pre-processing step that automatically detects and crops widescreen footage to match the content of a corresponding fullscreen (pan-and-scan) version, dramatically improving alignment success.
- Versatile Border Cropping: Intelligently removes black and/or white letterbox/pillarbox bars from frames.
- Robust Temporal Matching: A multi-stage process that uses a fast, downscaled template match to find corresponding frames in time, with a sliding window to account for drift (e.g., PAL vs. NTSC speed differences). See the sketch after the pipeline diagram below.
- Sub-Pixel Image Alignment: Leverages the powerful `ImgAlign` tool to perform a final, precise spatial alignment of the matched pairs, correcting for minor shifts, rotation, and scaling.
- Content-Aware Filtering (see the sketch after this list):
  - Low-Information Filter: Discards overly simple frames (e.g., solid black or white) based on variance.
  - Perceptual Deduplication: Uses pHash to remove visually redundant or near-identical image pairs.
- Advanced Dataset Curation with CLIP: An optional final stage that uses the `SuperResImageSelector` to build a visually diverse and complex dataset, ideal for training robust SR models. It filters images based on complexity, brightness, and visual uniqueness (CLIP feature distance).
- Resumable & Organized: The entire pipeline is broken into logical stages, with progress saved to a `progress.json` file so you can stop and resume processing. Outputs are neatly organized into folders for each stage.
- Highly Configurable: A central `config.py` file allows easy tuning of every aspect of the pipeline without touching the core logic.
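
To make the content-aware filtering concrete, here is a minimal sketch of a variance-based low-information filter and a pHash deduplication pass, using OpenCV, Pillow, and imagehash (all listed in the requirements below). The function names and threshold values are illustrative assumptions, not the pipeline's actual code:

```python
# Illustrative sketch of the content-aware filtering stage -- not the
# pipeline's actual code. Function names and thresholds are assumptions.
import cv2
import imagehash
from PIL import Image

VARIANCE_THRESHOLD = 100.0  # assumed cutoff below which a frame is "low-information"
PHASH_DISTANCE = 4          # assumed: hash distance at or below this means "near-duplicate"

def is_low_information(path: str) -> bool:
    """Flag nearly flat frames (e.g., solid black or white) by pixel variance."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    return gray is None or gray.var() < VARIANCE_THRESHOLD

def deduplicate(paths: list[str]) -> list[str]:
    """Keep a frame only if its perceptual hash is far enough from all kept frames."""
    kept_paths, kept_hashes = [], []
    for path in paths:
        h = imagehash.phash(Image.open(path))
        if all(h - other > PHASH_DISTANCE for other in kept_hashes):
            kept_paths.append(path)
            kept_hashes.append(h)
    return kept_paths
```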
The script executes a series of stages, with the output of one stage becoming the input for the next.
```
Input Videos (LR/HR)
│
▼
[ 1. Frame Extraction ] ──> (Deinterlacing, Chroma Upsampling, HDR Tone Mapping)
│
▼
[ 2. Pan & Scan Fix ] ──> (Crops widescreen to match fullscreen content)
│
▼
[ 3. Border Cropping ] ──> (Removes letterbox/pillarbox bars)
│
▼
[ 4. Frame Matching ] ──> (Finds LR/HR pairs corresponding in time)
│
▼
[ 5. Deduplication ] ──> (Removes visually similar pairs via pHash)
│
▼
[ 6. Alignment ] ──> (Spatially aligns the final pairs using ImgAlign)
│
▼
Final Paired Dataset
```
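
As a rough illustration of the downscaled template match behind Stage 4, here is a sketch using OpenCV. The function name, scale factor, and grayscale conversion are assumptions; the real pipeline additionally slides a search window over nearby frames to absorb temporal drift:

```python
# Sketch of a downscaled template match for temporal frame matching.
# Illustrative only -- not the pipeline's actual implementation.
import cv2

DOWNSCALE = 0.25  # assumed: compare small grayscale copies for speed

def match_score(lr_path: str, hr_path: str) -> float:
    """Return a 0..1 similarity score between candidate LR and HR frames."""
    lr = cv2.imread(lr_path, cv2.IMREAD_GRAYSCALE)
    hr = cv2.imread(hr_path, cv2.IMREAD_GRAYSCALE)
    small_lr = cv2.resize(lr, None, fx=DOWNSCALE, fy=DOWNSCALE,
                          interpolation=cv2.INTER_AREA)
    # Bring the HR frame to the same (small) size so the normalized
    # cross-correlation compares like with like.
    small_hr = cv2.resize(hr, (small_lr.shape[1], small_lr.shape[0]),
                          interpolation=cv2.INTER_AREA)
    result = cv2.matchTemplate(small_hr, small_lr, cv2.TM_CCOEFF_NORMED)
    return float(result.max())

# A pair would be accepted when match_score(...) >= MATCH_THRESHOLD (0.65 by default).
```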
The following external tools must be installed first and be accessible in your system's PATH.
- NVIDIA CUDA Toolkit: You must have the NVIDIA drivers and CUDA Toolkit installed to use a GPU. You can check your version with `nvcc --version`.
- FFmpeg: For all video processing. (Download)
- ImgAlign: For the final alignment step. (Download & Compile from GitHub)
It is highly recommended to use a dedicated Python virtual environment.
Step 1: Create and Activate Environment
```bash
python -m venv venv
# On Windows:
# venv\Scripts\activate
# On macOS/Linux:
# source venv/bin/activate
```

Step 2: Install PyTorch with CUDA Support (CRITICAL STEP)
Do not install PyTorch using a generic command. You must install the version that matches your system's CUDA Toolkit.
- Go to the Official PyTorch Get Started page.
- Use the interactive tool to select your system configuration (e.g., Stable, Windows, Pip, Python, CUDA 12.1).
- Copy the generated command and run it in your activated virtual environment.
It will look something like this (do not copy this command directly; get the correct one from the website!):

```bash
# Example command for CUDA 12.1 - verify on the PyTorch website!
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

This ensures that `torch`, ImgAlign, and the SISR filter can all leverage your GPU for maximum performance.
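You can quickly confirm that the installed build can see your GPU with a few lines of standard PyTorch API:

```python
import torch

print(torch.__version__)          # installed PyTorch build
print(torch.cuda.is_available())  # True means PyTorch can see your GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g., "NVIDIA GeForce RTX 4090"
```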
Step 3: Install Remaining Packages
Once PyTorch is installed correctly, install the rest of the required packages. Create a file named `requirements.txt` with the following content:

```
# requirements.txt
opencv-python
numpy
transformers
Pillow
imagehash
scipy
tqdm
psutil
```
Then, run the following command in your activated virtual environment:
```bash
pip install -r requirements.txt
```

Your project folder should be set up like this:
```
.
├── srdc_pipeline.py   # The main pipeline script.
├── config.py          # All user-configurable settings.
├── README.md          # This file.
├── LR/                # Your input Low-Resolution videos.
│   └── movie_v1.mp4
└── HR/                # Your input High-Resolution videos.
    └── movie_v1.mkv   # <-- Note: Extension can be different.
```
The pipeline will generate an Output folder (name configurable) with the following structure:
```
OUTPUT_BASE_FOLDER/
├── 1_EXTRACTED/       # Raw frames from videos.
├── 2_MATCHED/         # Temporally matched but unaligned pairs.
├── 3_ALIGNED/         # Spatially aligned, clean pairs.
│   ├── LR/
│   ├── HR/
│   └── Overlay/       # Visualizations of the alignment.
└── progress.json      # Tracks pipeline state for resumption.
```
- Clone the Repository:

  ```bash
  git clone https://github.com/stinkybread/super_resolution_dataset_creator.git
  cd super_resolution_dataset_creator
  ```

- Install Dependencies: Follow the steps in the Requirements section above.
- Prepare Videos: Place your LR and HR videos into their respective input folders (e.g., `LR/` and `HR/`). The script will match videos based on their filenames, ignoring the extension.
- Configure the Pipeline: Open `config.py` in a text editor. Adjust the paths, and enable/disable features to suit your needs. See the table below for key settings.
- Run the Pipeline:

  ```bash
  python srdc_pipeline.py
  ```
The script will print its progress for each stage. If it's interrupted, you can typically run it again to resume from where it left off.
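For intuition, stage-level resumption via `progress.json` might look something like the following sketch; the actual schema and file location used by the pipeline may differ:

```python
# Sketch of stage-level resumption via progress.json -- the real
# pipeline's schema and file location may differ.
import json
import os

PROGRESS_FILE = "Output/progress.json"  # assumed; lives under OUTPUT_BASE_FOLDER

def load_progress() -> dict:
    """Return the saved stage-completion map, or an empty one on first run."""
    if os.path.exists(PROGRESS_FILE):
        with open(PROGRESS_FILE, encoding="utf-8") as f:
            return json.load(f)
    return {}

def mark_stage_done(stage: str) -> None:
    """Record a finished stage so an interrupted run can skip it next time."""
    progress = load_progress()
    progress[stage] = True
    with open(PROGRESS_FILE, "w", encoding="utf-8") as f:
        json.dump(progress, f, indent=2)
```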
All settings are in `config.py`. Here are the most important ones to review:

| Parameter | Description | Recommendation |
|---|---|---|
| `LR_INPUT_VIDEO_FOLDER` | Path to your low-resolution videos. | Required. |
| `HR_INPUT_VIDEO_FOLDER` | Path to your high-resolution videos. | Required. |
| `OUTPUT_BASE_FOLDER` | Where all generated folders and files will be saved. | Required. |
| `ATTEMPT_PAN_AND_SCAN_FIX` | Crucial. Crops widescreen footage to match fullscreen content before alignment. | Set to `True` if your sources have different aspect ratios (e.g., 2.35:1 vs. 4:3). |
| `ENABLE_CHROMA_UPSAMPLING` | Fixes blocky color artifacts present in most standard video encodings. | Keep `True` for almost all standard-definition (SDR) videos. |
| `ENABLE_HDR_TONE_MAPPING` | Fixes washed-out, grey colors when extracting frames from HDR (High Dynamic Range) video. | Set to `True` only if your HR source is HDR (e.g., a 4K Blu-ray). |
| `CROP_BLACK_BORDERS` | Enables automatic cropping of black bars (letterboxing). | Set to `True` if your videos have hardcoded black bars. |
| `CROP_WHITE_BORDERS` | Enables automatic cropping of white or very light-colored bars. | Set to `True` if your videos have hardcoded white bars. |
| `MATCH_THRESHOLD` | Similarity score (0.0 to 1.0) needed to consider two frames a match. | `0.65` is a good start. Lower it for difficult content, raise it for more accuracy. |
| `PHASH_SIMILARITY_THRESHOLD` | How different two frames must be to be kept. Lower is stricter. | `4` is a good default. Set to `-1` to disable this filtering stage. |
| `ENABLE_SISR_FILTERING` | Enables the final dataset curation stage using CLIP to select a diverse, complex set of images. | Set to `True` to create a final, high-quality training set from the aligned pairs. |
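
For orientation, an illustrative `config.py` excerpt using the parameters above; the values shown are examples only, and the actual file in the repository is authoritative and contains more options:

```python
# Illustrative config.py excerpt -- example values only; consult the
# actual file in the repository for the full set of options.
LR_INPUT_VIDEO_FOLDER = "LR"      # path to low-resolution input videos
HR_INPUT_VIDEO_FOLDER = "HR"      # path to high-resolution input videos
OUTPUT_BASE_FOLDER = "Output"     # where all generated stages are written

ATTEMPT_PAN_AND_SCAN_FIX = True   # crop widescreen to match fullscreen content
ENABLE_CHROMA_UPSAMPLING = True   # fix blocky chroma edges (keep on for SDR)
ENABLE_HDR_TONE_MAPPING = False   # enable only for HDR HR sources (e.g., 4K Blu-ray)

CROP_BLACK_BORDERS = True         # remove hardcoded letterbox bars
CROP_WHITE_BORDERS = False        # remove hardcoded white bars

MATCH_THRESHOLD = 0.65            # minimum similarity for a temporal match
PHASH_SIMILARITY_THRESHOLD = 4    # -1 disables the deduplication stage
ENABLE_SISR_FILTERING = False     # CLIP-based dataset curation stage
```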
This project is licensed under the MIT License.
- The Anthropic team for Claude.
- The Google AI team for AI Studio.
- The Enhance Everything Discord community for inspiration and discussion.
- The neosr project for concepts in super-resolution.