
HiRT: Hierarchical Recurrent Transformer Network for Video Super-Resolution (VSR)

Young-Ju Choi and Byung-Gyu Kim

Intelligent Vision Processing Lab. (IVPL), Sookmyung Women's University, Seoul, Republic of Korea


This repository is the official PyTorch implementation of HiRT (dissertation for the Degree of Doctor of Philosophy by Young-Ju Choi).

Official paper: Young-Ju Choi, Byung-Gyu Kim∗, HiRT: Hierarchical Recurrent Transformer Network for Video Super-Resolution (VSR), Engineering Applications of Artificial Intelligence (Elsevier), Volume 166, Part B: 113714, (https://doi.org/10.1016/j.engappai.2025.113714), 15 February 2026 (Ranked 2.5%, IF=8.0)


Summary of paper

Abstract

Video super-resolution (VSR) is a crucial technology for enhancing video frame quality, relying on effectively utilizing spatial correlation within frames and temporal dependencies between consecutive frames. Existing methods struggle to restore fine details for various motion types and lack true bi-directional access. Recent research predominantly focuses on residual block-based and transformer-based backbones, which have demonstrated notable effectiveness in VSR. However, many methods treat spatial features uniformly, resulting in inadequate information acquisition and detail enhancement in feature extraction. This paper proposes the hierarchical recurrent transformer (HiRT) to enhance recurrent propagation in the frequency domain. The hierarchical recurrent propagation in the proposed HiRT consists of uni-directional backward and forward stages and a bi-directional stage. This multi-stage structure can deal with various types of motion. The proposed HiRT comprises three transformer modules: the global transformer block, the local transformer block, and the image transformer block. The global transformer block enhances low-frequency features, which contain the global background information of a frame. The high-frequency components are enhanced in the local transformer block. Alongside the image transformer block, incorporating discrete wavelet transform (DWT)-based transformer processing enhances both background and edge details. Experimental comparisons with state-of-the-art (SOTA) methods on benchmark datasets demonstrate the superiority of the proposed approach. The proposed HiRT outperforms all compared methods in terms of SSIM on the REDS4 and Vid4 benchmarks. In particular, the proposed HiRT surpasses VRT, the transformer-based SOTA method, by 0.32 dB in PSNR and 0.0068 in SSIM on REDS4. The proposed HiRT also achieves about 0.12 dB and 0.07 dB higher PSNR than BasicVSR++ on REDS4 and Vid4, respectively.
Moreover, the proposed HiRT achieves SSIM improvements of about 0.0133 and 0.0067 on REDS4 over Multi-Scale-T and LGDFNet-BPP, two recent VSR methods, respectively.
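As a rough, framework-agnostic illustration of the frequency split described above (this is not the authors' implementation), a one-level 2D Haar DWT decomposes a frame into a low-frequency LL subband, which carries the global background information handled by the global transformer block, and high-frequency LH/HL/HH subbands, which carry the edge details handled by the local transformer block:

```python
def haar_dwt2(img):
    """One-level 2D Haar DWT of a grayscale image (list of rows with even
    height and width). Returns the four subbands (LL, LH, HL, HH), each at
    half the input resolution. LL holds the low-frequency background;
    LH/HL/HH hold the high-frequency (edge) detail."""
    h, w = len(img), len(img[0])
    LL = [[0.0] * (w // 2) for _ in range(h // 2)]
    LH = [[0.0] * (w // 2) for _ in range(h // 2)]
    HL = [[0.0] * (w // 2) for _ in range(h // 2)]
    HH = [[0.0] * (w // 2) for _ in range(h // 2)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            # 2x2 block of neighboring pixels
            a, b = img[i][j], img[i][j + 1]
            c, d = img[i + 1][j], img[i + 1][j + 1]
            # Orthonormal Haar analysis: sums/differences scaled by 1/2
            LL[i // 2][j // 2] = (a + b + c + d) / 2  # average (low freq.)
            LH[i // 2][j // 2] = (a + b - c - d) / 2  # vertical detail
            HL[i // 2][j // 2] = (a - b + c - d) / 2  # horizontal detail
            HH[i // 2][j // 2] = (a - b - c + d) / 2  # diagonal detail
    return LL, LH, HL, HH
```

On a perfectly flat frame, all detail subbands are zero and only LL is non-zero, which matches the intuition that LL captures smooth background content while the other subbands isolate edges.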

Network Architecture

Experimental Results
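The comparisons in the paper are reported in PSNR (dB) and SSIM. For reference, a minimal PSNR computation using the standard definition (not tied to this repository's evaluation code, which follows BasicSR's conventions):

```python
import math

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio between two equal-length flat lists of
    pixel values: 10 * log10(max_val^2 / MSE). Higher is better."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

For example, two 8-bit images whose pixels all differ by exactly 1 have MSE = 1 and therefore PSNR = 10 * log10(255^2) ≈ 48.13 dB; the sub-dB gains cited in the abstract are meaningful on this logarithmic scale.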


Getting Started

Dependencies and Installation

  • Anaconda3

  • Python == 3.8

    conda create --name hirt python=3.8
  • PyTorch (NVIDIA GPU + CUDA)

    Trained on PyTorch 1.9.1 and CUDA 11.1

    Run in ./

    pip install -r requirements.txt
    BASICSR_EXT=True python setup.py develop

Dataset Preparation

We used the REDS and Vimeo90K datasets for training, and the Vid4 and REDS4 datasets for testing.

Model Zoo

Pre-trained models are available at the link below.

google-drive

Please save the pre-trained models to './experiments/pretrained_models/HiRT/'.


Training

Run in ./

  • REDS

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash ./dist_train.sh 8 ./options/train_HiRT_REDS.yml
  • Vimeo90K

    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 bash ./dist_train.sh 8 ./options/train_HiRT_Vimeo90K.yml

Testing

Run in ./

  • REDS4

    CUDA_VISIBLE_DEVICES=0 bash ./dist_test.sh 1 ./options/test_HiRT_REDS4.yml
  • Vid4

    CUDA_VISIBLE_DEVICES=0 bash ./dist_test.sh 1 ./options/test_HiRT_Vid4.yml

Acknowledgement

The code is heavily based on BasicSR and PSRT. Thanks for their awesome work.

BasicSR :
@misc{basicsr,
  author =       {Xintao Wang and Liangbin Xie and Ke Yu and Kelvin C.K. Chan and Chen Change Loy and Chao Dong},
  title =        {{BasicSR}: Open Source Image and Video Restoration Toolbox},
  howpublished = {\url{https://github.com/XPixelGroup/BasicSR}},
  year =         {2022}
}
@article{shi2022rethinking,
  title={Rethinking Alignment in Video Super-Resolution Transformers},
  author={Shi, Shuwei and Gu, Jinjin and Xie, Liangbin and Wang, Xintao and Yang, Yujiu and Dong, Chao},
  journal={arXiv preprint arXiv:2207.08494},
  year={2022}
}
