Xiaoming Li, Wangmeng Zuo, Chen Change Loy
S-Lab, Nanyang Technological University
- MARCONet is designed for regular character layouts only. See the MARCONet page for details.
- MARCONet++ achieves more accurate alignment between the character structural prior (green structure) and the degraded image.
- Release the inference code and model.
- Release the training code (no plans to release it for now).
git clone https://github.com/csxmli2016/MARCONetPlusPlus
cd MARCONetPlusPlus
conda create -n mplus python=3.8 -y
conda activate mplus
pip install -r requirements.txt
Download the pre-trained models:
python utils/download_github.py
and run the following to restore text lines:
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_TextLines -a -s
or run the following to restore a whole text image:
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_Whole -b -s -f 2
# Parameters:
-i: --input_path, default: ./Testsets/LR_TextLines or ./Testsets/LR_TextWhole
-o: --output_path, default: None, which automatically creates the output dir in the format '[LR path]_TIME_MARCONetPlus'
-a: --aligned, use -a when the inputs are cropped text lines; omit it when the input is a whole text image that requires text line detection
-b: --bg_sr, when restoring whole text images, use -b to restore the background region with BSRGAN; without -b, the background is kept the same as the input
-f: --factor_scale, default: 2. When restoring whole text images, use -f to set the scale factor of the output
-s: --save_text, use -s to save the details of prior alignment, predicted characters, and locations
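As a rough illustration of the default `-o` behavior described above, the output directory name can be sketched as follows (the exact timestamp format is an assumption for illustration; the released script may format it differently):

```python
import os
import time

def default_output_dir(input_path: str) -> str:
    """Sketch of the documented default: '[LR path]_TIME_MARCONetPlus'.
    The timestamp format here is an assumption, not taken from the code."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return f"{os.path.normpath(input_path)}_{stamp}_MARCONetPlus"
```

For example, `./Testsets/LR_TextLines` would map to something like `Testsets/LR_TextLines_20250101-120000_MARCONetPlus`.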
- We use BSRGAN to restore the background region.
- The parameters are tested on an NVIDIA A100 GPU (40G).
⚠️ Slow inference is usually caused by a large input text image or a large factor_scale. Resize the input or lower the scale factor based on your needs.
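If a large input is the bottleneck, one quick way to cap its size before running the script is a helper like the following. This is a minimal sketch, assuming Pillow is available (it is not named by the authors); `MAX_SIDE` is an arbitrary cap, not a value from the paper:

```python
MAX_SIDE = 2000  # arbitrary cap on the longer side; tune to your GPU memory

def cap_size(w: int, h: int, max_side: int = MAX_SIDE) -> tuple:
    """Scale (w, h) down, preserving aspect ratio, so max(w, h) <= max_side."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h
    scale = max_side / longest
    return max(1, round(w * scale)), max(1, round(h * scale))

def shrink_if_needed(src: str, dst: str, max_side: int = MAX_SIDE) -> None:
    """Resize the image at `src` if it exceeds the cap and save it to `dst`."""
    from PIL import Image  # Pillow; imported lazily so cap_size has no dependencies
    img = Image.open(src)
    new_size = cap_size(*img.size, max_side)
    if new_size != img.size:
        img = img.resize(new_size, Image.LANCZOS)
    img.save(dst)
```

Run `shrink_if_needed` over the images in your input folder before passing the folder to `test_marconetplus.py`.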
Despite its high-fidelity performance, MARCONet++ still struggles in some real-world scenarios, as it relies heavily on:
- Accurate real-world character recognition on complex degraded text images
- Accurate real-world character detection on complex degraded text images
- Reliable text line detection and segmentation
- A small domain gap between our synthetic and real-world text images
🍒 Restoring complex characters with high fidelity under such conditions poses significant challenges. We have explored various approaches, such as training OCR models with Transformers and using YOLO- or Transformer-based methods for character detection, but these generally encounter the same issues. We welcome potential collaborations to jointly tackle this challenge and advance robust, high-fidelity text restoration.
To quantitatively evaluate on real-world Chinese text line images, we curate a benchmark by filtering the RealCE test set to exclude images containing multiple text lines or inaccurate annotations, thereby constructing a Chinese text SR benchmark (see Section IV.B of our paper). You can download the RealCE-1K benchmark from here.
This project is built upon the excellent KAIR and RealCE.
This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.
@article{li2025marconetplus,
author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
title = {Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2025}
}
@inproceedings{li2023marconet,
author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
title = {Learning Generative Structure Prior for Blind Text Image Super-resolution},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2023}
}