Xiaoming Li, Wangmeng Zuo, Chen Change Loy
S-Lab, Nanyang Technological University
- MARCONet is designed for regular character layouts only. See the MARCONet page for details.
- MARCONet++ achieves more accurate alignment between the character structural prior (green structure) and the degraded image.
- Release the inference code and model.
- Release the training code (no plans to release it for now).
git clone https://github.com/csxmli2016/MARCONetPlusPlus
cd MARCONetPlusPlus
conda create -n mplus python=3.8 -y
conda activate mplus
pip install -r requirements.txt
Download the pre-trained models:
python utils/download_github.py
and run the following to restore text lines:
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_TextLines -a -s
or run the following to restore a whole text image:
CUDA_VISIBLE_DEVICES=0 python test_marconetplus.py -i ./Testsets/LR_Whole -b -s -f 2
# Parameters:
-i: --input_path, default: ./Testsets/LR_TextLines or ./Testsets/LR_TextWhole
-o: --output_path, default: None, which automatically creates the output dir in the format '[LR path]_TIME_MARCONetPlus'
-a: --aligned, use -a when the inputs are cropped text lines; omit it when the input is a whole text image that requires text line detection
-b: --bg_sr, when restoring whole text images, use -b to restore the background region with BSRGAN; without -b, the background is kept the same as the input
-f: --factor_scale, default: 2. When restoring whole text images, use -f to set the scale factor of the output
-s: --save_text, use -s to save the details of prior alignment, predicted characters, and locations
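As a rough illustration of the default `-o` behavior described above, the output directory name can be sketched as follows (the exact timestamp format is an assumption for illustration; the released script may format it differently):

```python
import os
import time

def default_output_dir(input_path: str) -> str:
    """Sketch of the documented default: '[LR path]_TIME_MARCONetPlus'.
    The timestamp format here is an assumption, not taken from the code."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return f"{os.path.normpath(input_path)}_{stamp}_MARCONetPlus"
```

For example, `./Testsets/LR_TextLines` would map to something like `Testsets/LR_TextLines_20250101-120000_MARCONetPlus`.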
- We use BSRGAN to restore the background region.
- The parameters are tested on an NVIDIA A100 GPU (40G).
⚠️ Slow inference is usually caused by a large input text image or a large factor_scale. Resize the input or lower the scale factor based on your needs.
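If a large input is the bottleneck, one quick way to cap its size before running the script is a helper like the following. This is a minimal sketch, assuming Pillow is available (it is not named by the authors); `MAX_SIDE` is an arbitrary cap, not a value from the paper:

```python
MAX_SIDE = 2000  # arbitrary cap on the longer side; tune to your GPU memory

def cap_size(w: int, h: int, max_side: int = MAX_SIDE) -> tuple:
    """Scale (w, h) down, preserving aspect ratio, so max(w, h) <= max_side."""
    longest = max(w, h)
    if longest <= max_side:
        return w, h
    scale = max_side / longest
    return max(1, round(w * scale)), max(1, round(h * scale))

def shrink_if_needed(src: str, dst: str, max_side: int = MAX_SIDE) -> None:
    """Resize the image at `src` if it exceeds the cap and save it to `dst`."""
    from PIL import Image  # Pillow; imported lazily so cap_size has no dependencies
    img = Image.open(src)
    new_size = cap_size(*img.size, max_side)
    if new_size != img.size:
        img = img.resize(new_size, Image.LANCZOS)
    img.save(dst)
```

Run `shrink_if_needed` over the images in your input folder before passing the folder to `test_marconetplus.py`.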
Despite its high-fidelity performance, MARCONet++ still struggles in some real-world scenarios, as it relies heavily on:
- Accurate real-world character recognition on complex degraded text images
- Accurate real-world character detection on complex degraded text images
- Reliable text line detection and segmentation
- A small domain gap between our synthetic and real-world text images
🍒 Restoring complex characters with high fidelity under such conditions poses significant challenges. We have explored various approaches, such as training OCR models with Transformers and using YOLO- or Transformer-based methods for character detection, but these generally encounter the same issues. We welcome potential collaborations to jointly tackle this challenge and advance robust, high-fidelity text restoration.
To quantitatively evaluate on real-world Chinese text line images, we curate a benchmark by filtering the RealCE test set to exclude images containing multiple text lines or inaccurate annotations, thereby constructing a Chinese text SR benchmark (see Section IV.B of our paper). You can download the RealCE-1K benchmark from here.
This project is built upon the excellent KAIR and RealCE.
This project is licensed under NTU S-Lab License 1.0. Redistribution and use should follow this license.
@article{li2025marconetplus,
author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
title = {Enhanced Generative Structure Prior for Chinese Text Image Super-Resolution},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
year = {2025}
}
@inproceedings{li2023marconet,
author = {Li, Xiaoming and Zuo, Wangmeng and Loy, Chen Change},
title = {Learning Generative Structure Prior for Blind Text Image Super-resolution},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2023}
}