forked from 1230young/bizgen

[CVPR 2025] This is the official inference code for the paper "BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation". Project page: https://bizgen-msra.github.io/

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation (Glyph-ByT5-v3)

This repository supports article-level visual text rendering of business content (infographics and slides) based on ultra-dense layouts.

🌟 Features

  • Long context length: Supports ultra-dense layouts with 50+ layers and article-level descriptive prompts of more than 1000 tokens, and generates high-quality business content at resolutions up to 2240×896.
  • Powerful visual text rendering: Supports article-level visual text rendering in ten different languages and maintains high spelling accuracy.
  • Image generation diversity and flexibility: Supports layer-wise detail refinement through layout conditional CFG.

🚧 TODO List

  • Release inference code and pretrained model
  • Release training code

Environment Setup

1. Create Conda Environment

conda create -n bizgen python=3.10 -y
conda activate bizgen

2. Install Dependencies

git clone
cd bizgen
pip install -r requirements.txt

3. Login to Hugging Face

huggingface-cli login

Quick Start

For a quick try, run inference.py:

python inference.py

Testing BizGen

1. Download Checkpoints

Create the directory bizgen/checkpoints and download the following checkpoints into it.

Name              Description
byt5              ByT5 model checkpoint
lora_infographic  UNet LoRA weights and fine-tuned ByT5 mapper checkpoint for infographics
lora_slides       UNet LoRA weights and fine-tuned ByT5 mapper checkpoint for slides
spo               Post-trained SDXL checkpoint (for aesthetic improvement)

The downloaded checkpoints should be organized as follows:

checkpoints/
├── byt5/
│   ├── base.pt
│   └── byt5_model.pt
├── lora/
│   ├── infographic/
│   │   ├── byt5_mapper.pt
│   │   └── unet_lora.pt
│   └── slides/
│       ├── byt5_mapper.pt
│       └── unet_lora.pt
└── spo
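
Before running inference, it can help to confirm that the downloaded files match the layout above. The following is a minimal sketch and not part of the repository: the helper name `missing_checkpoints` is hypothetical, and because the contents of `spo` are not listed above, only its presence is checked.

```python
from pathlib import Path

# Relative file paths copied from the checkpoint layout listed above.
EXPECTED_FILES = [
    "byt5/base.pt",
    "byt5/byt5_model.pt",
    "lora/infographic/byt5_mapper.pt",
    "lora/infographic/unet_lora.pt",
    "lora/slides/byt5_mapper.pt",
    "lora/slides/unet_lora.pt",
]
# The layout does not show what `spo` contains, so only its existence is checked.
EXPECTED_OTHER = ["spo"]


def missing_checkpoints(root):
    """Return the expected entries that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [p for p in EXPECTED_OTHER if not (root / p).exists()]
    return missing


if __name__ == "__main__":
    missing = missing_checkpoints("checkpoints")
    if missing:
        print("Missing checkpoint entries:")
        for entry in missing:
            print("  " + entry)
    else:
        print("Checkpoint layout looks complete.")
```

Run it from the bizgen directory so that the relative path `checkpoints` resolves correctly.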

2. Run the Testing Script

For infographics:

python inference.py \
--ckpt_dir checkpoints/lora/infographic \
--output_dir infographic \
--sample_list meta/infographics.json 

For slides:

python inference.py \
--ckpt_dir checkpoints/lora/slides \
--output_dir slide \
--sample_list meta/slides.json 
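
The two commands above differ only in their argument values, so they can be generated from a small table when scripting batch runs. This is a sketch under stated assumptions: `MODES` and `build_command` are hypothetical names, and the only flags used are the three shown above (`--ckpt_dir`, `--output_dir`, `--sample_list`).

```python
import shlex
import sys

# Argument values copied from the two commands above.
MODES = {
    "infographic": {
        "ckpt_dir": "checkpoints/lora/infographic",
        "output_dir": "infographic",
        "sample_list": "meta/infographics.json",
    },
    "slides": {
        "ckpt_dir": "checkpoints/lora/slides",
        "output_dir": "slide",
        "sample_list": "meta/slides.json",
    },
}


def build_command(mode):
    """Return the argument list for one inference run (hypothetical helper)."""
    cfg = MODES[mode]
    return [
        sys.executable, "inference.py",
        "--ckpt_dir", cfg["ckpt_dir"],
        "--output_dir", cfg["output_dir"],
        "--sample_list", cfg["sample_list"],
    ]


if __name__ == "__main__":
    # Only print the commands here; nothing is executed.
    for mode in MODES:
        print(shlex.join(build_command(mode)))
```

To actually launch a run, pass the returned list to `subprocess.run(..., check=True)` from the bizgen directory.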

📬 Citation

If you find this code useful in your research, please consider citing:

@misc{peng2025bizgenadvancingarticlelevelvisual,
  title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation}, 
  author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
  year={2025},
  eprint={2503.20672},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.20672}, 
}
@article{liu2024glyphv2,
  title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
  author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2406.10208},
  year={2024}
}
@article{liu2024glyph,
  title={Glyph-byt5: A customized text encoder for accurate visual text rendering},
  author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
  journal={arXiv preprint arXiv:2403.09622},
  year={2024}
}
