This repository supports article-level visual text rendering of business content (infographics and slides) based on ultra-dense layouts.
- Long context length: Supports ultra-dense layouts with 50+ layers and article-level descriptive prompts of more than 1,000 tokens, and can generate high-quality business content at resolutions up to 2240×896.
- Powerful visual text rendering: Supports article-level visual text rendering in ten different languages and maintains high spelling accuracy.
- Diverse and flexible image generation: Supports layer-wise detail refinement through layout-conditional classifier-free guidance (CFG).
- Release inference code and pretrained model
- Release training code
conda create -n bizgen python=3.10 -y
conda activate bizgen
git clone
cd bizgen
pip install -r requirements.txt
huggingface-cli login
For a quick first try, run inference.py:
python inference.py
Create the directory bizgen/checkpoints
and download the following checkpoints into it.
Name | Description |
---|---|
byt5 | ByT5 model checkpoint |
lora_infographic | Unet LoRA weights and finetuned ByT5 mapper checkpoint for infographic |
lora_slides | Unet LoRA weights and finetuned ByT5 mapper checkpoint for slides |
spo | Post-trained SDXL checkpoint (for aesthetic improvement) |
The downloaded checkpoints should be organized as follows:
checkpoints/
├── byt5/
│   ├── base.pt
│   └── byt5_model.pt
├── lora/
│   ├── infographic/
│   │   ├── byt5_mapper.pt
│   │   └── unet_lora.pt
│   └── slides/
│       ├── byt5_mapper.pt
│       └── unet_lora.pt
└── spo
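Before running inference, it can help to confirm that the downloads match the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_checkpoints` is hypothetical, and because the contents of `spo` are not listed above, only the presence of that directory is checked.

```python
from pathlib import Path

# Expected files, relative to the checkpoints directory, copied from the layout above.
EXPECTED_FILES = [
    "byt5/base.pt",
    "byt5/byt5_model.pt",
    "lora/infographic/byt5_mapper.pt",
    "lora/infographic/unet_lora.pt",
    "lora/slides/byt5_mapper.pt",
    "lora/slides/unet_lora.pt",
]
# The contents of spo are not listed above, so only the directory itself is checked.
EXPECTED_DIRS = ["spo"]


def missing_checkpoints(root):
    """Return the expected paths that are absent under `root` (hypothetical helper)."""
    root = Path(root)
    missing = [p for p in EXPECTED_FILES if not (root / p).is_file()]
    missing += [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    return missing


if __name__ == "__main__":
    # Assumes the script is run from the bizgen directory.
    for path in missing_checkpoints("checkpoints"):
        print(f"missing: checkpoints/{path}")
```

If the script prints nothing, every file named in the layout is in place.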
For infographics:
python inference.py \
--ckpt_dir checkpoints/lora/infographic \
--output_dir infographic \
--sample_list meta/infographics.json
For slides:
python inference.py \
--ckpt_dir checkpoints/lora/slides \
--output_dir slide \
--sample_list meta/slides.json
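The two commands above differ only in their three arguments, so they can be generated from a small table of presets. This is a sketch under that assumption: `PRESETS` and `build_command` are illustrative names, not part of the repository, and the only flags used are the three shown above.

```python
import shlex

# Argument values copied from the infographic and slides commands above.
PRESETS = {
    "infographic": {
        "ckpt_dir": "checkpoints/lora/infographic",
        "output_dir": "infographic",
        "sample_list": "meta/infographics.json",
    },
    "slides": {
        "ckpt_dir": "checkpoints/lora/slides",
        "output_dir": "slide",
        "sample_list": "meta/slides.json",
    },
}


def build_command(kind):
    """Return the inference.py argument list for one preset (hypothetical helper)."""
    preset = PRESETS[kind]
    return [
        "python", "inference.py",
        "--ckpt_dir", preset["ckpt_dir"],
        "--output_dir", preset["output_dir"],
        "--sample_list", preset["sample_list"],
    ]


if __name__ == "__main__":
    # Print the commands rather than running them, so nothing heavy starts by accident.
    for kind in PRESETS:
        print(shlex.join(build_command(kind)))
```

To actually launch a run, the returned list can be passed to `subprocess.run(build_command("slides"), check=True)` from the bizgen directory.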
If you find this code useful in your research, please consider citing:
@misc{peng2025bizgenadvancingarticlelevelvisual,
title={BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation},
author={Yuyang Peng and Shishi Xiao and Keming Wu and Qisheng Liao and Bohan Chen and Kevin Lin and Danqing Huang and Ji Li and Yuhui Yuan},
year={2025},
eprint={2503.20672},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.20672},
}
@article{liu2024glyphv2,
title={Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering},
author={Liu, Zeyu and Liang, Weicong and Zhao, Yiming and Chen, Bohan and Li, Ji and Yuan, Yuhui},
journal={arXiv preprint arXiv:2406.10208},
year={2024}
}
@article{liu2024glyph,
title={Glyph-byt5: A customized text encoder for accurate visual text rendering},
author={Liu, Zeyu and Liang, Weicong and Liang, Zhanhao and Luo, Chong and Li, Ji and Huang, Gao and Yuan, Yuhui},
journal={arXiv preprint arXiv:2403.09622},
year={2024}
}