- [2025/12/31] 👋 We have released the training code of OmniSVG.
- [2025/12/22] We have updated MMSVG-Icon (264K→904K) and MMSVG-Illustration (66K→255K) datasets with enhanced captions and PNG previews! Check out MMSVG-Icon and MMSVG-Illustration.
- [2025/12/02] We have released the OmniSVG1.1_8B weights and updated OmniSVG1.1_4B model weights! Check out OmniSVG1.1_8B and OmniSVG1.1_4B.
- [2025/12/02] We have released MMSVGBench benchmark dataset and evaluation code! Check out MMSVGBench and Evaluation.
- [2025/09/18] OmniSVG is accepted to NeurIPS 2025🔥! See you in San Diego!
- [2025/07/22] 👋 We have released the Huggingface Demo. 🤗Demo.
- [2025/07/22] 👋 We have released the inference code and model weights trained on the MMSVG-Icon and MMSVG-Illustration datasets. 🤗Weight.
- [2025/04/09] 👋 Released the MMSVG-Icon and MMSVG-Illustration 🤗Dataset.
- [2025/04/09] 👋 Uploaded the paper and initialized the project. Read
If you are developing with or using OmniSVG in your projects, or would like to contribute to OmniSVG, please let us know 🎉.
- If you find data issues while using the MMSVG dataset, please file an issue via this form.
- 👋 OmniSVG ComfyUI Plugin by @smthemex ComfyUI_OmniSVG.
- Project Page & Technical Report
- MMSVG-Icon and MMSVG-Illustration Dataset Release
- Inference Code & Model Weight of MMSVG-Icon and MMSVG-Illustration Dataset
- Online Demo (Gradio deployed on Huggingface)
- Model Weight of OmniSVG1.1_8B Release
- Model Weight of OmniSVG1.1_4B Release
- MMSVGBench Benchmark & Evaluation Code Release
- Training Code Release
OmniSVG is the first family of end-to-end multimodal SVG generators that leverages pre-trained Vision-Language Models (VLMs), capable of generating complex and detailed SVGs ranging from simple icons to intricate anime characters. We also introduce MMSVG-2M, a multimodal dataset with two million richly annotated SVG assets, along with a standardized evaluation protocol for conditional SVG generation tasks.
OmniSVG supports two model sizes with different base models:
| Model | Base Model | Base Vocab Size | Extended Vocab Size | Download | Size | Update |
|---|---|---|---|---|---|---|
| OmniSVG1.1_8B | Qwen2.5-VL-7B-Instruct | 152064 | 197000 | HuggingFace | 17.2 GB | 2025-12-02 |
| OmniSVG1.1_4B | Qwen2.5-VL-3B-Instruct | 151936 | 197000 | HuggingFace | 7.69 GB | 2025-12-02 |
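Both models share the same extended vocabulary size, so the number of tokens added on top of each Qwen2.5-VL base vocabulary (presumably the SVG-specific tokens) follows directly from the table:

```python
# Tokens added on top of each base vocabulary, from the table above
base = {"OmniSVG1.1_8B": 152064, "OmniSVG1.1_4B": 151936}
extended = 197000  # extended vocab size shared by both models

added = {name: extended - size for name, size in base.items()}
print(added)  # {'OmniSVG1.1_8B': 44936, 'OmniSVG1.1_4B': 45064}
```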
```bash
git clone https://github.com/OpenVGLab/OmniSVG-train.git
cd OmniSVG-train
```

```bash
conda create -n omnisvg python=3.10
conda activate omnisvg
```

macOS:

```bash
brew install cairo
```

Linux (Ubuntu/Debian):

```bash
sudo apt update
sudo apt install libcairo2 libcairo2-dev
```

Install PyTorch with CUDA 12.1 support:

```bash
pip install torch==2.3.0+cu121 torchvision==0.18.0+cu121 --index-url https://download.pytorch.org/whl/cu121
```

Install the remaining dependencies:

```bash
pip install -r requirements.txt
```

Install picosvg for SVG preprocessing:

```bash
pip install picosvg
```

For faster training and inference, install Flash Attention 2:

```bash
pip install flash-attn --no-build-isolation
```

| Model | GPU Memory | Time per 256/512/1024/2048/4096 tokens |
|---|---|---|
| OmniSVG1.1_8B | 26 GB | 5.38 / 9.02 / 20.11 / 40.34 / 98.11 seconds |
| OmniSVG1.1_4B | 17 GB | 4.08 / 8.68 / 18.07 / 37.51 / 82.70 seconds |
Note: The inference times above are measured in OmniSVG SVG tokens, while the inference times reported in our paper are measured in XML code tokens for fair comparison with baseline methods.
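For capacity planning, the table converts to rough generation throughput; a quick check using the 4096-token column:

```python
# Seconds per 4096 OmniSVG tokens, from the inference-speed table above
times = {"OmniSVG1.1_8B": 98.11, "OmniSVG1.1_4B": 82.70}

for name, t in times.items():
    print(f"{name}: {4096 / t:.1f} tokens/s")
# OmniSVG1.1_8B: 41.7 tokens/s
# OmniSVG1.1_4B: 49.5 tokens/s
```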
```bash
pip install huggingface-hub

# Download OmniSVG1.1_8B
huggingface-cli download OmniSVG/OmniSVG1.1_8B --local-dir /PATH/TO/OmniSVG1.1_8B

# Download OmniSVG1.1_4B
huggingface-cli download OmniSVG/OmniSVG1.1_4B --local-dir /PATH/TO/OmniSVG1.1_4B
```

```bash
# Using the 8B model (default)
python inference.py --task text-to-svg --input prompts.txt --output ./output_text --save-all-candidates

# Using the 4B model
python inference.py --task text-to-svg --input prompts.txt --output ./output_text --model-size 4B --save-all-candidates

# Custom generation parameters
python inference.py --task text-to-svg --input prompts.txt --output ./output_text \
    --temperature 0.5 --top-p 0.9 --top-k 50 --repetition-penalty 1.05
```

```bash
python inference.py --task image-to-svg --input ./examples --output ./output_image --save-all-candidates
```

```bash
# Local deployment
python app.py
```

Or try our Online Demo on Hugging Face Spaces.
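The text-to-SVG commands above take `--input prompts.txt`; assuming one text prompt per line (an assumption on our part — check the repo's example files), such a file can be generated like so:

```python
# ASSUMPTION: inference.py reads one text-to-SVG prompt per line of the input file
prompts = [
    "A red apple with a stem and leaf",
    "A simple five-pointed blue star",
]
with open("prompts.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(prompts) + "\n")
```

Then pass it via `--input prompts.txt` as in the commands above.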
```
data/
├── train_meta.csv    # Training metadata
├── val_meta.csv      # Validation metadata
├── svg/              # SVG files
│   ├── 000001.svg
│   ├── 000002.svg
│   └── ...
└── png/              # Rendered PNG images
    ├── 000001.png
    ├── 000002.png
    └── ...
```
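To catch missing files before a long training run, the layout can be cross-checked against the metadata; a minimal stdlib sketch (a hypothetical helper, not part of the repo) assuming the layout and the `id` column described here:

```python
import csv
from pathlib import Path

def check_layout(data_dir: str) -> list[str]:
    """Return ids from train_meta.csv whose svg/ or png/ file is missing."""
    root = Path(data_dir)
    missing = []
    with open(root / "train_meta.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            sid = row["id"]
            if not (root / "svg" / f"{sid}.svg").is_file() or \
               not (root / "png" / f"{sid}.png").is_file():
                missing.append(sid)
    return missing
```

`check_layout("./data")` returning an empty list means every metadata row has both assets.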
The metadata CSV format:

```csv
id,desc_en,detail,keywords,len_pix
000001,"A red apple","A detailed description of a red apple with stem and leaf","apple,fruit,red",256
000002,"Blue star","A simple five-pointed blue star","star,blue,shape",128
```

```bash
# Download illustration dataset
huggingface-cli download OmniSVG/MMSVG-Illustration --repo-type dataset --local-dir ./data/illustration

# Download icon dataset
huggingface-cli download OmniSVG/MMSVG-Icon --repo-type dataset --local-dir ./data/icon
```

Or use the built-in data downloader:

```bash
python -m utils.data_downloader --output_dir ./data --datasets illustration icon
```

Before training, SVG files need to be preprocessed to ensure compatibility with the model. We provide a preprocessing script that:
- Simplifies SVG syntax using `picosvg` (removes unnecessary groups, transforms, `rect` elements, etc.)
- Normalizes dimensions to 200×200 pixels
- Optionally simplifies paths for more efficient tokenization
```bash
# Basic preprocessing
python preprocess_svg.py --input file.svg --output processed.svg

# With custom dimensions
python preprocess_svg.py --input file.svg --output processed.svg --width 200 --height 200

# Process all SVGs in a directory
python preprocess_svg.py --input_dir ./raw_svgs --output_dir ./processed_svgs
```
| Option | Default | Description |
|---|---|---|
| `--input` / `-i` | - | Single SVG file to process |
| `--output` / `-o` | - | Output path for single file (auto-generated if not specified) |
| `--input_dir` | - | Directory containing SVG files for batch processing |
| `--output_dir` | - | Output directory for batch processing |
| `--scale` | 1.0 | SVG zoom scale factor |
| `--width` | 200 | Output SVG width in pixels |
| `--height` | 200 | Output SVG height in pixels |
| `--simplify` | False | Enable path simplification (arcs, heuristics, splitting) |
| `--max_dist` | 5 | Maximum path length before splitting (used with `--simplify`) |
1. `picosvg` preprocessing: converts complex SVG features to simple paths
   - Removes `<g>` groups and flattens the structure
   - Converts `<rect>`, `<circle>`, `<ellipse>` to `<path>` elements
   - Removes transforms by baking them into coordinates
   - Strips unsupported attributes and elements
2. Normalization: scales and centers the SVG to fit within the target dimensions (200×200 by default)
3. Path simplification (optional):
   - Simplifies arc commands
   - Applies heuristic simplification
   - Splits long paths for better tokenization
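The normalization step boils down to a uniform scale plus a centering translation; a standalone sketch of the math (our illustration, not the script's actual code), mapping an arbitrary viewBox into the 200×200 target:

```python
def fit_viewbox(x, y, w, h, target=200.0):
    """Uniform scale and centering offsets mapping viewBox (x, y, w, h) into a target square."""
    s = target / max(w, h)                 # uniform scale, preserving aspect ratio
    tx = (target - w * s) / 2.0 - x * s    # horizontal centering offset
    ty = (target - h * s) / 2.0 - y * s    # vertical centering offset
    return s, tx, ty

# A 100x50 viewBox at the origin scales by 2 and is centered vertically
s, tx, ty = fit_viewbox(0, 0, 100, 50)
print(s, tx, ty)  # 2.0 0.0 50.0
```

Every path coordinate `(px, py)` then maps to `(px * s + tx, py * s + ty)`, which is how transforms get "baked in".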
The training system uses YAML configuration files located in the `configs/` directory:

- `configs/tokenization.yaml` - Model-specific tokenization settings
- `configs/train_config.yaml` - Training hyperparameters

```yaml
# configs/train_config.yaml
model:
  size: "4B"            # Model size: "4B" or "8B"
  use_flash_attn: true  # Enable Flash Attention 2

data:
  data_dir: "./data"    # Data directory path
  max_seq_length: 2048  # Maximum SVG sequence length; decrease if CUDA runs out of memory

training:
  learning_rate: 1.0e-5
  epochs: 100
  gradient_accumulation_steps: 4
```

Edit `run.sh` to configure your settings:
```bash
# Configuration in run.sh
MODEL_SIZE="4B"        # "4B" or "8B"
USE_FLASH_ATTN="true"  # Enable Flash Attention
NUM_GPUS=8             # Number of GPUs
BATCH_SIZE=4           # Batch size per GPU
DATA_DIR="./data"      # Data directory

# Run training
bash run.sh
```

```bash
# Train 4B model
accelerate launch --num_processes 8 --mixed_precision bf16 \
    train.py \
    --model_size 4B \
    --use_flash_attn \
    --data_dir ./data \
    --output_dir ./output \
    --batch_size 4

# Train 8B model
accelerate launch --num_processes 8 --mixed_precision bf16 \
    train.py \
    --model_size 8B \
    --use_flash_attn \
    --data_dir ./data \
    --output_dir ./output \
    --batch_size 2
```

```bash
# Train directly on the HuggingFace datasets
accelerate launch train.py \
    --model_size 4B \
    --use_flash_attn \
    --use_hf_data \
    --datasets illustration icon \
    --data_dir ./data
```

```bash
# Resume from official OmniSVG checkpoint (auto-download)
accelerate launch train.py \
    --model_size 4B \
    --resume_from_checkpoint auto \
    --data_dir ./data

# Resume from local checkpoint
accelerate launch train.py \
    --model_size 4B \
    --resume_from_checkpoint /path/to/checkpoint \
    --data_dir ./data
```

```bash
# Single GPU training (for debugging)
python train.py --model_size 4B --use_flash_attn --data_dir ./data --batch_size 1

# Multi-GPU training with DeepSpeed
accelerate launch --config_file ./configs/zero_stage2.yaml \
    train.py --model_size 8B --use_flash_attn --data_dir ./data

# List available models and datasets
python train.py --list_models
python train.py --list_datasets
```

Checkpoints and logs are saved to the output directory:
```
output/omnisvg_4b_YYYYMMDD_HHMMSS/
├── config.yaml    # Saved configuration
├── args.json      # Command line arguments
├── logs/          # TensorBoard logs
├── step_3000/     # Checkpoint at step 3000
├── step_6000/     # Checkpoint at step 6000
└── best_model/    # Best validation checkpoint
```
Monitor training with TensorBoard:

```bash
tensorboard --logdir ./output/omnisvg_4b/logs
```

We provide MMSVGBench for standardized evaluation of SVG generation models.

Download MMSVGBench:

```bash
huggingface-cli download OmniSVG/MMSVGBench --repo-type dataset --local-dir /PATH/TO/MMSVGBench
```

MMSVGBench is a purely synthetic benchmark where all prompts and images are generated using GPT models, ensuring the data is unseen during model training for fair generalization evaluation.
| Task | Complexity Level | Samples | Description |
|---|---|---|---|
| Text-to-SVG | Icon | 150 | Simple icons (1-2 elements) |
| Text-to-SVG | Illustration | 150 | Complex illustrations (1-3 interacting elements) |
| Image-to-SVG | Icon | 150 | GPT-4o generated icon images |
| Image-to-SVG | Illustration | 150 | GPT-4o generated illustration images |
The evaluation code is available in the metrics directory. For more details, see MMSVGBench.
```
OmniSVG/
├── configs/
│   ├── tokenization.yaml   # Tokenization config for 4B/8B models
│   └── train_config.yaml   # Training hyperparameters
├── utils/
│   ├── __init__.py
│   ├── config.py           # Configuration management
│   ├── dataset.py          # Dataset and data loading
│   └── data_downloader.py  # HuggingFace data downloading
├── model/
│   └── decoder.py          # Model architecture
├── metrics/                # Evaluation metrics
├── train.py                # Training script
├── inference.py            # Inference script
├── preprocess_svg.py       # SVG data preprocessing script
├── app.py                  # Gradio demo
├── run.sh                  # Training launch script
└── requirements.txt
```
OmniSVG is licensed under the Apache License 2.0, while the MMSVG dataset is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0) license.
```bibtex
@article{yang2025omnisvg,
  title={OmniSVG: A Unified Scalable Vector Graphics Generation Model},
  author={Yiying Yang and Wei Cheng and Sijin Chen and Xianfang Zeng and Jiaxu Zhang and Liao Wang and Gang Yu and Xinjun Ma and Yu-Gang Jiang},
  journal={arXiv preprint arXiv:2504.06263},
  year={2025}
}
```

We thank the following excellent open-source works:
- IconShop: The first advanced work that leverages LLMs to generate monochrome, icon-level SVGs.
- LLM4SVG: Treats SVG coordinates as number strings for higher spatial accuracy.
- StarVector: Equips LLM with an image encoder for Image-to-SVG generation.