UniSVG has been officially accepted to the ACM Multimedia 2025 Dataset Track 🎉
🌐 Project Page | 🏆 Conference Website
UniSVG is a comprehensive dataset designed for unified SVG generation (from textual prompts and images) and SVG understanding (color, category, usage, etc.). It comprises 525k data items tailored for training and evaluating Multi-modal Large Language Models (MLLMs). You can access the dataset on Hugging Face.
- 🔥 We are glad to announce that our UniSVG benchmark is used by Qwen3-VL!
- 🔥 Qwen2.5-VL-finetuned released! 🌐 Jaireyu/Qwen2.5-VL-UniSVG-finetuned
- 🔥 UniSVG got accepted by the 🏆 ACM MM 2025 Dataset Track! 🎉 🌐 Project Page
- 🔥 UniSVG dataset images updated! 📂 Dataset 🌐 Project Page
- 🔥 UniSVG dataset opensourced! 📂 Dataset 🌐 Project Page
For more information, please visit the project homepage.
Unlike bitmap images, scalable vector graphics (SVG) maintain quality when scaled and are frequently employed in computer vision and artistic design, represented as SVG code. In this era of proliferating AI-powered systems, enabling AI to understand and generate SVG has become increasingly urgent. However, AI-driven SVG understanding and generation (U&G) remain significant challenges. SVG code, which is equivalent to a set of curves and lines controlled by floating-point parameters, demands high precision in SVG U&G. Besides, SVG generation operates under diverse conditional constraints, including textual prompts and visual references, which requires powerful multi-modal processing for condition-to-SVG transformation. Recently, the rapid growth of Multi-modal Large Language Models (MLLMs) has demonstrated the capability to process multi-modal inputs and generate complex vector control parameters, suggesting the potential to address SVG U&G tasks within a unified model. To unlock MLLMs' capabilities in the SVG area, we propose an SVG-centric dataset called UniSVG, comprising 525k data items, tailored for MLLM training and evaluation. To the best of our knowledge, it is the first comprehensive dataset designed for unified SVG generation (from textual prompts and images) and SVG understanding (color, category, usage, etc.).
To install the dataset, you can use the datasets library from Hugging Face:
pip install datasets
Here is an example of how to load and use the dataset:
from datasets import load_dataset
# Load the dataset
UniSVG_dataset = load_dataset("lili24/UniSVG")
# Print the first example
print(UniSVG_dataset["train"][0])  # index into a split ("train" assumed); load_dataset returns a DatasetDict

For the prompts used in data construction, please refer to prompts/data_construction_example.py for detailed information.
For the prompts used in inference, please refer to prompts/Inference_prompts_examples.py for detailed information.
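Before moving on to finetuning, it can help to sanity-check the download. This is a minimal sketch, assuming the dataset exposes a "train" split:

# Inspect the available splits and the column schema
print(UniSVG_dataset)                    # splits and their sizes
print(UniSVG_dataset["train"].features)  # column names and types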
After downloading our UniSVG dataset, you can finetune your preferred models on UniSVG or a subset of it. We have tried finetuning the following MLLMs; please feel free to use them: LLaVA 1.5, LLaVA-LLaMA, LLaVA-Next, GLM 4V, LLaMA 3.2, Qwen 2.5 VL.
Then convert your downloaded UniSVG dataset into the LLaMA-Factory format by modifying and running the following two Python files:
# Make sure you modify these files before using them!
python utils/transfer_to_llava.py
python utils/transfer_to_llama_factory.py

As an example, we utilized the LLaMA-Factory framework to do the finetuning. We saved one LLaMA-Factory repo here for your easy use.
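For orientation, the converted training file stores ShareGPT-style records with "messages" and "images" fields, matching the dataset_info.json entry shown below. Here is a minimal sketch of building one such record; the "question", "answer", and "image_path" field names are placeholders, not actual UniSVG column names, and the two scripts above perform the real mapping:

import json

# Illustrative only: assemble one ShareGPT-style record for LLaMA-Factory.
def to_sharegpt(item):
    return {
        "messages": [
            # "<image>" is LLaMA-Factory's placeholder marking where the attached image goes
            {"role": "user", "content": "<image>" + item["question"]},
            {"role": "assistant", "content": item["answer"]},
        ],
        "images": [item["image_path"]],
    }

example = {
    "question": "Generate SVG code that reproduces this image.",
    "answer": "<svg>...</svg>",
    "image_path": "images/example.png",
}
with open("llama_UniSVG_train.json", "w") as f:
    json.dump([to_sharegpt(example)], f, ensure_ascii=False, indent=2)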
Then add the converted LLaMA-Factory UniSVG JSON into "train/qwen25_llama32/LLaMA-Factory/data", and modify "train/qwen25_llama32/LLaMA-Factory/data/dataset_info.json" by adding (our provided repo already includes this):
"unisvg": {
"file_name": "llama_UniSVG_train.json",
"formatting": "sharegpt",
"columns": {
"messages": "messages",
"images": "images"
},
"tags": {
"role_tag": "role",
"content_tag": "content",
"user_tag": "user",
"assistant_tag": "assistant"
}
}

Congrats! Your UniSVG dataset is finally ready for finetuning! We offer an example finetuning bash file using DeepSpeed under LLaMA-Factory; please refer to /train/qwen25_llama32/train.sh
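As a side note, if you prefer to script the dataset_info.json registration above instead of editing the file by hand, a small helper using the same path and entry might look like this:

import json

# Register the UniSVG entry in LLaMA-Factory's dataset_info.json
# (equivalent to the manual edit described above).
info_path = "train/qwen25_llama32/LLaMA-Factory/data/dataset_info.json"
with open(info_path) as f:
    dataset_info = json.load(f)

dataset_info["unisvg"] = {
    "file_name": "llama_UniSVG_train.json",
    "formatting": "sharegpt",
    "columns": {"messages": "messages", "images": "images"},
    "tags": {
        "role_tag": "role",
        "content_tag": "content",
        "user_tag": "user",
        "assistant_tag": "assistant",
    },
}

with open(info_path, "w") as f:
    json.dump(dataset_info, f, indent=2)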
After finetuning, you can edit the inference code (infer.py) for your model.
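For reference, if you are using the released Qwen2.5-VL-UniSVG-finetuned checkpoint, the core generation step might look like the sketch below. This is a minimal sketch built on the standard transformers API for Qwen2.5-VL; the prompt wording and image path are illustrative, not the exact UniSVG inference prompts (see prompts/Inference_prompts_examples.py for those):

import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Jaireyu/Qwen2.5-VL-UniSVG-finetuned"
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# Build a single image-to-SVG query (prompt text is illustrative)
image = Image.open("example.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Generate SVG code that reproduces this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=2048)
# Strip the prompt tokens and decode only the newly generated answer
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)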
Then run the inference:

python infer.py

You will get an inference JSON file with the model's answers in it. Then please modify and use evaluation.py to get the final score:
python evaluation.py

This repo benefits from LLaMA-Factory and LLaVA, thanks for their great work!
If you use this dataset in your research, please cite the following paper:
@inproceedings{li2025unisvg,
title={UniSVG: A Unified Dataset for Vector Graphic Understanding and Generation with Multimodal Large Language Models},
author={Li, Jinke and Yu, Jiarui and Wei, Chenxing and Dong, Hande and Lin, Qiang and Yang, Liangjing and Wang, Zhicai and Hao, Yanbin},
booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
year={2025}
}