GitHub - darknoon/prefix-rl

WIP: Train vision-language models with reinforcement learning. Using modal for compute, may make it platform-agnostic in the future

References:

https://github.com/willccbb/verifiers
https://huggingface.co/papers/2505.20793
- LLaMA-Factory for SFT
- EasyR1 for RLRF

Implementation of SVG RLRF paper (Im2SVG):

Further ideas Im2SVG:

Prompt model to start with the gt width/height/viewbox ie "generate an 100x100 SVG that …"
Test out more featureful renderers
- Skia?
- headless chromium
Test out data generation of easy examples instead of fine-tuning for bootstrapping
Datasets
- https://huggingface.co/OmniSVG

Usage:

SFT

modal run --detach run_sft_modal.py

RL fine-tuning

modal run --detach run_easyr1_modal.py::train_model_easyr1 --config svg

Evaluation

Run evaluations against different models:

OpenAI Models

python svg_eval.py --client openai --model_name gpt-4o-mini --dataset simple-shapes -n 10
python svg_eval.py --client openai --model_name gpt-4o --dataset svg-stack -n 100 --temperature 0.1
python svg_eval.py --client openai-responses --model_name o1-mini --dataset simple-shapes -n 10

vLLM Models (Qwen2.5-VL)

# Run the vLLM server
MODEL_NAME=Qwen/Qwen2.5-VL-3B-Instruct modal serve run_vllm_server_modal.py

The script will output the URL of the vLLM server.

# Run evaluation
python svg_eval.py --client vllm --vllm_endpoint https://prefix--prefix-rl-vllm-server-serve-dev.modal.run/v1/ --model_name "Qwen/Qwen2.5-VL-7B-Instruct" --num_eval_examples 100 --debug_dump --num_workers 16

Google Gemini

python svg_eval.py --client google --model_name gemini-2.0-flash --dataset simple-shapes -n 10
python svg_eval.py --client google --model_name gemini-2.5-flash --dataset svg-stack -n 100 --temperature 0.2

Anthropic Claude

python svg_eval.py --client anthropic --model_name claude-3-5-sonnet-20241022 --dataset simple-shapes -n 10

Debug with VS Code

Use the debug configurations in .vscode/launch.json for step-through debugging.

Key Parameters

--dataset: simple-shapes (small) or svg-stack (large)
--temperature: 0.1 (default) for consistent output, higher for creativity
--num_workers: Parallel processing (default: 1)
--debug_dump: Generate detailed HTML reports with image comparisons

Name		Name	Last commit message	Last commit date
Latest commit History 72 Commits
.claude		.claude
.vscode		.vscode
dump		dump
env/svg		env/svg
human/simple-shapes		human/simple-shapes
nb		nb
prefixrl		prefixrl
reports		reports
tests/reward/image		tests/reward/image
.gitignore		.gitignore
.python-version		.python-version
CLAUDE.md		CLAUDE.md
README.md		README.md
deepspeed_zero3.yaml		deepspeed_zero3.yaml
image.png		image.png
image_cairo.png		image_cairo.png
image_ref.png		image_ref.png
pyproject.toml		pyproject.toml
run_rlft_easyr1_modal.py		run_rlft_easyr1_modal.py
run_rlft_trl_modal_old.py		run_rlft_trl_modal_old.py
run_rlft_verl_modal.py		run_rlft_verl_modal.py
run_sft_llama_factory_modal.py		run_sft_llama_factory_modal.py
run_sft_trl_modal.py		run_sft_trl_modal.py
run_vllm_server_modal.py		run_vllm_server_modal.py
sft_trl.py		sft_trl.py
svg_eval.py		svg_eval.py
svg_renderer_browser.py		svg_renderer_browser.py
svg_renderer_cairo.py		svg_renderer_cairo.py
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Usage:

SFT

RL fine-tuning

Evaluation

OpenAI Models

vLLM Models (Qwen2.5-VL)

Google Gemini

Anthropic Claude

Debug with VS Code

Key Parameters

About

Uh oh!

Releases

Packages

Uh oh!

Languages

darknoon/prefix-rl

Folders and files

Latest commit

History

Repository files navigation

Usage:

SFT

RL fine-tuning

Evaluation

OpenAI Models

vLLM Models (Qwen2.5-VL)

Google Gemini

Anthropic Claude

Debug with VS Code

Key Parameters

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages