This repo contains quantization, inference configs, generation scripts, and evaluation utilities for the WAN2.1 Text-to-Video (T2V) 1.3B model, built around the LightX2V runtime.
The focus is comparing baseline (FP16/BF16) vs quantized DiT weights (e.g., INT8 / FP8) and evaluating both quality (VBench + traditional metrics) and efficiency (kernel-level profiling summaries).
- `configs/`: JSON configs used by `lightx2v.infer`.
  - `wan_t2v_base.json`: baseline config (no DiT quantization).
  - `wan_t2v_fp8_vllm.json`: FP8-quantized DiT (vLLM backend).
  - `wan_t2v_sgl_fp8.json`: FP8-quantized DiT (SGL backend).
  - `wan_t2v_int8_vllm.json`: INT8-quantized DiT (vLLM backend).
  - `wan_t2v_int8_torchao.json`: INT8-quantized DiT (TorchAO backend).
- `convert/`: PowerShell scripts to convert/quantize model weights via LightX2V's converter.
  - `fp8.ps1`: convert DiT weights to FP8 (the example uses `torch.float8_e4m3fn`).
  - `int8.ps1`: convert DiT weights to INT8.
- `gen/`: Python scripts to generate videos (typically using VBench prompts) and optionally profile efficiency.
  - `*_vbench.py`: generate videos for the different variants (base / fp8 / int8).
  - `efficiency_results/`: exported profiling summaries (e.g. `*_cuda_gpu_kern_sum.csv`).
  - `VBench_full_info.json`: a copy of the VBench prompt metadata (the repo root also has `VBench_full_info.json`).
- `evaluation_metrics/`: quality-metrics utilities.
  - Notebooks: `CLIP_Score.ipynb`, `similiarity_comparison.ipynb`, `fvd.ipynb`.
  - `fvd.py`: Fréchet Video Distance implementation.
  - `traditional_metrics/`: "traditional" frame-level metrics (MSE/PSNR/SSIM) over paired videos.
- `save_results/`: VBench-related prompt lists (and, optionally, generated videos if you place them there).
  - `*_prompt/prompts.json`: the prompts used for each run.
- `scripts/`: VBench evaluation scripts that read videos + prompts and write JSON results.
  - `vbench_evaluation_results/`: saved VBench evaluation outputs (`*_full_info.json`, `*_eval_results.json`).
- LightX2V installed and importable as a Python module (the scripts call `python -m lightx2v.infer` and `python -m tools.convert.converter`).
- WAN2.1 T2V 1.3B weights and associated components (T5 encoder + VAE).
- (Optional) NVIDIA Nsight Systems (`nsys`) if you enable kernel profiling in `gen/*.py`.
- The VBench Python package for `scripts/test_vbench_*.py`.
- Two YAML environment files are provided: one for the TorchAO/SGLang kernels and one for the vLLM kernels.
- Note that these kernels require specific hardware; see https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html for details.
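As a rough preflight check, the hardware requirement can be tested from the GPU's CUDA compute capability (obtainable via `torch.cuda.get_device_capability()`). A minimal sketch — the thresholds below are assumptions (FP8 e4m3 kernels generally need Ada/Hopper-class GPUs); treat the LightX2V docs linked above as authoritative:

```python
def supports_quant_kernels(compute_capability, kind):
    """Rough check whether a GPU can run a given quantized-kernel type.

    compute_capability: (major, minor) tuple, e.g. from
        torch.cuda.get_device_capability().
    kind: "fp8" or "int8".
    The SM thresholds below are assumptions; see the LightX2V
    quantization docs for the authoritative hardware list.
    """
    major, minor = compute_capability
    sm = major * 10 + minor
    if kind == "fp8":
        return sm >= 89   # assumed: Ada (SM 8.9) / Hopper (SM 9.0) and newer
    if kind == "int8":
        return sm >= 75   # assumed: Turing (SM 7.5) and newer
    raise ValueError(f"unknown kind: {kind}")


# Example: an RTX 4090 reports compute capability (8, 9).
print(supports_quant_kernels((8, 9), "fp8"))   # True
print(supports_quant_kernels((8, 0), "fp8"))   # False: A100 has no native FP8
```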
The configs reference the following default layout (edit paths if yours differ):
- `./models/wan2.1_t2v/diffusion_pytorch_model.safetensors` (baseline DiT weights)
- `./models/wan2.1_t2v/models_t5_umt5-xxl-enc-bf16.pth` (T5 encoder)
- `./models/wan2.1_t2v/Wan2.1_VAE.pth` (VAE)
- `./models/wan2.1_t2v/<quantized>.safetensors` (converted DiT weights), e.g. `wan2.1_480p_int8_lightx2v_test.safetensors` or `wan2.1_480p_scaled_fp8_e4m3_test.safetensors`
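Because the configs hard-code this layout, a small preflight check can catch missing files before a long run. A sketch, assuming the default paths above (adjust `MODEL_DIR` to your setup):

```python
from pathlib import Path

# Default layout referenced by the configs; edit if your paths differ.
MODEL_DIR = Path("./models/wan2.1_t2v")
REQUIRED = [
    "diffusion_pytorch_model.safetensors",  # baseline DiT weights
    "models_t5_umt5-xxl-enc-bf16.pth",      # T5 encoder
    "Wan2.1_VAE.pth",                       # VAE
]


def missing_files(model_dir=MODEL_DIR, required=REQUIRED):
    """Return the subset of required model files not present on disk."""
    return [name for name in required if not (model_dir / name).exists()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All baseline model files found.")
```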
The scripts in `convert/` show example conversion commands (Windows PowerShell):

```powershell
pwsh -File .\convert\int8.ps1
pwsh -File .\convert\fp8.ps1
```

Before running, update at least:

- `--source`: your original `diffusion_pytorch_model.safetensors`
- `--output`: output folder for the converted weights
- `--output_name`: output filename prefix
All inference runs use `python -m lightx2v.infer` with:

- `--model_cls wan2.1`
- `--task t2v`
- `--model_path`: DiT weights (baseline or quantized)
- `--config_json`: one of the configs in `configs/`
Example (baseline):

```bash
python -m lightx2v.infer \
  --model_cls wan2.1 \
  --task t2v \
  --model_path ./models/wan2.1_t2v/diffusion_pytorch_model.safetensors \
  --config_json ./configs/wan_t2v_base.json \
  --prompt "A corgi surfing on a wave, cinematic lighting" \
  --negative_prompt "" \
  --save_result_path ./out/base.mp4
```

Example (INT8 vLLM):
```bash
python -m lightx2v.infer \
  --model_cls wan2.1 \
  --task t2v \
  --model_path ./models/wan2.1_t2v/wan2.1_480p_int8_lightx2v_test.safetensors \
  --config_json ./configs/wan_t2v_int8_vllm.json \
  --prompt "A corgi surfing on a wave, cinematic lighting" \
  --negative_prompt "" \
  --save_result_path ./out/int8_vllm.mp4
```

The `gen/*.py` scripts load prompts from `VBench_full_info.json`, optionally filter by dimension, and then run `lightx2v.infer` repeatedly to save a batch of `.mp4` videos under `save_results/`.
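The prompt-loading step can be sketched as below. It assumes each entry in `VBench_full_info.json` has a `prompt_en` string and a `dimension` list, which matches VBench's published metadata format; the exact filtering logic in `gen/*.py` may differ:

```python
import json


def load_prompts(path, dimension=None):
    """Load VBench prompts from a VBench_full_info.json-style file,
    optionally keeping only entries tagged with a given dimension.

    Assumes each entry has "prompt_en" and a "dimension" list.
    """
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if dimension is not None:
        entries = [e for e in entries if dimension in e.get("dimension", [])]
    return [e["prompt_en"] for e in entries]
```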
Example:

```bash
python gen/base_vbench.py
python gen/fp8_vllm_vbench.py
python gen/fp8_sgl_vbench.py
python gen/int8_vbench.py
python gen/int8_vbench_torchao.py
```

If you want kernel profiling, set `RECORD_KERNEL = True` in the script and make sure `nsys` is installed.
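Conceptually, enabling profiling just prefixes each inference command with `nsys profile`. A sketch of that wrapping — the `nsys` flag set here is an illustrative minimum, not necessarily what the scripts pass:

```python
import sys


def build_profiled_cmd(infer_args, report_name, profile=False):
    """Build an inference command, optionally wrapped in `nsys profile`.

    infer_args: extra argv for `python -m lightx2v.infer`.
    report_name: Nsight Systems report output name (-o).
    The nsys flags used here are an illustrative minimum.
    """
    cmd = [sys.executable, "-m", "lightx2v.infer", *infer_args]
    if profile:
        cmd = ["nsys", "profile", "-o", report_name,
               "--force-overwrite", "true", *cmd]
    return cmd
```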
The scripts in `scripts/` compute VBench scores for a set of generated videos and write the results to `vbench_evaluation_results/`.

```bash
python scripts/test_vbench_base.py
python scripts/test_vbench_vllm_fp8.py
python scripts/test_vbench_sgl_fp8.py
python scripts/test_vbench_vllm_int8.py
```

By default:

- `gen/*.py` writes videos to folders like `save_results/base_vbench/`, `save_results/fp8_vllm_vbench/`, etc.
- `scripts/test_vbench_*.py` expects videos under folders like `save_results/base_vbench_prompt/`, `save_results/vllm_fp8_prompt/`, `save_results/sgl_fp8_prompt/`, and `save_results/vllm_int8_prompt/`.
To make evaluation work, choose one:

- Option A (recommended): generate into the `*_prompt/` folders (edit the `save_dir` variable in `gen/*.py`).
- Option B: change `videos_dir` in `scripts/test_vbench_*.py` to match where you generated the `.mp4` files.
Also make sure `prompts.json` aligns with the sorted list of `*.mp4` files in that folder (the evaluation scripts assume `prompt_list[i]` corresponds to `video_files[i]`).
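That index-based assumption is easy to violate silently, so it is worth checking before a long evaluation run. A minimal sketch of such a check (a hypothetical helper, not part of the repo's scripts):

```python
import json
from pathlib import Path


def check_alignment(videos_dir, prompts_path):
    """Pair prompt_list[i] with video_files[i] (videos in sorted filename
    order, mirroring the evaluation scripts' assumption) after verifying
    the counts match."""
    video_files = sorted(p.name for p in Path(videos_dir).glob("*.mp4"))
    with open(prompts_path, encoding="utf-8") as f:
        prompts = json.load(f)
    if len(prompts) != len(video_files):
        raise ValueError(
            f"{len(prompts)} prompts vs {len(video_files)} videos")
    return list(zip(prompts, video_files))
```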
Use `evaluation_metrics/traditional_metrics/eval.py` to compare a baseline directory against a quantized directory. Filenames must end with `_<number>.mp4` in both directories so videos can be paired by ID.
```bash
python evaluation_metrics/traditional_metrics/eval.py \
  --base_dir ./save_results/base_vbench_prompt \
  --quan_dir ./save_results/vllm_int8_prompt \
  --out_csv ./metrics_int8_vs_base.csv \
  --stride 1 \
  --max_frames 0
```

- `evaluation_metrics/fvd.py`: Fréchet Video Distance implementation (used in `evaluation_metrics/fvd.ipynb`).
- `evaluation_metrics/CLIP_Score.ipynb`: CLIPScore-based evaluation.
- `evaluation_metrics/similiarity_comparison.ipynb`: additional similarity comparisons.
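The `_<number>.mp4` pairing rule can be sketched as follows (a hypothetical helper; `eval.py`'s actual parsing may differ in detail):

```python
import re
from pathlib import Path

# Matches a trailing _<number>.mp4, capturing the numeric ID.
_ID_RE = re.compile(r"_(\d+)\.mp4$")


def pair_by_id(base_dir, quan_dir):
    """Pair videos across two directories by the numeric ID in their
    _<number>.mp4 suffix; files without a shared ID are skipped."""
    def index(d):
        out = {}
        for p in Path(d).glob("*.mp4"):
            m = _ID_RE.search(p.name)
            if m:
                out[int(m.group(1))] = p
        return out

    base, quan = index(base_dir), index(quan_dir)
    shared = sorted(base.keys() & quan.keys())
    return [(base[i], quan[i]) for i in shared]
```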
`gen/efficiency_results/` contains exported kernel-level summaries (e.g. `*_cuda_gpu_kern_sum.csv`) from profiling runs. The generation scripts can be configured to run under `nsys profile` to reproduce these artifacts.