azuresky03/quantization_wanx
WAN2.1 T2V 1.3B Quantization (LightX2V)

This repo contains quantization, inference configs, generation scripts, and evaluation utilities for the WAN2.1 Text-to-Video (T2V) 1.3B model, built around the LightX2V runtime.

The focus is on comparing baseline (FP16/BF16) and quantized DiT weights (e.g., INT8 / FP8), evaluating both quality (VBench plus traditional metrics) and efficiency (kernel-level profiling summaries).

Folder overview

  • configs/: JSON configs used by lightx2v.infer.
    • wan_t2v_base.json: baseline config (no DiT quantization).
    • wan_t2v_fp8_vllm.json: FP8 quantized DiT (vLLM backend).
    • wan_t2v_sgl_fp8.json: FP8 quantized DiT (SGL backend).
    • wan_t2v_int8_vllm.json: INT8 quantized DiT (vLLM backend).
    • wan_t2v_int8_torchao.json: INT8 quantized DiT (TorchAO backend).
  • convert/: PowerShell scripts to convert/quantize model weights via LightX2V’s converter.
    • fp8.ps1: convert DiT weights to FP8 (example uses torch.float8_e4m3fn).
    • int8.ps1: convert DiT weights to INT8.
  • gen/: Python scripts to generate videos (typically using VBench prompts) and optionally profile efficiency.
    • *_vbench.py: generate videos for different variants (base / fp8 / int8).
    • efficiency_results/: exported profiling summaries (e.g. *_cuda_gpu_kern_sum.csv).
    • VBench_full_info.json: a copy of the VBench prompt metadata (repo root also has VBench_full_info.json).
  • evaluation_metrics/: quality metrics utilities.
    • Notebooks: CLIP_Score.ipynb, similiarity_comparison.ipynb, fvd.ipynb.
    • fvd.py: Fréchet Video Distance implementation.
    • traditional_metrics/: “traditional” frame-level metrics (MSE/PSNR/SSIM) over paired videos.
  • save_results/: VBench-related prompt lists (and optionally generated videos if you place them there).
    • *_prompt/prompts.json: the prompts used for each run.
  • scripts/: VBench evaluation scripts that read videos + prompts and write JSON results.
  • vbench_evaluation_results/: saved VBench evaluation outputs (*_full_info.json, *_eval_results.json).

Prerequisites (expected environment)

  • LightX2V installed and importable as a Python module (the scripts call python -m lightx2v.infer and python -m tools.convert.converter).
  • WAN2.1 T2V 1.3B weights and associated components (T5 encoder + VAE).
  • (Optional) NVIDIA Nsight Systems (nsys) if you enable kernel profiling in gen/*.py.
  • VBench Python package for scripts/test_vbench_*.py.
  • Two environment YAML files are provided: one for the TorchAO and SGLang kernel backends, and one for the vLLM kernel backends.

Model files (expected paths)

The configs reference the following default layout (edit paths if yours differ):

  • ./models/wan2.1_t2v/diffusion_pytorch_model.safetensors (baseline DiT weights)
  • ./models/wan2.1_t2v/models_t5_umt5-xxl-enc-bf16.pth (T5 encoder)
  • ./models/wan2.1_t2v/Wan2.1_VAE.pth (VAE)
  • ./models/wan2.1_t2v/<quantized>.safetensors (converted DiT weights), e.g.
    • wan2.1_480p_int8_lightx2v_test.safetensors
    • wan2.1_480p_scaled_fp8_e4m3_test.safetensors

Convert / quantize DiT weights

The scripts in convert/ show example conversion commands (Windows PowerShell):

pwsh -File .\convert\int8.ps1
pwsh -File .\convert\fp8.ps1

Before running, update at least:

  • --source: your original diffusion_pytorch_model.safetensors
  • --output: output folder for the converted weights
  • --output_name: output filename prefix
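If you prefer driving the conversion from Python instead of PowerShell, the command the scripts build can be sketched as below. Only the module path (tools.convert.converter) and the three flags listed above come from this repo; the example paths and output name are illustrative, so check convert/*.ps1 for the full set of arguments (e.g., the quantization dtype).

```python
# Sketch: assemble the LightX2V converter invocation with the three flags the
# README says must be updated. Example paths only; see convert/*.ps1 for the
# complete argument list.
import subprocess
import sys

def build_convert_cmd(source, output_dir, output_name):
    """Build the `python -m tools.convert.converter` command line."""
    return [
        sys.executable, "-m", "tools.convert.converter",
        "--source", source,
        "--output", output_dir,
        "--output_name", output_name,
    ]

cmd = build_convert_cmd(
    "./models/wan2.1_t2v/diffusion_pytorch_model.safetensors",
    "./models/wan2.1_t2v",
    "wan2.1_480p_int8_lightx2v_test",
)
# subprocess.run(cmd, check=True)  # uncomment to actually run the conversion
```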

Run inference with LightX2V

All inference runs use python -m lightx2v.infer with:

  • --model_cls wan2.1
  • --task t2v
  • --model_path: DiT weights (baseline or quantized)
  • --config_json: one of the configs in configs/

Example (baseline):

python -m lightx2v.infer \
  --model_cls wan2.1 \
  --task t2v \
  --model_path ./models/wan2.1_t2v/diffusion_pytorch_model.safetensors \
  --config_json ./configs/wan_t2v_base.json \
  --prompt "A corgi surfing on a wave, cinematic lighting" \
  --negative_prompt "" \
  --save_result_path ./out/base.mp4

Example (INT8 vLLM):

python -m lightx2v.infer \
  --model_cls wan2.1 \
  --task t2v \
  --model_path ./models/wan2.1_t2v/wan2.1_480p_int8_lightx2v_test.safetensors \
  --config_json ./configs/wan_t2v_int8_vllm.json \
  --prompt "A corgi surfing on a wave, cinematic lighting" \
  --negative_prompt "" \
  --save_result_path ./out/int8_vllm.mp4

Generate videos (VBench prompts)

gen/*.py scripts load prompts from VBench_full_info.json, optionally filter by dimension, and then run lightx2v.infer repeatedly to save a batch of .mp4 videos under save_results/.

Example:

python gen/base_vbench.py
python gen/fp8_vllm_vbench.py
python gen/fp8_sgl_vbench.py
python gen/int8_vbench.py
python gen/int8_vbench_torchao.py

If you want kernel profiling, set RECORD_KERNEL = True in the script and ensure nsys is installed.

Evaluate with VBench

The scripts in scripts/ compute VBench scores for a set of generated videos and write results to vbench_evaluation_results/.

python scripts/test_vbench_base.py
python scripts/test_vbench_vllm_fp8.py
python scripts/test_vbench_sgl_fp8.py
python scripts/test_vbench_vllm_int8.py

Important: generation folder naming

By default:

  • gen/*.py writes videos to folders like save_results/base_vbench/, save_results/fp8_vllm_vbench/, etc.
  • scripts/test_vbench_*.py expects videos under folders like save_results/base_vbench_prompt/, save_results/vllm_fp8_prompt/, save_results/sgl_fp8_prompt/, save_results/vllm_int8_prompt/

To make evaluation work, choose one:

  • Option A (recommended): generate into the *_prompt/ folders (edit the save_dir variable in gen/*.py).
  • Option B: change videos_dir in scripts/test_vbench_*.py to match where you generated the .mp4 files.

Also make sure prompts.json aligns with the sorted list of *.mp4 in that folder (the evaluation scripts assume prompt_list[i] corresponds to video_files[i]).

Traditional metrics (MSE / PSNR / SSIM)

Use evaluation_metrics/traditional_metrics/eval.py to compare a baseline directory vs a quantized directory. Filenames must end with _<number>.mp4 in both directories so videos can be paired by ID.

python evaluation_metrics/traditional_metrics/eval.py \
  --base_dir ./save_results/base_vbench_prompt \
  --quan_dir ./save_results/vllm_int8_prompt \
  --out_csv ./metrics_int8_vs_base.csv \
  --stride 1 \
  --max_frames 0

FVD and other metrics

  • evaluation_metrics/fvd.py: Fréchet Video Distance implementation (used in evaluation_metrics/fvd.ipynb).
  • evaluation_metrics/CLIP_Score.ipynb: CLIPScore-based evaluation.
  • evaluation_metrics/similiarity_comparison.ipynb: additional similarity comparisons.

Efficiency results

gen/efficiency_results/ contains exported kernel-level summaries (e.g. *_cuda_gpu_kern_sum.csv) from profiling runs. The generation scripts can be configured to run under nsys profile to reproduce these artifacts.
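A plausible way to reproduce such a summary, assuming a standard Nsight Systems workflow (report and output names here are examples, not the repo's exact commands):

```shell
# Profile one generation run, then export the CUDA GPU kernel summary as CSV.
# The cuda_gpu_kern_sum report name matches the *_cuda_gpu_kern_sum.csv files
# stored in gen/efficiency_results/.
nsys profile -o base_run python gen/base_vbench.py
nsys stats --report cuda_gpu_kern_sum --format csv -o base base_run.nsys-rep
```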
