
VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions

Overview

VStyle is a bilingual (Chinese & English) benchmark for voice style adaptation. It covers four key tasks:

  • Acoustic attribute control
  • Natural language instruction following
  • Role-playing
  • Implicit empathy

To enable automated and reproducible evaluation, we introduce the LALM-as-a-Judge framework, which assesses model outputs across three dimensions:

  • Textual faithfulness (Is it saying the right thing?)
  • Style adherence (Does it match the intended style?)
  • Naturalness (Does it sound smooth and natural?)

VStyle goes beyond checking correctness: it evaluates how well the model speaks. Experiments on a range of open-source and commercial systems show that it effectively differentiates the voice style adaptation abilities of different models.
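For illustration, each judged sample can be thought of as carrying one score per dimension, which are then averaged into a model-level summary. The record layout and field names below are hypothetical, not the exact schema produced by the evaluation scripts.

# Hypothetical per-sample judge records; field names are illustrative only.
from statistics import mean

sample_scores = [
    {"id": "en_0001", "textual_faithfulness": 4, "style_adherence": 5, "naturalness": 4},
    {"id": "en_0002", "textual_faithfulness": 5, "style_adherence": 3, "naturalness": 4},
]

# Average each dimension across samples for a model-level summary.
for dim in ("textual_faithfulness", "style_adherence", "naturalness"):
    print(dim, mean(s[dim] for s in sample_scores))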

Leaderboard

Evaluation results of different SLMs across different task types.

Evaluate your model

We provide a Gemini API–based evaluation tool for assessing voice synthesis quality across multiple dimensions. It automatically processes audio samples, generates scores, and produces comprehensive analysis reports.

Quick Example:

# Install dependencies
pip install google-generativeai matplotlib pandas tqdm

# Run evaluation on example data
python lalm_eval/gemini_eval.py \
    --root_dir ./data/examples/model_res/en/wav \
    --metadata_path ./data/examples/model_res/en/metadata.jsonl \
    --out_dir ./data/examples/eval_res/en \
    --gemini_api_key YOUR_API_KEY

For detailed usage instructions, see: lalm_eval/README.md.
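Under the hood, the core judging step is roughly: upload a generated audio file and ask a Gemini model to rate it on the three dimensions. The sketch below shows that shape; the model name, prompt wording, file name, and output parsing are assumptions for illustration, and the actual logic lives in lalm_eval/gemini_eval.py.

# Illustrative single judging call (see lalm_eval/gemini_eval.py for the real prompts and parsing).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")  # model name is an assumption

# File name below is hypothetical; --root_dir above points at the wav folder.
audio = genai.upload_file("./data/examples/model_res/en/wav/sample_0001.wav")
prompt = (
    "Rate this spoken response from 1 to 5 on textual faithfulness, "
    "style adherence, and naturalness. Reply as JSON."
)
response = model.generate_content([prompt, audio])
print(response.text)  # scores are parsed from the model's reply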

For inference results of other models reported in our paper, please refer to the dataset at https://huggingface.co/datasets/zhanjun/VStyle-responses.
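If you prefer to fetch those responses programmatically rather than through the web page, a snapshot download along these lines should work (the local directory name is just an example):

# Download the released model responses from the Hugging Face Hub.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="zhanjun/VStyle-responses",
    repo_type="dataset",
    local_dir="VStyle-responses",
)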

Human-Model Correlation Analysis

We reproduce the correlation study between human annotations and LALM-as-a-Judge scores reported in the paper, which validates the reliability of the automated evaluation.

Quick Example:

# Download evaluation results of all seven models
huggingface-cli download --repo-type dataset --local-dir-use-symlinks False zhanjun/VStyle-eval-results --local-dir VStyle-eval-results

# Compute Spearman correlations
python human_align/compute_model_human_spearman_r.py

For detailed analysis instructions, see: human_align/README.md.
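Conceptually, the script pairs human and judge scores per sample and computes Spearman's rank correlation between the two rankings. A minimal standalone version with made-up scores:

# Spearman correlation between human and judge scores (made-up numbers;
# the real pairing logic is in human_align/compute_model_human_spearman_r.py).
from scipy.stats import spearmanr

human_scores = [4, 3, 5, 2, 4, 5]
judge_scores = [4, 2, 5, 2, 3, 5]

rho, p_value = spearmanr(human_scores, judge_scores)
print(f"Spearman r = {rho:.3f} (p = {p_value:.3g})")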

Contributing

To submit your evaluation results to VStyle, please send the results file (metadata_with_score.jsonl) to jzhan24@m.fudan.edu.cn.

License

This project is licensed under the MIT License.
