THEval

Official implementation of THEval: Evaluation Framework for Talking Head Video Generation.

THEval evaluates talking-head videos with eight metrics grouped into three dimensions: quality, naturalness, and synchronization. Each metric is normalized by its relative closeness to the ground-truth (GT) value measured on the evaluation set, and the final score is the unweighted average of the eight normalized metrics. Higher is better.

Metrics

THEval contains the following metrics:

Dimension	Metrics
Quality	Global Aesthetics, Mouth Quality, Face Quality
Naturalness	Lip Dynamics, Head Motion Dynamics, Eyebrow Dynamics
Synchronization	Silent Lip Stability, Lip-Sync

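The final score is computed as follows, where method_m is the method's raw value for metric m and GT_m is the ground-truth value for the same metric: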

score_m = 1 - abs(method_m - GT_m) / abs(GT_m)
final_score = mean(score_m for all eight metrics)
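The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's final_score.py: the function names are made up for this example, the raw values are hypothetical, and only two metrics are used to keep the arithmetic easy to follow.

```python
from statistics import mean


def normalized_score(method_value: float, gt_value: float) -> float:
    """score_m = 1 - |method_m - GT_m| / |GT_m|."""
    if gt_value == 0:
        # The formula divides by |GT_m|, so a zero GT value is undefined.
        raise ValueError("GT value must be non-zero")
    return 1.0 - abs(method_value - gt_value) / abs(gt_value)


def final_score(method: dict[str, float], gt: dict[str, float]) -> float:
    """Unweighted mean of the normalized scores over all metrics in gt."""
    return mean(normalized_score(method[name], gt[name]) for name in gt)


# Hypothetical raw values, for illustration only (not real THEval numbers).
gt = {"Lip sync": 8.0, "Face quality": 2.0}
method = {"Lip sync": 6.0, "Face quality": 1.0}

print(normalized_score(method["Lip sync"], gt["Lip sync"]))  # 0.75
print(final_score(method, gt))  # 0.625
```

As written, the formula has no lower bound: a raw value that deviates from GT by more than |GT_m| yields a negative score for that metric. For example, `normalized_score(10.0, 4.0)` is -0.5.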

Leaderboard

Results from the paper on the THEval evaluation dataset. All metric values are normalized scores, and higher is better.

Rank	Model	Type	Global Aesthetics	Mouth Quality	Face Quality	Lip Dynamics	Head Motion Dynamics	Eyebrow Dynamics	Silent Lip Stability	Lip-Sync	Final Score
1	LivePortrait	Video-driven	0.9464	0.9760	0.8784	0.9913	0.7548	0.9997	0.9316	0.9980	0.9345
2	X-Portrait	Video-driven	0.9502	0.9990	0.9568	0.9611	0.6091	0.7897	0.9924	0.9407	0.8999
3	LIA-X	Video-driven	0.9466	0.9195	0.8705	0.9030	0.6233	0.9090	0.9087	0.9644	0.8806
4	Hallo2	Audio-driven	0.9619	0.9254	0.9017	0.9883	0.2395	0.8530	0.9620	0.9502	0.8477
5	Echomimic	Audio-driven	0.8499	0.9617	0.9514	0.7930	0.3806	0.8071	0.8251	0.9964	0.8207
6	EmoPortrait	Video-driven	0.9542	0.8799	0.7957	0.9159	0.5136	0.5840	0.9354	0.9608	0.8174
7	OmniAvatar	Audio-driven	0.9767	0.9919	0.9521	0.4650	0.6039	0.8488	0.6160	0.9972	0.8064
8	FLOAT	Audio-driven	0.8713	0.9868	0.9645	0.4266	0.5115	0.8945	0.6958	0.9992	0.7938
9	ControlTalk	Video-driven	0.7759	0.8360	0.7584	0.5476	0.5058	0.9785	0.9163	0.9897	0.7885
10	Dimitra	Audio-driven	0.9523	0.8798	0.7914	0.7863	0.1279	0.6372	0.8555	0.9430	0.7467
11	SadTalker	Audio-driven	0.9576	0.9142	0.6005	0.8276	0.2867	0.6084	0.6806	0.9794	0.7319
12	MCNet	Video-driven	0.7499	0.7655	0.4771	0.8925	0.2297	0.9132	0.8669	0.9541	0.7311
13	DaGAN	Video-driven	0.7547	0.7646	0.5105	0.8262	0.3029	0.8362	0.7452	0.9719	0.7140
14	FOM	Video-driven	0.7516	0.7566	0.4875	0.6743	0.3269	0.8613	0.5970	0.9929	0.6810
15	LIA	Video-driven	0.7265	0.7622	0.4899	0.6912	0.3080	0.8920	0.5741	0.9913	0.6794
16	Real3DPortrait	Audio-driven	0.8597	0.8732	0.7934	0.7348	0.0895	0.3170	0.7072	0.9695	0.6680
17	Wav2Lip	Audio-driven	0.9090	0.9180	0.6762	0.6966	0.1124	0.3662	0.6388	0.8849	0.6502
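As a quick sanity check on how to read the table, the Final Score column can be reproduced from the eight normalized metric columns. The snippet below does this for the LivePortrait row, with the values copied from the table.

```python
from statistics import mean

# Normalized per-metric scores for LivePortrait, in table column order.
liveportrait = [0.9464, 0.9760, 0.8784, 0.9913, 0.7548, 0.9997, 0.9316, 0.9980]

final = mean(liveportrait)
print(f"{final:.4f}")  # 0.9345
```

Because the table shows values rounded to four decimals, recomputing a row this way can differ from the listed Final Score in the last digit.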

Installation

THEval requires Python and ffmpeg. We recommend creating a dedicated conda environment:

git clone https://github.com/Newbyl/THEval.git
cd THEval
conda create -n theval python=3.10 -y
conda activate theval

Then install THEval and its dependencies into that environment:

bash install.sh
pip install -e .

Download the external model code and checkpoints:

python tools/download_external_models.py --all

This places the model files where the metric scripts expect them:

models/facexformer/ckpts/model.pt                  # Head Motion Dynamics
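Before running the metrics, it can help to confirm that the prerequisites are in place. The following is a small sketch of such a check, assuming it is run from the repository root. The `check_setup` helper is hypothetical and not part of THEval, and the file list contains only the checkpoint path named above.

```python
import shutil
from pathlib import Path

# Checkpoint path taken from the list above, relative to the repository root.
EXPECTED_FILES = [Path("models/facexformer/ckpts/model.pt")]


def check_setup(repo_root: str = ".") -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    # ffmpeg must be discoverable on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    for relative in EXPECTED_FILES:
        if not (Path(repo_root) / relative).is_file():
            problems.append(f"missing model file: {relative.as_posix()}")
    return problems


for problem in check_setup("."):
    print(problem)
```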

Prepare Videos

Create a text file with one video path per line:

theval-list-videos /path/to/my_method_videos -o input_files/my_method.txt
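If you prefer to build the list yourself, the file format is simple enough to write by hand. The sketch below is not the implementation of theval-list-videos. The set of extensions, the recursive search, and the use of absolute paths are all assumptions made for this example.

```python
from pathlib import Path

# Assumed set of video extensions; adjust to match your data.
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def write_video_list(video_dir, output_txt):
    """Write one absolute video path per line and return the paths found."""
    video_dir = Path(video_dir)
    paths = sorted(
        p.resolve()
        for p in video_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in VIDEO_EXTENSIONS
    )
    output = Path(output_txt)
    output.parent.mkdir(parents=True, exist_ok=True)
    output.write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return paths


# Example call (paths are placeholders):
# write_video_list("/path/to/my_method_videos", "input_files/my_method.txt")
```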

Run The Metrics

Run each metric script from the repository root. Every script reads the same video list and writes a single output file to the path given by --output_txt.

Quality

python Video_Quality/global_aesthetic.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/global_aesthetic.csv
python Video_Quality/mouth_quality.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/mouth_quality.csv
python Video_Quality/face_quality.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/face_quality.csv

Naturalness

python Naturalness/lip_dynamics.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/lip_dynamics.csv
python Naturalness/head_motion_dynamics.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/head_motion_dynamics.csv
python Naturalness/eyebrow_dynamics.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/eyebrow_dynamics.csv

Synchronization

python Synchronization/silent_lip_stability.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/silent_lip_stability.csv
python Synchronization/lip_sync.py --video_txt input_files/my_method.txt --output_txt output_files/my_method/lip_sync.csv

For lip_sync.py and silent_lip_stability.py, you can pass --audio_folder /path/to/wavs if the audio has already been extracted.
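The eight commands above differ only in the script path and the output file name, so they can be driven from a small loop. The sketch below is a convenience wrapper written for this README, not a tool shipped with the repository. It uses only the script paths and flags shown above, and `build_commands` and `run_all` are hypothetical names.

```python
import subprocess
import sys
from pathlib import Path

# Metric name -> script path, as listed in the commands above.
METRIC_SCRIPTS = {
    "global_aesthetic": "Video_Quality/global_aesthetic.py",
    "mouth_quality": "Video_Quality/mouth_quality.py",
    "face_quality": "Video_Quality/face_quality.py",
    "lip_dynamics": "Naturalness/lip_dynamics.py",
    "head_motion_dynamics": "Naturalness/head_motion_dynamics.py",
    "eyebrow_dynamics": "Naturalness/eyebrow_dynamics.py",
    "silent_lip_stability": "Synchronization/silent_lip_stability.py",
    "lip_sync": "Synchronization/lip_sync.py",
}
# Only these two scripts are documented as accepting --audio_folder.
AUDIO_METRICS = {"silent_lip_stability", "lip_sync"}


def build_commands(video_txt, output_dir, audio_folder=None):
    """Build one command (as an argument list) per metric script."""
    commands = []
    for name, script in METRIC_SCRIPTS.items():
        cmd = [
            sys.executable, script,
            "--video_txt", str(video_txt),
            "--output_txt", str(Path(output_dir) / f"{name}.csv"),
        ]
        if audio_folder is not None and name in AUDIO_METRICS:
            cmd += ["--audio_folder", str(audio_folder)]
        commands.append(cmd)
    return commands


def run_all(video_txt, output_dir, audio_folder=None):
    """Run every metric in sequence from the repository root."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(video_txt, output_dir, audio_folder):
        subprocess.run(cmd, check=True)  # stop at the first failing script


# Example call (run from the repository root):
# run_all("input_files/my_method.txt", "output_files/my_method")
```

Running the scripts sequentially with `check=True` keeps failures easy to spot; whether the scripts can safely run in parallel on one GPU is not covered here.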

Extract reusable WAV files from the same video list with:

python tools/extract_audio.py --video_txt input_files/my_method.txt --output_dir output_files/my_method/audio
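For readers who want to see what such an extraction step involves, here is a rough equivalent that calls ffmpeg directly. It is not the code in tools/extract_audio.py. The mono 16 kHz output and the convention of naming each WAV after the video's file stem are assumptions, so check what --audio_folder actually expects before relying on it.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(video_path, output_dir, sample_rate=16000):
    """Build an ffmpeg command that writes <video stem>.wav into output_dir."""
    wav_path = Path(output_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg", "-y",            # overwrite existing output files
        "-i", str(video_path),
        "-vn",                     # drop the video stream
        "-ac", "1",                # mono (assumed)
        "-ar", str(sample_rate),   # sample rate in Hz (assumed 16 kHz)
        str(wav_path),
    ]


def extract_audio(video_txt, output_dir):
    """Extract one WAV per video listed in video_txt (one path per line)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    lines = Path(video_txt).read_text(encoding="utf-8").splitlines()
    for video in (line.strip() for line in lines):
        if video:
            subprocess.run(ffmpeg_command(video, output_dir), check=True)


# Example call (paths are placeholders):
# extract_audio("input_files/my_method.txt", "output_files/my_method/audio")
```

Note that naming by file stem would overwrite outputs if two listed videos share a file name in different folders.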

Compute A Final Score

The final score compares a method's raw metrics to the ground-truth metrics. For this reason, the input to theval-score must contain both rows:

GT,...
MyMethod,...

The repository ships the GT row for the THEval evaluation set in examples/gt_metrics.csv. Copy it once to start your raw metrics table, then append your method's row:

cp examples/gt_metrics.csv output_files/raw_metrics.csv

theval-collect-metrics \
  --model MyMethod \
  --output output_files/raw_metrics.csv \
  --append \
  --global-aesthetic output_files/my_method/global_aesthetic.csv \
  --mouth-quality output_files/my_method/mouth_quality.csv \
  --face-quality output_files/my_method/face_quality.csv \
  --lip-dynamics output_files/my_method/lip_dynamics.csv \
  --head-motion-dynamics output_files/my_method/head_motion_dynamics.csv \
  --eyebrow-dynamics output_files/my_method/eyebrow_dynamics.csv \
  --silent-lip-stability output_files/my_method/silent_lip_stability.csv \
  --lip-sync output_files/my_method/lip_sync.csv

theval-score --metrics output_files/raw_metrics.csv --output output_files/theval_scores.csv

The input CSV for theval-score should look like this:

Model,Global aesthetic,Mouth quality,Face quality,Lip dynamics,Head motion dynamics,Eyebrow dynamics,Silent lip stability,Lip sync
GT,...
MyMethod,...
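A malformed table is an easy mistake to make when assembling rows by hand, so a quick structural check before scoring can save a failed run. The helper below is a hypothetical sketch, not part of THEval. It assumes the header shown above and a row labelled GT, and the numbers in the sample are invented placeholders.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "Model", "Global aesthetic", "Mouth quality", "Face quality",
    "Lip dynamics", "Head motion dynamics", "Eyebrow dynamics",
    "Silent lip stability", "Lip sync",
]


def check_metrics_table(csv_text, gt_name="GT"):
    """Return a list of problems found in the raw metrics CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    problems = []
    if reader.fieldnames != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {reader.fieldnames}")
    models = [row.get("Model") for row in rows]
    if gt_name not in models:
        problems.append(f"missing {gt_name} row")
    if not any(model != gt_name for model in models):
        problems.append("no method row to score")
    for row in rows:
        for column in EXPECTED_COLUMNS[1:]:
            try:
                float(row.get(column))
            except (TypeError, ValueError):
                problems.append(f"{row.get('Model')}: '{column}' is not a number")
    return problems


# Placeholder numbers only; real values come from the metric scripts.
sample = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "GT,1,2,3,4,5,6,7,8\n"
    "MyMethod,1.5,2,3,4,5,6,7,8\n"
)
print(check_metrics_table(sample))  # []
```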

If you evaluate on a different dataset, recompute the eight GT metrics for that dataset and use those GT values instead of examples/gt_metrics.csv.

Citation

@inproceedings{quignon2026theval,
  title={THEval: A Comprehensive Framework for Evaluating Talking Head Generation},
  author={Quignon, Nabyl and Chopin, Baptiste and Wang, Yaohui and Dantcheva, Antitza},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Findings},
  year={2026}
}

Acknowledgements

THEval builds on public models and libraries including pyiqa, MediaPipe, Silero VAD, and FaceXFormer. Please follow their licenses and citation requirements when using the corresponding metrics.
