SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation
Le Shen*, Qian Qiao*, Tan Yu*, Ke Zhou, Tianhang Yu, Yu Zhan, Zhenjie Wang, Dingcheng Zhen, Ming Tao, Shunshun Yin, Siyuan Liu†
*Equal Contribution †Corresponding Author
- 2026.02.12 - We released SoulX-FlashHead, a streaming talking-head project that achieves real-time performance on consumer GPUs (e.g., RTX 4090/5090).
- 2026.01.08 - We released the inference code and the model weights.
- 2025.12.30 - We released the project page for SoulX-FlashTalk.
- 2025.12.30 - We released the SoulX-FlashTalk technical report on arXiv and the GitHub repository.
A 4-GPU real-time version of SoulX-FlashTalk.
- Technical report
- Project Page
- Inference code
- Checkpoint release
- Online demo
Live.Streaming.mp4
online.demo01.mp4
online.demo02.mp4
Girl.mp4
Seal.mp4
Rap.mp4
```shell
conda create -n flashtalk python=3.10
conda activate flashtalk

pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
```

Install ffmpeg (choose one):

```shell
# Ubuntu / Debian
apt-get install ffmpeg

# CentOS / RHEL
yum install ffmpeg ffmpeg-devel

# or Conda (no root required)
conda install -c conda-forge ffmpeg==7
```

| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashTalk-14B | Our 14B model | 🤗 Huggingface |
| chinese-wav2vec2-base | Chinese wav2vec2 audio encoder | 🤗 Huggingface |
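Before downloading the models, it may help to verify that the dependencies installed above are actually visible to Python and on your PATH. This check script (and its list of modules) is our own sketch, not part of the official repo:

```python
import importlib.util
import shutil

def check_env():
    """Report whether each prerequisite is importable / on PATH; never raises."""
    report = {
        mod: importlib.util.find_spec(mod) is not None
        for mod in ("torch", "torchvision", "flash_attn")
    }
    # ffmpeg is a system binary, so check the PATH instead of importing it.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report

if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING should be reinstalled with the commands above before proceeding.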
```shell
# If you are in mainland China, run this first:
# export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashTalk-14B --local-dir ./models/SoulX-FlashTalk-14B
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./models/chinese-wav2vec2-base
```
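If you prefer scripting the downloads over the CLI, the same repos can be fetched with `huggingface_hub.snapshot_download`. This is a sketch: the repo IDs and target directories come from the commands above, while the mirror toggle is our own convenience, not an official option:

```python
import os

# Repo IDs and target dirs taken from the download commands above.
REPOS = {
    "Soul-AILab/SoulX-FlashTalk-14B": "./models/SoulX-FlashTalk-14B",
    "TencentGameMate/chinese-wav2vec2-base": "./models/chinese-wav2vec2-base",
}

def download_all(use_mirror: bool = False) -> None:
    # HF_ENDPOINT must be set before huggingface_hub is imported,
    # hence the late import (the mirror is for mainland-China users).
    if use_mirror:
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in REPOS.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)

# Usage: download_all()              # direct from the Hub
#        download_all(use_mirror=True)  # via hf-mirror.com
```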
```shell
# Infer on a single GPU
# Requires more than 64 GB of VRAM. Use --cpu_offload to reduce VRAM usage to 40 GB.
bash inference_script_single_gpu.sh

# Infer on multiple GPUs
# Real-time inference speed requires 8x H800 GPUs or better.
bash inference_script_multi_gpu.sh
```

Coming Soon!
If you are interested in our work and would like to get in touch, feel free to email le.shen@mail.dhu.edu.cn, qiaoqian@soulapp.cn, yutan@soulapp.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.
Since Group 1 has reached capacity, we have opened a new WeChat group. On behalf of SoulApp, we also warmly welcome everyone to download the app and join our Soul group for further technical discussions and updates!
If you find our work useful in your research, please consider citing:
@misc{shen2025soulxflashtalktechnicalreport,
title={SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation},
author={Le Shen and Qian Qiao and Tan Yu and Ke Zhou and Tianhang Yu and Yu Zhan and Zhenjie Wang and Ming Tao and Shunshun Yin and Siyuan Liu},
year={2025},
eprint={2512.23379},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.23379},
}
- InfiniteTalk and Wan: the base models we built upon.
- Self Forcing: the codebase we built upon.
- DMD and Self Forcing++: the key distillation techniques used by our method.
Tip
If you find our work useful, please also consider starring the original repositories of these foundational methods.

