SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation
Le Shen*, Qian Qiao*, Tan Yu*, Ke Zhou, Tianhang Yu, Yu Zhan, Zhenjie Wang, Dingcheng Zhen, Ming Tao, Shunshun Yin, Siyuan Liu†
*Equal Contribution †Corresponding Author
- 2026.02.12 - We released SoulX-FlashHead, a streaming talking-head project that achieves real-time performance on consumer GPUs (e.g., RTX 4090/5090).
- 2026.01.08 - We released the inference code and the model weights.
- 2025.12.30 - We released the project page for SoulX-FlashTalk.
- 2025.12.30 - We released the SoulX-FlashTalk technical report on arXiv and the GitHub repository.
A 4-GPU real-time version of SoulX-FlashTalk.
- Technical report
- Project Page
- Inference code
- Checkpoint release
- Online demo
Live.Streaming.mp4
online.demo01.mp4
online.demo02.mp4
Girl.mp4
Seal.mp4
Rap.mp4
```shell
conda create -n flashtalk python=3.10
conda activate flashtalk

pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
pip install ninja
pip install flash_attn==2.8.0.post2 --no-build-isolation
```

Install ffmpeg (choose one):

```shell
# Ubuntu / Debian
apt-get install ffmpeg

# CentOS / RHEL
yum install ffmpeg ffmpeg-devel

# or Conda (no root required)
conda install -c conda-forge ffmpeg==7
```

| Model Component | Description | Link |
|---|---|---|
| SoulX-FlashTalk-14B | Our 14B model | 🤗 Huggingface |
| chinese-wav2vec2-base | Chinese wav2vec2 audio encoder | 🤗 Huggingface |
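Before downloading the models, it may help to verify that the dependencies installed above are actually visible to Python and on your PATH. This check script (and its list of modules) is our own sketch, not part of the official repo:

```python
import importlib.util
import shutil

def check_env():
    """Report whether each prerequisite is importable / on PATH; never raises."""
    report = {
        mod: importlib.util.find_spec(mod) is not None
        for mod in ("torch", "torchvision", "flash_attn")
    }
    # ffmpeg is a system binary, so check the PATH instead of importing it.
    report["ffmpeg"] = shutil.which("ffmpeg") is not None
    return report

if __name__ == "__main__":
    for name, ok in check_env().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING should be reinstalled with the commands above before proceeding.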
```shell
# If you are in mainland China, run this first:
# export HF_ENDPOINT=https://hf-mirror.com
pip install "huggingface_hub[cli]"
huggingface-cli download Soul-AILab/SoulX-FlashTalk-14B --local-dir ./models/SoulX-FlashTalk-14B
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./models/chinese-wav2vec2-base
```
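If you prefer scripting the downloads over the CLI, the same repos can be fetched with `huggingface_hub.snapshot_download`. This is a sketch: the repo IDs and target directories come from the commands above, while the mirror toggle is our own convenience, not an official option:

```python
import os

# Repo IDs and target dirs taken from the download commands above.
REPOS = {
    "Soul-AILab/SoulX-FlashTalk-14B": "./models/SoulX-FlashTalk-14B",
    "TencentGameMate/chinese-wav2vec2-base": "./models/chinese-wav2vec2-base",
}

def download_all(use_mirror: bool = False) -> None:
    # HF_ENDPOINT must be set before huggingface_hub is imported,
    # hence the late import (the mirror is for mainland-China users).
    if use_mirror:
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in REPOS.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)

# Usage: download_all()              # direct from the Hub
#        download_all(use_mirror=True)  # via hf-mirror.com
```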
```shell
# Infer on a single GPU
# Requires more than 64 GB of VRAM. Use --cpu_offload to reduce VRAM usage to 40 GB.
bash inference_script_single_gpu.sh

# Infer on multiple GPUs
# Real-time inference speed requires 8x H800 GPUs or better.
bash inference_script_multi_gpu.sh
```

Coming Soon!
If you are interested in our work and would like to get in touch, feel free to email le.shen@mail.dhu.edu.cn, qiaoqian@soulapp.cn, yutan@soulapp.cn, zhouke@soulapp.cn, or liusiyuan@soulapp.cn.
Since Group 1 has reached capacity, we have opened a new WeChat group. On behalf of SoulApp, we also warmly welcome everyone to download the app and join our Soul group for further technical discussions and updates!
If you find our work useful in your research, please consider citing:
@misc{shen2025soulxflashtalktechnicalreport,
title={SoulX-FlashTalk: Real-Time Infinite Streaming of Audio-Driven Avatars via Self-Correcting Bidirectional Distillation},
author={Le Shen and Qian Qiao and Tan Yu and Ke Zhou and Tianhang Yu and Yu Zhan and Zhenjie Wang and Ming Tao and Shunshun Yin and Siyuan Liu},
year={2025},
eprint={2512.23379},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.23379},
}
- InfiniteTalk and Wan: the base models we built upon.
- Self Forcing: the codebase we built upon.
- DMD and Self Forcing++: the key distillation techniques used by our method.
Tip
If you find our work useful, please also consider starring the original repositories of these foundational methods.

