DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation

Haomin Zhang, Chang Liu, Junjie Zheng, Zihao Chen, Chaofan Ding, Xinhan Di

AI Lab, Giant Network, and University of Trento

Results

Demo videos: v2c_1.mp4 and v2c_2.mp4

For more results, please visit https://acappemin.github.io/DeepAudio-V1.github.io.

Installation

1. Create a conda environment

conda create -n v2as python=3.10
conda activate v2as

2. Install the F5-TTS base package

cd ./F5-TTS
pip install -e .

3. Additional requirements

pip install -r requirements.txt
conda install cudnn
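
After installation, a quick sanity check can confirm that PyTorch sees the GPU and cuDNN from inside the v2as environment. This is a minimal sketch; it assumes PyTorch was installed by the F5-TTS install and requirements.txt, which this repo does not state explicitly.

import torch

# Minimal environment sanity check (assumes PyTorch is present in the v2as env)
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("cuDNN available:", torch.backends.cudnn.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))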

Pretrained models

The models are available at https://huggingface.co/lshzhm/DeepAudio-V1/tree/main. See MODELS.md for more details.
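
If you prefer to fetch the checkpoints programmatically rather than through the Hugging Face web UI, a minimal sketch using the huggingface_hub package is shown below; the target directory name is an illustrative choice, not part of this repo's documented workflow.

from huggingface_hub import snapshot_download

# Download all files from the DeepAudio-V1 model repo to a local folder
# ("./pretrained" is an arbitrary, illustrative destination).
local_dir = snapshot_download(
    repo_id="lshzhm/DeepAudio-V1",
    local_dir="./pretrained",
)
print("Models downloaded to:", local_dir)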

Inference

1. V2A (video-to-audio) inference

bash v2a.sh

2. V2S (video-to-speech) inference

bash v2s.sh

3. TTS (text-to-speech) inference

bash tts.sh

Evaluation

bash eval_v2c.sh

Acknowledgement

  • MMAudio for the video-to-audio backbone and pretrained models
  • F5-TTS for the text-to-speech and video-to-speech backbone
  • V2C for the animated movie benchmark
  • Wav2Vec2-Emotion for emotion recognition in the EMO-SIM evaluation
  • WavLM-SV for speaker verification in the SPK-SIM evaluation
  • Whisper for speech recognition in the WER evaluation (see the sketch after this list)
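
For reference, WER evaluation of this kind typically transcribes the generated speech with Whisper and compares the transcript against the ground-truth text. The sketch below illustrates that idea using the openai-whisper and jiwer packages; it is not the exact procedure implemented by eval_v2c.sh, and the file path and reference text are placeholders.

import whisper
import jiwer

# Transcribe a generated speech clip with Whisper, then compute the word error
# rate against the reference transcript (both inputs are placeholders).
model = whisper.load_model("base")
hypothesis = model.transcribe("generated_speech.wav")["text"]
reference = "the reference transcript for this clip"
print("WER:", jiwer.wer(reference.lower(), hypothesis.lower()))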
