We provide an implementation of the unit-based HiFi-GAN vocoder with a duration prediction module used in the direct speech-to-speech translation models of [1, 2].
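At a high level, the vocoder embeds a sequence of discrete units, predicts a duration for each unit, repeats each unit embedding by its predicted duration, and feeds the expanded sequence to a HiFi-GAN generator. The sketch below only illustrates this flow; the class and parameter names (`UnitUpsampler`, `num_units`, `embed_dim`) are placeholders and do not correspond to the code in this repository.

```python
import torch
import torch.nn as nn

class UnitUpsampler(nn.Module):
    """Minimal sketch of unit embedding + duration-based upsampling.
    Shapes and module names are assumptions, not this repo's actual code."""

    def __init__(self, num_units=100, embed_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(num_units, embed_dim)
        # Small conv stack predicting one log-duration per unit,
        # in the spirit of FastSpeech-style duration predictors.
        self.duration_predictor = nn.Sequential(
            nn.Conv1d(embed_dim, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(embed_dim, 1, kernel_size=1),
        )

    def forward(self, units):                       # units: (T,) deduplicated unit ids
        emb = self.embedding(units)                 # (T, C)
        log_dur = self.duration_predictor(emb.t().unsqueeze(0)).view(-1)  # (T,)
        dur = torch.clamp(torch.round(torch.exp(log_dur)), min=1).long()
        frames = torch.repeat_interleave(emb, dur, dim=0)                 # (sum(dur), C)
        return frames  # frame-level conditioning for the HiFi-GAN generator

# toy usage: three deduplicated units expanded to frame level
print(UnitUpsampler()(torch.tensor([5, 9, 2])).shape)
```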
An example of training with HuBERT units:

```
python -m torch.distributed.launch --nproc_per_node <NUM_GPUS> \
    -m examples.speech_to_speech_translation.train \
    --checkpoint_path checkpoints/lj_hubert100_dur1.0 \
    --config examples/speech_to_speech_translation/configs/hubert100_dw1.0.json
```
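The config name `hubert100_dw1.0.json` refers to 100 HuBERT units and, presumably, a duration-loss weight of 1.0: during training the duration predictor is supervised with the observed length of each run of repeated units, and its loss is added to the vocoder objective with that weight. Below is a minimal sketch of such a weighted duration loss; the exact formulation in this repository may differ, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def duration_loss(pred_log_dur, target_dur, dur_weight=1.0):
    """Illustrative weighted duration loss: MSE between predicted and
    ground-truth durations in the log domain. An assumption, not the
    exact objective implemented in this repo."""
    target_log_dur = torch.log(target_dur.float() + 1.0)
    return dur_weight * F.mse_loss(pred_log_dur, target_log_dur)

# example: three units observed for 4, 2, and 7 frames respectively
loss = duration_loss(torch.zeros(3), torch.tensor([4, 2, 7]), dur_weight=1.0)
```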
To generate with duration prediction, simply run:
```
python -m examples.speech_to_speech_translation.inference \
    --checkpoint_file checkpoints/lj_hubert100_dur1.0 \
    -n 10 \
    --output_dir generations \
    --num-gpu <NUM_GPUS> \
    --input_code_file ./datasets/LJSpeech/hubert100/val.txt \
    --dur-prediction
```
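With `--dur-prediction`, the vocoder predicts how many frames each unit should span, so the input unit sequences are typically deduplicated (consecutive repeats collapsed) as in [1]. A small sketch of that collapsing step follows; the helper and the space-separated code format shown here are assumptions, not part of the repository's CLI.

```python
import itertools

def dedup_units(units):
    """Collapse consecutive duplicate units, e.g. [5, 5, 5, 9, 9, 2] -> [5, 9, 2].
    With duration prediction enabled, the vocoder re-expands each unit itself;
    this helper is illustrative and not part of the repository's CLI."""
    return [k for k, _ in itertools.groupby(units)]

# example: a line of space-separated unit ids (format assumed)
line = "5 5 5 9 9 2"
print(dedup_units([int(u) for u in line.split()]))  # -> [5, 9, 2]
```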
We also provide a fairseq implementation for inference; see "Convert unit sequences to waveform" in the fairseq speech-to-speech translation example.
[1] Direct speech-to-speech translation with discrete units
[2] Textless Speech-to-Speech Translation on Real Data