In our paper, we propose a lightweight speech synthesis alternative that performs real-time speaker anonymization. This repository provides our implementation and pretrained models.
Abstract: Speaker anonymization aims to conceal cues to speaker identity while preserving linguistic content. Current machine-learning-based approaches require substantial computational resources, hindering real-time streaming applications. To address these concerns, we propose a streaming model that achieves speaker anonymization with low latency. The system is trained in an end-to-end autoencoder fashion using a lightweight content encoder that extracts HuBERT-like information, a pretrained speaker encoder that extracts speaker identity, and a variance encoder that injects pitch and energy information. These three disentangled representations are fed to a decoder that re-synthesizes the speech signal. We present evaluation results from two implementations of our system: a full model that achieves a latency of 230 ms, and a lite version (0.1x in size) that further reduces latency to 66 ms while maintaining state-of-the-art performance in naturalness, intelligibility, and privacy preservation.
Visit our demo website for audio samples.
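The high-level data flow can be pictured roughly as follows. This is a minimal PyTorch sketch for orientation only; the layer choices, dimensions, and module internals are placeholders, not the actual architecture.

```python
import torch
import torch.nn as nn

class AnonymizationAutoencoder(nn.Module):
    """Sketch of the three-branch encoder + decoder layout (placeholder layers only)."""

    def __init__(self, content_dim=128, speaker_dim=192, variance_dim=32):
        super().__init__()
        # Placeholders for the lightweight HuBERT-like content encoder,
        # the pretrained speaker encoder, and the pitch/energy variance encoder.
        self.content_encoder = nn.Conv1d(1, content_dim, kernel_size=400, stride=320)
        self.speaker_encoder = nn.Sequential(
            nn.Conv1d(1, speaker_dim, kernel_size=400, stride=320),
            nn.AdaptiveAvgPool1d(1),
        )
        self.variance_encoder = nn.Conv1d(2, variance_dim, kernel_size=1)
        self.decoder = nn.ConvTranspose1d(
            content_dim + speaker_dim + variance_dim, 1, kernel_size=400, stride=320
        )

    def forward(self, wav, pitch, energy):
        # wav: (B, 1, samples); pitch/energy: (B, frames), aligned with the content frames
        content = self.content_encoder(wav)                                    # linguistic content
        speaker = self.speaker_encoder(wav).expand(-1, -1, content.size(-1))   # utterance-level identity
        variance = self.variance_encoder(torch.stack([pitch, energy], dim=1))  # prosody (pitch, energy)
        return self.decoder(torch.cat([content, speaker, variance], dim=1))    # re-synthesized waveform
```

At synthesis time, anonymization then amounts to conditioning the decoder on a speaker representation other than the source speaker's; see the paper for the exact mechanism.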
- Python >= 3.10
- Clone this repository.
- Install the Python requirements listed in `requirements.txt`, e.g. `pip install -r requirements.txt`.
- Download and extract the LibriTTS dataset, and move all wav files into the `data` folder.
- Convert all files to waveform and rearrange the `data` folder to look like the tree below (see the sketch after this list for one way to do this):
  ```
  data
  ├── metadata
  └── speakers                 # Folder containing all speakers
      ├── spkr1
      ├── spkr2
      │   └── wav              # Folder containing all wav files
      │       ├── file1.wav
      │       ├── file2.wav
      │       └── file3.wav
      ├── spkr3
      └── spkr4
  ```
- Build a similar structure separately for the validation or test data.
- Download the HuBERT Base checkpoint from here and place it under the `pretrained_models/hubert` folder.
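If you build the folder structure with a script, something along the following lines could work. This is a sketch only; the LibriTTS subset path, the `spkr<ID>` folder naming, and the 16 kHz target sample rate are assumptions, not requirements taken from this repository.

```python
# Rearrange LibriTTS into the data/speakers/<speaker>/wav/ layout shown above,
# resampling to 16 kHz (the rate HuBERT-based feature extraction typically expects).
from pathlib import Path
import torchaudio

LIBRITTS_ROOT = Path("LibriTTS/train-clean-100")  # assumed source subset
DATA_ROOT = Path("data/speakers")                 # target layout from this README
TARGET_SR = 16_000                                # assumed sample rate

for wav_path in LIBRITTS_ROOT.rglob("*.wav"):
    # LibriTTS file names look like <speaker>_<chapter>_<utt>_<seg>.wav
    speaker_id = wav_path.stem.split("_")[0]
    out_dir = DATA_ROOT / f"spkr{speaker_id}" / "wav"
    out_dir.mkdir(parents=True, exist_ok=True)

    waveform, sr = torchaudio.load(wav_path)
    if sr != TARGET_SR:
        waveform = torchaudio.functional.resample(waveform, sr, TARGET_SR)
    torchaudio.save(str(out_dir / wav_path.name), waveform, TARGET_SR)
```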
To preprocess the data, run:

```
./data_preprocess.sh
```

To preprocess data at a different path, edit `data_preprocess.sh` accordingly; specifically, change the `data` argument to point to your data folder. The script needs to be run separately for the train and validation folders.
After running the data preprocessing, your `data` directory should look like:
```
data
├── metadata
└── speakers
    ├── spkr1
    ├── spkr2
    │   ├── code
    │   │   ├── file1.km
    │   │   └── file2.km
    │   ├── energy
    │   │   ├── file1.eng.npy
    │   │   └── file2.eng.npy
    │   ├── pitch
    │   │   ├── file1.pit.npy
    │   │   └── file2.pit.npy
    │   ├── spkr
    │   │   ├── file1.spk.npy
    │   │   └── file2.spk.npy
    │   └── wav
    │       ├── file1.wav
    │       └── file2.wav
    ├── spkr3
    └── spkr4
```
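To spot-check the preprocessing output, you can load a few of the generated files. The snippet below is a sketch that assumes the `.npy` files hold NumPy arrays (frame-level pitch and energy, plus an utterance-level speaker embedding) and that the `.km` file holds space-separated discrete HuBERT unit indices, as in fairseq-style unit extraction; the exact formats may differ.

```python
import numpy as np

base = "data/speakers/spkr2"

pitch = np.load(f"{base}/pitch/file1.pit.npy")    # assumed: frame-level F0 contour
energy = np.load(f"{base}/energy/file1.eng.npy")  # assumed: frame-level energy
spk = np.load(f"{base}/spkr/file1.spk.npy")       # assumed: speaker embedding vector

with open(f"{base}/code/file1.km") as f:
    units = [int(u) for u in f.read().split()]    # assumed: discrete content units

print(pitch.shape, energy.shape, spk.shape, len(units))
```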
- Pretrain only the `Encoder`:

  ```
  python train_encoder.py -p experiments/base/encoder -c experiments/base/config.json
  ```

- Train the `Encoder` and `Decoder` together:

  ```
  python train.py -p experiments/base -c experiments/base/config.json
  ```
You can also use the pretrained models we provide: download the pretrained models and place the `base` and `lite` checkpoints under `experiments/base` and `experiments/lite`, respectively.
See `inference.ipynb`.
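For orientation, anonymizing a single file roughly follows the pattern below. Note that `load_model` and `anonymize` are hypothetical placeholders, not the repository's real API; refer to `inference.ipynb` for the actual entry points.

```python
import torch
import torchaudio

# Hypothetical sketch: load_model and model.anonymize are placeholders,
# not the repository's real API -- see inference.ipynb for the actual calls.
waveform, sr = torchaudio.load("input.wav")     # source utterance
model = load_model("experiments/lite")          # hypothetical loader for the lite checkpoint

with torch.no_grad():
    anonymized = model.anonymize(waveform, sr)  # hypothetical: re-synthesize with concealed identity

torchaudio.save("anonymized.wav", anonymized, sr)
```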
We referred to HiFi-GAN, fairseq, and SpeechBrain when implementing this.