
Plachtaa/StreamVoiceAnon


⚠️This repository is currently under construction.⚠️

StreamVoiceAnon

This repository contains the implementation of StreamVoiceAnon, a real-time voice anonymization / voice conversion model.

The accompanying paper, Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models, has been accepted to ICASSP 2026.

Installation

git clone https://github.com/Plachtaa/StreamVoiceAnon.git
cd StreamVoiceAnon
pip install -r requirements.txt

On Windows, additionally install:

pip install triton-windows==3.2.0.post13

Note that this package is required for inference to run faster than real time (real-time factor, RTF < 1.0).

Full macOS support is still under construction.
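The real-time factor (RTF) is the ratio of wall-clock processing time to the duration of the audio produced; RTF < 1.0 means conversion runs faster than playback. The sketch below shows how one might measure it. `real_time_factor` and `measure_rtf` are hypothetical helpers, not part of this repository, and the `convert` callable stands in for whatever inference call you want to time.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = wall-clock processing time / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(convert: Callable[[], None], audio_seconds: float) -> float:
    """Time one call of `convert` (a hypothetical inference function) and return its RTF."""
    start = time.perf_counter()
    convert()
    return real_time_factor(time.perf_counter() - start, audio_seconds)


if __name__ == "__main__":
    # 2.5 s of compute for 10 s of audio: faster than real time.
    print(real_time_factor(2.5, 10.0))  # 0.25
```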

Download Pretrained Models

hf download Plachta/StreamVoiceAnon --local-dir pretrained_checkpoints/
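If you prefer to fetch the checkpoints from Python, a minimal sketch using `huggingface_hub.snapshot_download` (the library behind the `hf` CLI) might look like this. The wrapper name `download_checkpoints` is hypothetical; the repo id and target folder mirror the command above.

```python
def download_checkpoints(local_dir: str = "pretrained_checkpoints/") -> str:
    """Python counterpart of `hf download Plachta/StreamVoiceAnon --local-dir ...`.

    huggingface_hub is imported lazily so this module loads even where the
    package is not installed. Returns the path of the downloaded folder.
    """
    from huggingface_hub import snapshot_download  # pip install huggingface_hub

    return snapshot_download(repo_id="Plachta/StreamVoiceAnon", local_dir=local_dir)


if __name__ == "__main__":
    print(download_checkpoints())
```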

Training

Below is an example command that launches single-node, multi-GPU training on the Emilia dataset, streamed from Hugging Face:

accelerate launch trainers/arvc_trainer.py --config_path configs/config_firefly_arvcasr_8192_delay0_8.yaml --mixed-precision bf16

To customize the model configuration or the training datasets, please refer to the config files and the training code.
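One way to experiment with config changes without editing the YAML file is to load it into a nested dict (for example with `yaml.safe_load`) and apply dotted-key overrides. The helper below is a hypothetical sketch, and the keys in the example (`training.batch_size`, `dataset.name`) are invented for illustration; check the actual config files for the real key names and for how the trainer consumes them.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a nested config dict with dotted-key overrides applied.

    Missing intermediate sections are created as empty dicts; the input
    config is left unmodified.
    """
    result = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        node = result
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return result


if __name__ == "__main__":
    # Hypothetical keys, not taken from the repository's configs.
    base = {"training": {"batch_size": 16, "lr": 1e-4}, "dataset": {"name": "emilia"}}
    print(apply_overrides(base, {"training.batch_size": 8, "dataset.name": "my_corpus"}))
```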

Inference

Offline inference

python evaluations/infer_arvc.py \
    --src_path <path_to_source_audio> \
    --ref_path <path_to_reference_audio> \
    --out_dir <path_to_output_directory> \
    --delay 2 \
    --compile

Here --delay (required) specifies the delay in number of frames.
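When scripting batch runs, it can be safer to build the command as an argument list and pass it to `subprocess.run`, which sidesteps shell quoting and line-continuation pitfalls. The sketch below uses only the flags documented above; `build_offline_command` and the file names in the example are hypothetical, and the script must be run from the repository root.

```python
import subprocess
import sys
from typing import List


def build_offline_command(src_path: str, ref_path: str, out_dir: str,
                          delay: int = 2, compile_model: bool = True) -> List[str]:
    """Build the argv for offline inference with evaluations/infer_arvc.py."""
    cmd = [
        sys.executable, "evaluations/infer_arvc.py",
        "--src_path", src_path,
        "--ref_path", ref_path,
        "--out_dir", out_dir,
        "--delay", str(delay),  # delay in number of frames (required)
    ]
    if compile_model:
        cmd.append("--compile")
    return cmd


if __name__ == "__main__":
    # Hypothetical input/output paths.
    cmd = build_offline_command("source.wav", "reference.wav", "outputs/")
    subprocess.run(cmd, check=True)
```

The same pattern extends to simulated streaming by appending `--simulate_streaming` and `--decode_chunk_frames <n>` to the list.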

Simulated online inference

python evaluations/infer_arvc.py \
    --src_path <path_to_source_audio> \
    --ref_path <path_to_reference_audio> \
    --out_dir <path_to_output_directory> \
    --delay 2 \
    --compile \
    --simulate_streaming \
    --decode_chunk_frames 1

Here --delay (required) specifies the delay in number of frames, and --decode_chunk_frames sets how many frames the encoder and vocoder process at each step.

This simulates chunk-by-chunk online inference with the specified chunk size. The source audio (--src_path) has no length limit in this mode, while the reference audio (--ref_path) is truncated if it exceeds a maximum length.
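Conceptually, the streaming simulation walks over the source signal in fixed-size chunks while the reference is capped in length. The sketch below illustrates that idea on a plain sequence of samples; it is not the repository's implementation. `hop_length` (samples per frame) and `max_seconds` are hypothetical parameters whose real values depend on the codec and inference code, and keeping the *start* of the reference is an assumption.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_chunks(samples: Sequence[T], decode_chunk_frames: int,
                hop_length: int) -> Iterator[Sequence[T]]:
    """Yield consecutive chunks of `decode_chunk_frames` frames each.

    hop_length is hypothetical here. The last chunk may be shorter.
    """
    chunk_size = decode_chunk_frames * hop_length
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


def truncate_reference(samples: Sequence[T], sample_rate: int,
                       max_seconds: float) -> Sequence[T]:
    """Keep at most max_seconds of reference audio (hypothetical limit, keeps the start)."""
    return samples[: int(sample_rate * max_seconds)]


if __name__ == "__main__":
    source = list(range(10))  # stand-in for audio samples
    print([list(c) for c in iter_chunks(source, decode_chunk_frames=1, hop_length=4)])
    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```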

Real-time inference

python evaluations/real-time-gui.py

The GUI behaves the same way as simulated online inference. It enables --compile by default, so make sure Triton is installed (see Installation) before launching it.
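Because both --delay and --decode_chunk_frames are given in frames, it helps to convert them to milliseconds when reasoning about latency. A streaming loop keeps up only if each chunk is processed within the time that chunk represents. The sketch below assumes a hypothetical codec frame rate of 25 Hz purely for illustration; look up the actual frame rate in the model configuration before relying on these numbers.

```python
def frames_to_ms(frames: int, frame_rate_hz: float) -> float:
    """Convert a frame count (e.g. --delay or --decode_chunk_frames) to milliseconds."""
    return 1000.0 * frames / frame_rate_hz


def keeps_up(chunk_processing_ms: float, decode_chunk_frames: int,
             frame_rate_hz: float) -> bool:
    """True if processing one chunk takes no longer than the audio it covers."""
    return chunk_processing_ms <= frames_to_ms(decode_chunk_frames, frame_rate_hz)


if __name__ == "__main__":
    FRAME_RATE_HZ = 25.0  # hypothetical frame rate, check the model config
    print(frames_to_ms(2, FRAME_RATE_HZ))    # --delay 2 -> 80.0 ms of delay
    print(keeps_up(30.0, 1, FRAME_RATE_HZ))  # 30 ms of work per 40 ms chunk -> True
```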

TODO

  • Release privacy protection code
  • Release metrics for voice conversion & speaker anonymization
  • Release training code (for VC model)
  • Release training code (for content encoder)
  • Release fine-tuning code
  • Full macOS support
  • More to be added

Citation

If you find this repository useful for your work, please consider starring it and citing our paper:

@misc{kuzmin2026streamvoiceanonenhancingutilityrealtime,
      title={Stream-Voice-Anon: Enhancing Utility of Real-Time Speaker Anonymization via Neural Audio Codec and Language Models}, 
      author={Nikita Kuzmin and Songting Liu and Kong Aik Lee and Eng Siong Chng},
      year={2026},
      eprint={2601.13948},
      archivePrefix={arXiv},
      primaryClass={eess.AS},
      url={https://arxiv.org/abs/2601.13948}, 
}

Acknowledgements

About

Real-time streaming voice anonymization & voice conversion
