Using the pyannote.audio open-source toolkit in production? Consider switching to pyannoteAI for better and faster options.
pyannote.audio is an open-source toolkit written in Python for speaker diarization. Based on the PyTorch machine learning framework, it comes with state-of-the-art pretrained models and pipelines that can be further fine-tuned on your own data for even better performance.
- 🤯 state-of-the-art performance (see Benchmark)
- 🤗 pretrained pipelines (and models) on 🤗 model hub
- 🚀 built-in support for pyannoteAI premium speaker diarization
- 🐍 Python-first API
- ⚡ multi-GPU training with pytorch-lightning
- Install pyannote.audio with `pip install pyannote.audio`
- Accept pyannote/segmentation-3.0 user conditions
- Accept pyannote/speaker-diarization-3.1 user conditions
- Create a Hugging Face access token at hf.co/settings/tokens.
import torch
from pyannote.audio import Pipeline
from pyannote.audio.pipelines.utils.hook import ProgressHook
# Open-source pyannote speaker diarization pipeline
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    token="HUGGINGFACE_ACCESS_TOKEN")
# send pipeline to GPU (only when one is available; otherwise stay on CPU)
if torch.cuda.is_available():
    pipeline.to(torch.device("cuda"))
# apply pretrained pipeline (with optional progress hook)
with ProgressHook() as hook:
    diarization = pipeline("audio.wav", hook=hook)  # runs locally
# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s speaker_{speaker}")
# start=0.2s stop=1.5s speaker_0
# start=1.8s stop=3.9s speaker_1
# start=4.2s stop=5.7s speaker_0
# ...
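The example above hard-codes the access token as a string. One common alternative is to read it from an environment variable so that it stays out of source control. The sketch below is only an illustration: `read_access_token` is a hypothetical helper (not part of pyannote.audio), and the variable name `HF_TOKEN` is an assumption you can change.

```python
import os


def read_access_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in `env_var`, or fail with a clear message.

    Hypothetical helper: the default variable name is an assumption.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your access token."
        )
    return token
```

The returned string could then be passed as `token=read_access_token()` in the `Pipeline.from_pretrained` call shown above.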
- Install pyannote.audio with `pip install pyannote.audio`
- Create a pyannoteAI API key at dashboard.pyannote.ai
from pyannote.audio import Pipeline
# Premium pyannoteAI speaker diarization service
pipeline = Pipeline.from_pretrained(
    "pyannoteAI/speaker-diarization-precision", token="PYANNOTEAI_API_KEY")
diarization = pipeline("audio.wav")  # runs on pyannoteAI servers
# print the result
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"start={turn.start:.1f}s stop={turn.end:.1f}s {speaker}")
# start=0.2s stop=1.6s SPEAKER_00
# start=1.8s stop=4.0s SPEAKER_01
# start=4.2s stop=5.6s SPEAKER_00
# ...
Visit docs.pyannote.ai to learn about other pyannoteAI features (voiceprinting, confidence scores, ...)
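Once the turns are available, a frequent next step is to aggregate them, for instance into total speaking time per speaker. The sketch below assumes the turns have first been collected as plain `(start, end, speaker)` tuples, e.g. `[(turn.start, turn.end, speaker) for turn, _, speaker in diarization.itertracks(yield_label=True)]`. `speaking_time` is a hypothetical helper, not part of pyannote.audio.

```python
from collections import defaultdict


def speaking_time(turns):
    """Sum the duration (in seconds) of each speaker's turns.

    `turns` is an iterable of (start, end, speaker) tuples.
    Overlapping speech is counted once for every speaker involved.
    """
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Turns copied from the sample output of the pyannoteAI example.
turns = [
    (0.2, 1.6, "SPEAKER_00"),
    (1.8, 4.0, "SPEAKER_01"),
    (4.2, 5.6, "SPEAKER_00"),
]
for speaker, seconds in sorted(speaking_time(turns).items()):
    print(f"{speaker}: {seconds:.1f}s")
# SPEAKER_00: 2.8s
# SPEAKER_01: 2.2s
```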
Out of the box, the pyannote.audio speaker diarization pipeline v3.1 is expected to be much better (and faster) than v2.x. The pyannoteAI premium model goes one step further. The numbers below are diarization error rates (in %): the lower, the better.
Benchmark (2025-03) | v2.1 | v3.1 | pyannoteAI |
---|---|---|---|
AISHELL-4 | 14.1 | 12.2 | 12.1 |
AliMeeting (channel 1) | 27.4 | 24.5 | 19.8 |
AMI (IHM) | 18.9 | 18.8 | 15.8 |
AMI (SDM) | 27.1 | 22.7 | 18.3 |
AVA-AVD | 66.3 | 49.7 | 45.3 |
CALLHOME (part 2) | 31.6 | 28.4 | 20.1 |
DIHARD 3 (full) | 26.9 | 21.4 | 17.2 |
Earnings21 | 17.0 | 9.4 | 9.0 |
Ego4D (dev.) | 61.5 | 51.2 | 45.8 |
MSDWild | 32.8 | 25.4 | 19.7 |
RAMC | 22.5 | 22.2 | 11.1 |
REPERE (phase2) | 8.2 | 7.9 | 7.6 |
VoxConverse (v0.3) | 11.2 | 11.2 | 9.9 |
Diarization error rate (in %)
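For readers unfamiliar with the metric, diarization error rate is conventionally defined as the sum of false alarm, missed detection, and speaker confusion durations, divided by the total duration of reference speech. The sketch below only illustrates that arithmetic; the function name and the durations are made up for the example and are not taken from the benchmark above.

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total):
    """Return the diarization error rate in percent.

    All arguments are durations in seconds; `total` is the total duration
    of speech in the reference annotation.
    """
    if total <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (false_alarm + missed_detection + confusion) / total


# Made-up durations, for illustration only.
der = diarization_error_rate(
    false_alarm=12.0, missed_detection=30.0, confusion=18.0, total=600.0
)
print(f"DER = {der:.1f}%")
# DER = 10.0%
```

Because false alarms are counted in the numerator but not in the denominator, the value can exceed 100% on very hard recordings.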
- Changelog
- Frequently asked questions
- Models
  - Available tasks explained
  - Applying a pretrained model
  - Training, fine-tuning, and transfer learning
- Pipelines
  - Available pipelines explained
  - Applying a pretrained pipeline
  - Adapting a pretrained pipeline to your own data
  - Training a pipeline
- Contributing
  - Adding a new model
  - Adding a new task
  - Adding a new pipeline
  - Sharing pretrained models and pipelines
- Blog
- Videos
  - Introduction to speaker diarization / JSALT 2023 summer school / 90 min
  - Speaker segmentation model / Interspeech 2021 / 3 min
  - First release of pyannote.audio / ICASSP 2020 / 8 min
- Community contributions (not maintained by the core team)
  - 2024-04-05 > Offline speaker diarization (speaker-diarization-3.1) by Simon Ottenhaus
  - 2024-09-24 > Evaluating pyannote pretrained speech separation pipelines by Clément Pagés
If you use pyannote.audio, please use the following citations:
@inproceedings{Plaquet23,
  author={Alexis Plaquet and Hervé Bredin},
  title={{Powerset multi-class cross entropy loss for neural speaker diarization}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
@inproceedings{Bredin23,
  author={Hervé Bredin},
  title={{pyannote.audio 2.1 speaker diarization pipeline: principle, benchmark, and recipe}},
  year=2023,
  booktitle={Proc. INTERSPEECH 2023},
}
The commands below will set up the pre-commit hooks and packages needed for developing the pyannote.audio library.
pip install -e .[dev,testing]
pre-commit install
pytest