A PyTorch-based Speech Toolkit
Updated Nov 7, 2025 - Python
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Foundation Architecture for (M)LLMs
Multilingual Automatic Speech Recognition with word-level timestamps and confidence
WaveNet vocoder
AI-powered speech denoising and enhancement
PyTorch implementation of convolutional neural network-based text-to-speech synthesis models
Controllable and fast Text-to-Speech for over 7000 languages!
General Speech Restoration
SincNet is a neural architecture for efficiently processing raw audio samples.
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
A Framework for Speech, Language, Audio, Music Processing with Large Language Model
Verbatim Automatic Speech Recognition with improved word-level timestamps and filler detection
A neural network for end-to-end speech denoising
TensorFlow 2.x implementation of the DTLN real-time speech denoising model, with TF-lite, ONNX and real-time audio processing support.
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
🔉 spafe: Simplified Python Audio Features Extraction
UniSpeech - Large Scale Self-Supervised Learning for Speech
A Python wrapper for Speech Signal Processing Toolkit (SPTK).
Problem Agnostic Speech Encoder