Audio
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
so-vits-svc fork with realtime support, improved interface and more features.
Core Engine of Singing Voice Conversion & Singing Voice Clone
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation
An Open Source text-to-speech system built by inverting Whisper.
vits2 backbone with multilingual-bert
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key
Voice Recognition to Text Tool: an offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats
A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an API for external use.
A generative speech model for daily dialogue.
A modified VITS that utilizes phoneme duration's ground truth for better robustness
Unofficial VITS2-TTS implementation in PyTorch
Voice activity detector (VAD) for the browser with a simple API
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
Robust Speech Recognition via Large-Scale Weak Supervision
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
🔊 Text-Prompted Generative Audio Model
Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube dow…
Netflix-level subtitle cutting, translation, alignment, and even dubbing: a one-click, fully automated AI video subtitle team
✨ AsrTools: intelligent speech-to-text tool | Efficient batch processing | User-friendly interface | No GPU required | SRT/TXT output supported | Turns your audio into accurate text in an instant!
A voice cloning tool with a web interface that records audio using your own voice timbre or any other voice