
Stars

Audio

32 repositories

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…

Jupyter Notebook 21,310 2,208 Updated Jan 15, 2025

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 36,856 4,559 Updated Aug 16, 2024

so-vits-svc fork with realtime support, improved interface and more features.

Python 8,858 1,182 Updated Jan 13, 2025

Core Engine of Singing Voice Conversion & Singing Voice Clone

Python 2,712 922 Updated Apr 23, 2024

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models

Python 5,237 452 Updated Aug 10, 2024

GLM-4-Voice | End-to-end Chinese-English spoken dialogue model

Python 2,562 207 Updated Dec 5, 2024

MuAViC: A Multilingual Audio-Visual Corpus for Robust Speech Recognition and Robust Speech-to-Text Translation

Python 374 32 Updated Sep 11, 2023

An Open Source text-to-speech system built by inverting Whisper.

Jupyter Notebook 4,078 225 Updated Dec 12, 2024

vits2 backbone with multilingual-bert

Python 8,182 1,156 Updated Jan 13, 2025

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

Jupyter Notebook 528 53 Updated Sep 11, 2023

SOTA Open Source TTS

Python 18,357 1,375 Updated Jan 12, 2025

Use Microsoft Edge's online text-to-speech service from Python WITHOUT needing Microsoft Edge or Windows or an API key

Python 6,808 670 Updated Dec 26, 2024

Voice Recognition to Text Tool / An offline, locally run tool that converts audio and video to subtitles, with output in JSON, SRT subtitle, and plain-text formats

Python 2,821 308 Updated Dec 5, 2024

A simple local web interface that uses ChatTTS to synthesize text into speech and also exposes an API for external use.

Python 6,529 778 Updated Dec 9, 2024

A generative speech model for daily dialogue.

Python 33,656 3,654 Updated Jan 13, 2025

A modified VITS that utilizes phoneme duration's ground truth for better robustness

Python 122 37 Updated Aug 27, 2023

unofficial vits2-TTS implementation in pytorch

Python 504 95 Updated Mar 28, 2024

AI wearables

C 3,986 524 Updated Jan 15, 2025

Voice activity detector (VAD) for the browser with a simple API

TypeScript 1,029 159 Updated Jan 9, 2025

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python 9,594 930 Updated Jan 15, 2025

A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.

1,668 229 Updated Oct 16, 2024

Robust Speech Recognition via Large-Scale Weak Supervision

Python 74,356 8,879 Updated Jan 4, 2025

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Python 7,725 809 Updated Jan 15, 2025

Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper

Jupyter Notebook 3,986 360 Updated Dec 18, 2024

🔊 Text-Prompted Generative Audio Model

Jupyter Notebook 36,677 4,311 Updated Aug 19, 2024

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube dow…

Python 2,531 190 Updated Dec 22, 2024

Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team

Python 9,184 897 Updated Jan 5, 2025

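✨ AsrTools: Smart speech-to-text tool | Efficient batch processing | User-friendly interface | No GPU required | SRT/TXT output supported | Turn your audio into accurate text in an instant!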

Python 1,606 141 Updated Nov 13, 2024

A voice cloning tool with a web interface that uses your own voice or any other voice to record audio

Python 7,867 820 Updated Dec 7, 2024