Starred repositories
🗣️🇧🇷 Transcribed audio datasets in Brazilian Portuguese
Multilingual large voice generation model, providing full-stack capabilities for inference, training, and deployment.
Awesome music generation model: MG²
Implementation of MusicLM, Google's new SOTA model for music generation using attention networks, in Pytorch
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
21 Lessons, Get Started Building with Generative AI 🔗 https://microsoft.github.io/generative-ai-for-beginners/
A non-autoregressive Transformer-based text-to-speech system, supporting a family of SOTA transformers with supervised and unsupervised duration modeling. This project grows with the research community, …
A HuggingFace compatible Small Language Model trainer.
Speech To Speech: an effort for an open-source and modular GPT-4o
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
SOTA discrete acoustic codec models with 40 tokens per second for audio language modeling
LLaST: Improved End-to-end Speech Translation System Leveraged by Large Language Models
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in Pytorch
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
XPhoneBERT: A Pre-trained Multilingual Model for Phoneme Representations for Text-to-Speech (INTERSPEECH 2023)
velvet os - simple script framework to build ubuntu 22.04 lts jammy (in older versions also 20.04 lts focal) and debian 12 bookworm (in older versions also 11 bullseye) bootable usb / sd card image…
Official implementation for our paper "Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations"
A generative speech model for daily dialogue.
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
Inference and training library for high-quality TTS models.
Official repo for WavCraft, an AI agent for audio creation and editing