- Bengaluru
Stars
Build resilient language agents as graphs.
TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge
A simple, hackable text-to-speech system in PyTorch and MLX
The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al., with a few convenient wrappers for regression, in PyTorch
YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open
InspireMusic: A Unified Framework for Music, Song, Audio Generation.
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public
AudioLDM training, finetuning, evaluation and inference.
Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions
Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts"
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion
ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis
Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch
A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
Automatically updates text-to-speech (TTS) papers daily using GitHub Actions (updated every 12 hours)
Ultra-low bitrate neural audio codec (0.31~1.40 kbps) with better semantics in the latent space.