manmay-nakhashi
  • Bengaluru
  • UTC +05:30

Build resilient language agents as graphs.

Python · 11,637 stars · 1,933 forks · Updated Apr 17, 2025

Towards Human-Sounding Speech

Python · 4,186 stars · 340 forks · Updated Apr 16, 2025

TokenBridge: Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation. https://yuqingwang1029.github.io/TokenBridge

Python · 93 stars · 1 fork · Updated Apr 17, 2025

Spark-TTS Inference Code

Python · 8,562 stars · 881 forks · Updated Apr 9, 2025

A simple, hackable text-to-speech system in PyTorch and MLX

Python · 149 stars · 12 forks · Updated Feb 23, 2025

The Gaussian Histogram Loss (HL-Gauss) proposed by Imani et al., with a few convenient wrappers for regression, in PyTorch

Python · 58 stars · 3 forks · Updated Feb 11, 2025

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python · 4,820 stars · 523 forks · Updated Apr 7, 2025

Python · 71 stars · 4 forks · Updated Jan 22, 2025

InspireMusic: A Unified Framework for Music, Song, Audio Generation.

Python · 1,058 stars · 98 forks · Updated Apr 16, 2025

Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch

Python · 1,287 stars · 112 forks · Updated Apr 13, 2025

Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public

Python · 82 stars · 5 forks · Updated Feb 15, 2025

AudioLDM training, finetuning, evaluation and inference.

Python · 245 stars · 48 forks · Updated Dec 13, 2024

Multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.

Python · 13,135 stars · 1,340 forks · Updated Apr 16, 2025

Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice

Python · 285 stars · 35 forks · Updated Jan 15, 2025

This is an evolving repo for the paper "Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey".

135 stars · 5 forks · Updated Apr 17, 2025

Interface for OuteTTS models.

Python · 1,172 stars · 101 forks · Updated Apr 14, 2025

Python · 452 stars · 11 forks · Updated Dec 5, 2024

Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.

Python · 4,511 stars · 245 forks · Updated Apr 7, 2025

PromptTTS++: Controlling Speaker Identity in Prompt-Based Text-To-Speech Using Natural Language Descriptions

Python · 75 stars · 5 forks · Updated Oct 11, 2024

Official code repository for paper: "ExPLoRA: Parameter-Efficient Extended Pre-training to Adapt Vision Transformers under Domain Shifts"

31 stars · 4 forks · Updated Oct 5, 2024

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.

Python · 8,090 stars · 668 forks · Updated Apr 15, 2025

StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion

174 stars · 12 forks · Updated Sep 27, 2024

Python · 68 stars · 8 forks · Updated Sep 3, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis (ACM MM 2024)

Python · 135 stars · 8 forks · Updated Sep 20, 2024

Implementation of E2-TTS, "Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS", in PyTorch

Python · 462 stars · 44 forks · Updated Mar 12, 2025

A Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS

Python · 37 stars · 1 fork · Updated Dec 11, 2024

LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

131 stars · 2 forks · Updated Jun 13, 2024

Automatically updates text-to-speech (TTS) papers using GitHub Actions (every 12 hours)

Python · 407 stars · 24 forks · Updated Apr 18, 2025

Ultra-low-bitrate neural audio codec (0.31–1.40 kbps) with better semantics in the latent space.

Python · 194 stars · 15 forks · Updated Mar 7, 2025