Stars
Added vLLM support to IndexTTS for faster inference.
Text-audio foundation model from Boson AI
Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
[NeurIPS'24 Spotlight] Text2CAD: Generating Sequential CAD Designs from Beginner-to-Expert Level Text Prompts
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A small 0.2B Chinese conversational model (ChatLM-Chinese-0.2B), with fully open-sourced code for the entire pipeline: dataset sourcing, data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning for downstream tasks, with a triplet information extraction fine-tuning example.
🚀🚀 [LLM] Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
Multilingual large voice generation model providing full-stack inference, training, and deployment capabilities.
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Official implementation of AnimateDiff.
Inference and training library for high-quality TTS models.
An efficient implementation of the tree data structure in pure Python.
Zero-Shot Speech Editing and Text-to-Speech in the Wild
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
A naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch.
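For the last item, the general idea is to write multi-head attention with explicit tensor ops so it can stand in for torch.nn.MultiheadAttention. The sketch below is an illustrative re-implementation under that assumption, not code from the listed repository; the class name NaiveMultiheadAttention and its batch-first signature are invented here.

```python
# Minimal sketch: multi-head attention via plain matmuls (batch-first),
# as a drop-in style alternative to torch.nn.MultiheadAttention.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NaiveMultiheadAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq, embed_dim)
        b, tq, _ = query.shape
        tk = key.shape[1]

        def split(x, t):
            # (batch, seq, embed) -> (batch, heads, seq, head_dim)
            return x.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)

        q = split(self.q_proj(query), tq)
        k = split(self.k_proj(key), tk)
        v = split(self.v_proj(value), tk)

        # Scaled dot-product attention written out explicitly.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (batch, heads, tq, head_dim)

        out = out.transpose(1, 2).reshape(b, tq, -1)
        return self.out_proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 5, 64)
    mha = NaiveMultiheadAttention(embed_dim=64, num_heads=8)
    print(mha(x, x, x).shape)  # torch.Size([2, 5, 64])
```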