Stars
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
A small 0.2B-parameter Chinese dialogue model (ChatLM-Chinese-0.2B), open-sourcing all code for the full pipeline: dataset sourcing, data cleaning, tokenizer training, model pretraining, SFT instruction fine-tuning, and RLHF optimization. Supports SFT fine-tuning for downstream tasks, with a worked example of fine-tuning for triple (subject-relation-object) information extraction.
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
A multilingual large voice generation model, providing full-stack inference, training, and deployment capabilities.
Reverse engineering of the Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Inference and training library for high-quality TTS models.
An efficient implementation of the tree data structure in pure Python.
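As a rough illustration of the idea (the class below is a hypothetical minimal sketch, not this library's actual API), a pure-Python tree reduces to a node holding a value, a parent reference, and a list of children:

```python
class TreeNode:
    """Minimal pure-Python tree node; illustrative only, not the library's API."""
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []

    def add_child(self, value):
        # Create a child node linked back to this node and return it.
        child = TreeNode(value, parent=self)
        self.children.append(child)
        return child

    def walk(self):
        """Depth-first pre-order traversal over this node and its descendants."""
        yield self
        for child in self.children:
            yield from child.walk()

# Usage: build a small tree and list its values depth-first.
root = TreeNode("root")
a = root.add_child("a")
a.add_child("a1")
root.add_child("b")
print([node.value for node in root.walk()])  # ['root', 'a', 'a1', 'b']
```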
Zero-Shot Speech Editing and Text-to-Speech in the Wild
This project aims to reproduce Sora (OpenAI's text-to-video model); we hope the open-source community will contribute to it.
A naive MultiheadAttention implementation to replace nn.MultiheadAttention in PyTorch
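For a sense of what such a drop-in replacement involves, here is a minimal sketch of naive multi-head attention, assuming batch-first inputs and no masking (class and parameter names are illustrative, not this repo's API):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NaiveMultiheadAttention(nn.Module):
    """Plain multi-head attention; illustrative stand-in for nn.MultiheadAttention."""
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Separate projections for query, key, value, and output.
        self.q_proj = nn.Linear(embed_dim, embed_dim)
        self.k_proj = nn.Linear(embed_dim, embed_dim)
        self.v_proj = nn.Linear(embed_dim, embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, query, key, value):
        # Inputs: (batch, seq_len, embed_dim), batch-first layout.
        b, t, _ = query.shape
        s = key.shape[1]
        # Project and split into heads: (batch, heads, seq, head_dim).
        q = self.q_proj(query).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(key).view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(value).view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention.
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (batch, heads, t, head_dim)
        # Merge heads back and apply the output projection.
        out = out.transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```

Writing the layer out explicitly like this is what makes it easy to hook into the intermediate tensors (e.g. to add LoRA factors to the projections), which nn.MultiheadAttention's fused implementation hides.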
A PyTorch reimplementation of LoRA, with support for nn.MultiheadAttention
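The core LoRA idea, independent of this repo's code, is to freeze a pretrained weight W and learn a low-rank update, giving W + (alpha/r)·B·A. A minimal sketch for a linear layer (class and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (LoRA); illustrative sketch."""
    def __init__(self, in_features: int, out_features: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: delta_W = B @ A, with A of shape (r, in) and B of shape (out, r).
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen base output plus the scaled low-rank correction.
        return self.base(x) + self.scaling * (x @ self.lora_A.t() @ self.lora_B.t())
```

Initializing B to zero makes the adapted layer start out identical to the frozen base, so fine-tuning begins from the pretrained behavior.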
Just 1 minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
Text-to-speech using an autoregressive transformer and VITS
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
The official implementation of HierSpeech++
The official repo of Qwen-Audio (通义千问-Audio), a chat & pretrained large audio language model proposed by Alibaba Cloud.
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing