Stars
Awesome speech/audio LLMs, representation learning, and codec models
A high-throughput and memory-efficient inference and serving engine for LLMs
Official repository for Mamba-based Segmentation Model for Speaker Diarization
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
Real-time face swap and one-click video deepfake with only a single image
✨✨VITA: Towards Open-Source Interactive Omni Multimodal LLM
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversation
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
A beautiful, simple, clean, and responsive Jekyll theme for academics
The official PyTorch implementation of the Interspeech 2024 paper "Reshape Dimensions Network for Speaker Recognition"
Fine-tuning of the ChatGLM-6B, ChatGLM2-6B, and ChatGLM3-6B models on specific downstream tasks, covering Freeze, LoRA, P-tuning, full-parameter fine-tuning, and more
We Speech Transcript based on LLM, in 300 lines of code.
A book about Text-to-Speech (TTS) in Chinese.
Transformer: PyTorch Implementation of "Attention Is All You Need"
Faster Whisper transcription with CTranslate2
A PyTorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"
Speech, Language, Audio, Music Processing with Large Language Model
Different implementations of "Weighted Prediction Error" for speech dereverberation