
- Bengaluru
Stars
MARS5 speech model (TTS) from CAMB.AI
A generative speech model for daily dialogue.
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" (https://arxiv.org/abs/2404.07143)
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.
🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
End-to-end platform for building voice-first multimodal agents
litagin02 / Style-Bert-VITS2
Forked from fishaudio/Bert-VITS2
Style-Bert-VITS2: Bert-VITS2 with more controllable voice styles.
Foundational model for human-like, expressive TTS
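The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they can introduce.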
A ggml (C++) re-implementation of tortoise-tts
AI powered speech denoising and enhancement
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, 17+ clouds, or on-prem).
DLAS - A configuration-driven trainer for generative models
Reading list for research topics in Sound AI
Unsupervised Video Summarization via Successor Embeddings
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)