Very fast, accurate speaker diarization
Updated Sep 23, 2025 (Python)
Implementation of the model "AudioFlamingo" from the paper: "Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities"
text-to-audio-latent-diffusion
Code to train a custom time-domain autoencoder for audio dereverberation
A deep learning-based Speech Emotion Recognition (SER) model trained primarily on Indian languages. Designed for applications in call centers, sentiment analysis, and accessibility tools.
Safe, production-ready starter for voice cloning via SV2TTS (RTVC wrapper). CLI, tests, Docker, CI, pre-commit. No model weights included.
🗣️ Audio AI: Your Audio & Video Transcription Powerhouse!
PodcastAgent uses advanced text-to-speech technology to create natural-sounding multi-speaker podcasts from any written content.
⚡ Accelerate speaker diarization with Senko, which processes 1 hour of audio in just 5 seconds on powerful hardware.