ANVEAI/voice-ai-resources

Voice AI Resources

A curated collection of voice AI tools, libraries, datasets, and learning resources for building voice-powered applications.

This list covers the entire voice AI stack — from speech recognition (ASR) to text-to-speech (TTS), voice cloning, real-time conversation, and deployment.


🎙️ Featured: AnveVoice

The easiest way to add voice AI to your website. No coding required.

Why AnveVoice?

  • 🚀 Deploy in 5 minutes with copy-paste embed code
  • 🌍 22 Indian languages + global languages
  • 📊 Built-in analytics and visitor intelligence
  • 🎯 Perfect for support, lead gen, and engagement

Quick start: anvevoice.com | Documentation


Speech-to-Text (ASR)

Convert speech to text with high accuracy.

Cloud APIs

| Service | Description | Pricing | Link |
|---|---|---|---|
| OpenAI Whisper API | Industry-leading accuracy, 99 languages | $0.006/min | platform.openai.com |
| Google Speech-to-Text | Google's ASR with real-time streaming | $0.024/min | cloud.google.com |
| AWS Transcribe | AWS speech recognition with custom vocab | $0.024/min | aws.amazon.com |
| Azure Speech | Microsoft's speech service | $1/hour | azure.microsoft.com |
| AssemblyAI | Developer-friendly API with extras | $0.37/hour | assemblyai.com |
| Deepgram | Fast, accurate transcription | $0.0045/min | deepgram.com |
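
The prices above mix per-minute and per-hour units, which makes them hard to compare at a glance. The sketch below converts each listed price to USD per minute and estimates a monthly bill. The figures are copied from the table as-is and may be out of date; the 10,000-minute workload and the helper names `per_minute` and `monthly_cost` are illustrative assumptions, not part of any vendor's API.

```python
# Listed prices from the table above as (amount in USD, billing unit).
PRICES = {
    "OpenAI Whisper API": (0.006, "min"),
    "Google Speech-to-Text": (0.024, "min"),
    "AWS Transcribe": (0.024, "min"),
    "Azure Speech": (1.0, "hour"),
    "AssemblyAI": (0.37, "hour"),
    "Deepgram": (0.0045, "min"),
}

MINUTES_PER_UNIT = {"min": 1, "hour": 60}


def per_minute(price: float, unit: str) -> float:
    """Convert a listed price to USD per minute of audio."""
    return price / MINUTES_PER_UNIT[unit]


def monthly_cost(service: str, minutes: float) -> float:
    """Estimate the cost of transcribing `minutes` of audio at the listed rate."""
    price, unit = PRICES[service]
    return per_minute(price, unit) * minutes


if __name__ == "__main__":
    # Hypothetical workload: 10,000 minutes of audio per month.
    for name in sorted(PRICES, key=lambda s: per_minute(*PRICES[s])):
        rate = per_minute(*PRICES[name])
        print(f"{name:24s} ${rate:.4f}/min  ${monthly_cost(name, 10_000):,.2f}/month")
```

Real bills usually depend on tiers, free quotas, and add-on features, so treat the result as a rough first-pass comparison only.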

Open Source

| Project | Description | Stars | Link |
|---|---|---|---|
| Whisper | OpenAI's open source ASR model | 78k+ | github.com/openai/whisper |
| WhisperX | Whisper with word-level timestamps | 12k+ | github.com/m-bain/whisperX |
| Faster Whisper | Optimized Whisper implementation | 13k+ | github.com/SYSTRAN/faster-whisper |
| Wav2Vec 2.0 | Meta's speech recognition model | - | huggingface.co/facebook/wav2vec2 |
| NVIDIA NeMo | NVIDIA's conversational AI toolkit | 12k+ | github.com/NVIDIA/NeMo |

Text-to-Speech (TTS)

Convert text to natural-sounding speech.

Cloud APIs

| Service | Description | Pricing | Link |
|---|---|---|---|
| ElevenLabs | Most realistic voices, voice cloning | $5/month | elevenlabs.io |
| OpenAI TTS | Simple, high-quality voices | $15/1M chars | platform.openai.com |
| Google Cloud TTS | Wide language support | $4/1M chars | cloud.google.com |
| Azure TTS | Microsoft's neural voices | $16/1M chars | azure.microsoft.com |
| Amazon Polly | AWS text-to-speech | $16/1M chars | aws.amazon.com/polly |
| Play.ht | Voice cloning and multilingual | $39/month | play.ht |

Open Source

| Project | Description | Stars | Link |
|---|---|---|---|
| Piper | Fast, local neural TTS | 8k+ | github.com/rhasspy/piper |
| Coqui TTS | Deep learning TTS toolkit | 35k+ | github.com/coqui-ai/TTS |
| Mimic 3 | Mycroft's neural TTS | 2k+ | github.com/MycroftAI/mimic3 |
| Tortoise TTS | Quality-focused TTS | 13k+ | github.com/neonbjb/tortoise-tts |
| Bark | Text-to-audio with emotions | 37k+ | github.com/suno-ai/bark |
| StyleTTS 2 | Style-based TTS | 4k+ | github.com/yl4579/StyleTTS2 |

Voice Cloning

Clone any voice with just seconds of audio.

| Service | Description | Pricing | Link |
|---|---|---|---|
| ElevenLabs Voice Cloning | Best quality voice cloning | $5/month | elevenlabs.io/voice-cloning |
| Play.ht Voice Cloning | Instant voice cloning | $39/month | play.ht/voice-cloning |
| Resemble AI | Real-time voice cloning | $30/month | resemble.ai |
| Microsoft Azure Speech | Professional voice cloning | Custom | azure.microsoft.com |

Open Source

| Project | Description | Stars | Link |
|---|---|---|---|
| Coqui TTS Voice Cloning | YourTTS with voice cloning | 35k+ | github.com/coqui-ai/TTS |
| Real-Time Voice Cloning | SV2TTS implementation | 51k+ | github.com/CorentinJ/Real-Time-Voice-Cloning |
| Tortoise TTS | Multi-voice TTS | 13k+ | github.com/neonbjb/tortoise-tts |

Conversation AI

End-to-end conversational voice AI systems.

| Service | Description | Use Case | Link |
|---|---|---|---|
| AnveVoice | Voice AI for websites | Website support, lead gen | anvevoice.com |
| Vapi | Voice AI platform for developers | Phone agents, assistants | vapi.ai |
| Bland AI | Hyper-realistic voice AI | Call centers, sales | bland.ai |
| Synthflow | No-code voice agents | Support automation | synthflow.ai |
| Retell AI | Conversational voice AI | Customer service | retellai.com |
| Pipecat | Framework for voice bots | Build voice assistants | pipecat.ai |
| Daily.co | Real-time video/voice | WebRTC infrastructure | daily.co |

Voice Analytics

Analyze voice conversations for insights.

| Service | Description | Pricing | Link |
|---|---|---|---|
| AnveVoice Analytics | Visitor intelligence, sentiment | From ₹0 | anvevoice.com |
| AssemblyAI LeMUR | LLM for audio analysis | Custom | assemblyai.com |
| Rev AI | Transcription + insights | $0.02/min | rev.ai |
| CallRail | Call tracking analytics | $45/month | callrail.com |
| Invoca | AI-powered call analytics | Custom | invoca.com |

Real-time Voice

Low-latency voice streaming and WebRTC.

| Service | Description | Pricing | Link |
|---|---|---|---|
| Daily.co | WebRTC platform | $0.004/min | daily.co |
| Agora | Real-time voice/video | $0.99/1000 min | agora.io |
| Twilio | Voice calls and SIP | $0.0085/min | twilio.com |
| 100ms | Live audio/video SDK | $0.004/min | 100ms.live |
| LiveKit | Open source WebRTC | $0.0018/min | livekit.io |

Open Source

| Project | Description | Stars | Link |
|---|---|---|---|
| LiveKit | Open source real-time platform | 11k+ | github.com/livekit/livekit |
| Jitsi Meet | Secure video conferencing | 24k+ | github.com/jitsi/jitsi-meet |
| Mediasoup | WebRTC video conferencing | 7k+ | github.com/versatica/mediasoup |
| Pion WebRTC | Go WebRTC implementation | 14k+ | github.com/pion/webrtc |

Voice SDKs & APIs

Libraries and SDKs for voice integration.

JavaScript/TypeScript

| Library | Description | Link |
|---|---|---|
| Web Speech API | Browser-native speech recognition | MDN Docs |
| Annyang | Voice command library | talater.com/annyang |
| Artyom.js | Voice commands and synthesis | sdkcarlos.github.io/sites/artyom.html |
| annyang-2 | Modernized fork of Annyang | GitHub |

Python

| Library | Description | Link |
|---|---|---|
| SpeechRecognition | Python speech recognition | github.com/Uberi/speech_recognition |
| PyAudio | Audio I/O for Python | people.csail.mit.edu/hubert/pyaudio |
| webrtcvad | Voice Activity Detection | github.com/wiseman/py-webrtcvad |
| simpleaudio | Simple audio playback | simpleaudio.readthedocs.io |
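
Voice activity detection with webrtcvad works on short fixed-length frames of 16-bit mono PCM (10, 20, or 30 ms at 8, 16, 32, or 48 kHz), so audio usually has to be sliced before it is classified. The sketch below covers only that slicing step with the standard library. `split_frames` is a hypothetical helper name, and the webrtcvad calls appear only in comments because they need the package installed.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM
VALID_RATES = (8000, 16000, 32000, 48000)
VALID_FRAME_MS = (10, 20, 30)


def split_frames(pcm: bytes, sample_rate: int = 16000, frame_ms: int = 30) -> list[bytes]:
    """Split raw 16-bit mono PCM into equal frames, dropping any trailing partial frame."""
    if sample_rate not in VALID_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if frame_ms not in VALID_FRAME_MS:
        raise ValueError(f"unsupported frame length: {frame_ms} ms")
    frame_bytes = sample_rate * frame_ms // 1000 * BYTES_PER_SAMPLE
    last_start = len(pcm) - frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, last_start + 1, frame_bytes)]


if __name__ == "__main__":
    one_second_of_silence = bytes(16000 * BYTES_PER_SAMPLE)
    frames = split_frames(one_second_of_silence)
    print(len(frames), "frames of", len(frames[0]), "bytes")

    # With py-webrtcvad installed, each frame could then be classified:
    #   import webrtcvad
    #   vad = webrtcvad.Vad(2)  # aggressiveness from 0 (least) to 3 (most)
    #   speech_flags = [vad.is_speech(frame, 16000) for frame in frames]
```

Dropping the partial last frame keeps every frame at a valid length; padding it with silence would be an equally reasonable choice.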

Open Source Models

Free, self-hostable voice AI models.

Speech Recognition

| Model | Size | Language | Link |
|---|---|---|---|
| Whisper Large v3 | 1.5B params | 99 languages | OpenAI |
| Whisper Medium | 769M params | 99 languages | OpenAI |
| Wav2Vec 2.0 Large | 317M params | English | Facebook |
| NVIDIA Canary | 1B params | Multi-language | NVIDIA |
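
When self-hosting, parameter counts give a rough lower bound on the memory the weights need. The sketch below multiplies the sizes listed above by an assumed number of bytes per parameter. It covers weights only and ignores activations, decoding caches, and framework overhead, so real usage will be higher. The helper name `weight_gb` and the precision choices are assumptions for illustration.

```python
# Parameter counts as listed in the table above.
MODELS = {
    "Whisper Large v3": 1.5e9,
    "Whisper Medium": 769e6,
    "Wav2Vec 2.0 Large": 317e6,
    "NVIDIA Canary": 1e9,
}

# Assumed storage cost per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_gb(params: float, dtype: str = "fp16") -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in MODELS.items():
        sizes = ", ".join(f"{d}: {weight_gb(params, d):.2f} GB" for d in BYTES_PER_PARAM)
        print(f"{name:18s} {sizes}")
```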

Text-to-Speech

| Model | Quality | Speed | Link |
|---|---|---|---|
| Piper | Good | Real-time | rhasspy |
| StyleTTS 2 | Excellent | Fast | yl4579 |
| XTTS v2 | Excellent | Medium | Coqui |
| Bark | Good | Slow | Suno |

Datasets

Training data for voice AI models.

| Dataset | Description | Size | Link |
|---|---|---|---|
| Common Voice | Mozilla's multilingual corpus | 31k hours | commonvoice.mozilla.org |
| LibriSpeech | English audiobooks | 1000 hours | openslr.org/12 |
| VoxCeleb | Celebrity voices | 2000 hours | robots.ox.ac.uk/~vgg/data/voxceleb |
| LJSpeech | Single speaker English | 24 hours | keithito.com/LJ-Speech-Dataset |
| AISHELL | Mandarin speech | 178 hours | aishelltech.com |
| Indic Voices | Indian languages | Various | AI4Bharat |

Learning Resources

Courses, tutorials, and documentation.

Courses

| Course | Platform | Level | Link |
|---|---|---|---|
| Deep Learning for NLP | Coursera | Intermediate | coursera.org |
| Speech Recognition | Fast.ai | Advanced | fast.ai |
| Voice AI Fundamentals | DeepLearning.AI | Beginner | deeplearning.ai |

Books

| Book | Author | Level |
|---|---|---|
| Speech and Language Processing | Jurafsky & Martin | Advanced |
| Deep Learning | Goodfellow et al. | Intermediate |
| Voice Applications for Alexa and Google Assistant | Dustin Coates | Beginner |

Communities

| Community | Platform | Link |
|---|---|---|
| r/MachineLearning | Reddit | reddit.com/r/MachineLearning |
| Voice AI Discord | Discord | Various |
| OpenAI Community | Forum | community.openai.com |

Deployment & Infrastructure

Hosting and scaling voice AI.

| Service | Description | Pricing | Link |
|---|---|---|---|
| Hugging Face Inference | Model hosting | $0.06/hour/GPU | huggingface.co |
| Replicate | Run ML models | Per prediction | replicate.com |
| Banana.dev | Serverless GPU | Per second | banana.dev |
| Modal Labs | Serverless compute | Per usage | modal.com |
| RunPod | GPU cloud | $0.20/hour | runpod.io |
| Vast.ai | GPU marketplace | From $0.10/hour | vast.ai |

Contributing

  1. Check if the resource exists — avoid duplicates
  2. Ensure it's voice AI related
  3. Submit a PR with the resource in the appropriate section
  4. Follow the existing format

See contributing.md for detailed guidelines.
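
The duplicate check in step 1 can be partly automated before opening a PR. The sketch below is one possible approach and not part of this repository's tooling: it pulls link-like strings out of the README text, normalizes them, and reports whether a candidate link is already listed. The function names and the URL pattern are assumptions, and the match is a heuristic that compares links case-insensitively.

```python
import re

# Heuristic pattern for bare or full links such as "github.com/openai/whisper".
LINK_PATTERN = re.compile(r"(?:https?://)?[\w.-]+\.[a-z]{2,}(?:/[\w./~-]*)?", re.IGNORECASE)


def normalize(link: str) -> str:
    """Lowercase and strip scheme, 'www.', and trailing slashes so variants compare equal."""
    link = link.strip().lower()
    link = re.sub(r"^https?://", "", link)
    link = re.sub(r"^www\.", "", link)
    return link.rstrip("/")


def is_duplicate(candidate: str, readme_text: str) -> bool:
    """Return True if the candidate link already appears somewhere in the README text."""
    existing = {normalize(match) for match in LINK_PATTERN.findall(readme_text)}
    return normalize(candidate) in existing


if __name__ == "__main__":
    sample = "Whisper github.com/openai/whisper\nPiper https://github.com/rhasspy/piper/"
    print(is_duplicate("https://www.github.com/OpenAI/whisper/", sample))
    print(is_duplicate("github.com/coqui-ai/TTS", sample))
```

A link match does not catch the same project listed under a different URL, so a manual search by project name is still worth doing.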


Related Awesome Lists


Made with ❤️ by the voice AI community
Curated by AnveVoice — Voice AI for websites
