Voice AI Resources

A curated collection of voice AI tools, libraries, datasets, and learning resources for building voice-powered applications.

This list covers the voice AI stack end to end: speech recognition (ASR), text-to-speech (TTS), voice cloning, real-time conversation, analytics, and deployment.

🎙️ Featured: AnveVoice

The easiest way to add voice AI to your website. No coding required.

Why AnveVoice?

🚀 Deploy in 5 minutes with copy-paste embed code
🌍 22 Indian languages + global languages
📊 Built-in analytics and visitor intelligence
🎯 Perfect for support, lead gen, and engagement

Quick start: anvevoice.com • Documentation

Speech-to-Text (ASR)

Convert speech to text with high accuracy.

Cloud APIs

Service	Description	Pricing	Link
OpenAI Whisper API	Industry-leading accuracy, 99 languages	$0.006/min	platform.openai.com
Google Speech-to-Text	Google's ASR with real-time streaming	$0.024/min	cloud.google.com
AWS Transcribe	AWS speech recognition with custom vocab	$0.024/min	aws.amazon.com
Azure Speech	Microsoft's speech service	$1/hour	azure.microsoft.com
AssemblyAI	Developer-friendly API with extras	$0.37/hour	assemblyai.com
Deepgram	Fast, accurate transcription	$0.0045/min	deepgram.com
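
The prices above mix per-minute and per-hour billing, which makes them hard to compare at a glance. The sketch below normalizes them to USD per hour of audio. It uses only the figures listed in this table, which may be out of date, so check each provider's pricing page before relying on them. The names `LISTED_PRICES`, `price_per_hour`, and `ranked_by_hourly_price` are illustrative and not part of any provider SDK.

```python
# Prices copied from the Cloud APIs table above: (amount in USD, billing unit).
# These are the list's figures, not live pricing.
LISTED_PRICES = {
    "OpenAI Whisper API": (0.006, "min"),
    "Google Speech-to-Text": (0.024, "min"),
    "AWS Transcribe": (0.024, "min"),
    "Azure Speech": (1.0, "hour"),
    "AssemblyAI": (0.37, "hour"),
    "Deepgram": (0.0045, "min"),
}

MINUTES_PER_UNIT = {"min": 1, "hour": 60}


def price_per_hour(amount: float, unit: str) -> float:
    """Convert a listed price to USD per hour of audio."""
    return round(amount * 60 / MINUTES_PER_UNIT[unit], 4)


def ranked_by_hourly_price(prices):
    """Return (service, USD per hour) pairs, cheapest first."""
    hourly = {name: price_per_hour(a, u) for name, (a, u) in prices.items()}
    return sorted(hourly.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for name, usd in ranked_by_hourly_price(LISTED_PRICES):
        print(f"{name}: ${usd:.2f}/hour")
```

Hourly price alone ignores free tiers, volume discounts, and per-feature surcharges, so treat the ranking as a starting point only.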

Open Source

Project	Description	Stars	Link
Whisper	OpenAI's open source ASR model	78k+	github.com/openai/whisper
WhisperX	Whisper with word-level timestamps	12k+	github.com/m-bain/whisperX
Faster Whisper	Optimized Whisper implementation	13k+	github.com/SYSTRAN/faster-whisper
Wav2Vec 2.0	Meta's speech recognition model	-	huggingface.co/facebook/wav2vec2
NVIDIA NeMo	NVIDIA's conversational AI toolkit	12k+	github.com/NVIDIA/NeMo

Text-to-Speech (TTS)

Convert text to natural-sounding speech.

Cloud APIs

Service	Description	Pricing	Link
ElevenLabs	Most realistic voices, voice cloning	$5/month	elevenlabs.io
OpenAI TTS	Simple, high-quality voices	$15/1M chars	platform.openai.com
Google Cloud TTS	Wide language support	$4/1M chars	cloud.google.com
Azure TTS	Microsoft's neural voices	$16/1M chars	azure.microsoft.com
Amazon Polly	AWS text-to-speech	$16/1M chars	aws.amazon.com/polly
Play.ht	Voice cloning and multilingual	$39/month	play.ht

Open Source

Project	Description	Stars	Link
Piper	Fast, local neural TTS	8k+	github.com/rhasspy/piper
Coqui TTS	Deep learning TTS toolkit	35k+	github.com/coqui-ai/TTS
Mimic 3	Mycroft's neural TTS	2k+	github.com/MycroftAI/mimic3
Tortoise TTS	Quality-focused TTS	13k+	github.com/neonbjb/tortoise-tts
Bark	Text-to-audio with emotions	37k+	github.com/suno-ai/bark
StyleTTS 2	Style-based TTS	4k+	github.com/yl4579/StyleTTS2

Voice Cloning

Clone a voice from a short audio sample.

Service	Description	Pricing	Link
ElevenLabs Voice Cloning	Best quality voice cloning	$5/month	elevenlabs.io/voice-cloning
Play.ht Voice Cloning	Instant voice cloning	$39/month	play.ht/voice-cloning
Resemble AI	Real-time voice cloning	$30/month	resemble.ai
Microsoft Azure Speech	Professional voice cloning	Custom	azure.microsoft.com

Open Source

Project	Description	Stars	Link
Coqui TTS Voice Cloning	YourTTS model with voice cloning	35k+	github.com/coqui-ai/TTS
Real-Time Voice Cloning	SV2TTS implementation	51k+	github.com/CorentinJ/Real-Time-Voice-Cloning
Tortoise TTS	Multi-voice TTS	13k+	github.com/neonbjb/tortoise-tts

Conversation AI

End-to-end conversational voice AI systems.

Service	Description	Use Case	Link
AnveVoice ⭐	Voice AI for websites	Website support, lead gen	anvevoice.com
Vapi	Voice AI platform for developers	Phone agents, assistants	vapi.ai
Bland AI	Hyper-realistic voice AI	Call centers, sales	bland.ai
Synthflow	No-code voice agents	Support automation	synthflow.ai
Retell AI	Conversational voice AI	Customer service	retellai.com
Pipecat	Framework for voice bots	Build voice assistants	pipecat.ai
Daily.co	Real-time video/voice	WebRTC infrastructure	daily.co

Voice Analytics

Analyze voice conversations for insights.

Service	Description	Pricing	Link
AnveVoice Analytics	Visitor intelligence, sentiment	From ₹0	anvevoice.com
AssemblyAI LeMUR	LLM for audio analysis	Custom	assemblyai.com
Rev AI	Transcription + insights	$0.02/min	rev.ai
CallRail	Call tracking analytics	$45/month	callrail.com
Invoca	AI-powered call analytics	Custom	invoca.com

Real-time Voice

Low-latency voice streaming and WebRTC.

Service	Description	Pricing	Link
Daily.co	WebRTC platform	$0.004/min	daily.co
Agora	Real-time voice/video	$0.99/1000 min	agora.io
Twilio	Voice calls and SIP	$0.0085/min	twilio.com
100ms	Live audio/video SDK	$0.004/min	100ms.live
LiveKit	Open source WebRTC	$0.0018/min	livekit.io

Open Source

Project	Description	Stars	Link
LiveKit	Open source real-time platform	11k+	github.com/livekit/livekit
Jitsi Meet	Secure video conferencing	24k+	github.com/jitsi/jitsi-meet
Mediasoup	WebRTC video conferencing	7k+	github.com/versatica/mediasoup
Pion WebRTC	Go WebRTC implementation	14k+	github.com/pion/webrtc

Voice SDKs & APIs

Libraries and SDKs for voice integration.

JavaScript/TypeScript

Library	Description	Link
Web Speech API	Browser-native speech recognition	MDN Docs
Annyang	Voice command library	talater.com/annyang
Artyom.js	Voice commands and synthesis	sdkcarlos.github.io/sites/artyom.html
annyang-2	Modernized fork of Annyang	GitHub

Python

Library	Description	Link
SpeechRecognition	Python speech recognition	github.com/Uberi/speech_recognition
PyAudio	Audio I/O for Python	people.csail.mit.edu/hubert/pyaudio
webrtcvad	Voice Activity Detection	github.com/wiseman/py-webrtcvad
simpleaudio	Simple audio playback	simpleaudio.readthedocs.io
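
Voice activity detection (VAD) labels short frames of audio as speech or non-speech. The sketch below is not webrtcvad's algorithm. It is a much simpler energy-threshold stand-in, written with the standard library only, to show what frame-based VAD looks like. The 16 kHz sample rate, 30 ms frame length, threshold of 500, and all function names are assumptions chosen for illustration.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz (assumed)
FRAME_MS = 30        # frame length in milliseconds (assumed)
FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000  # 480 samples per frame


def frame_rms(frame: bytes) -> float:
    """RMS amplitude of one frame of 16-bit little-endian mono PCM."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Energy-threshold stand-in for a real VAD decision."""
    return frame_rms(frame) >= threshold


def split_frames(pcm: bytes, frame_samples: int = FRAME_SAMPLES):
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    step = frame_samples * 2  # 2 bytes per 16-bit sample
    for start in range(0, len(pcm) - step + 1, step):
        yield pcm[start:start + step]


def tone(n: int, amplitude: int, freq: float = 220.0) -> bytes:
    """Synthetic sine tone used here in place of recorded speech."""
    values = (
        int(amplitude * math.sin(2 * math.pi * freq * i / SAMPLE_RATE))
        for i in range(n)
    )
    return struct.pack(f"<{n}h", *values)


if __name__ == "__main__":
    silence = bytes(FRAME_SAMPLES * 2)
    pcm = silence + tone(FRAME_SAMPLES, 8000) + silence
    print([is_speech(f) for f in split_frames(pcm)])
```

An energy threshold fails on noisy input, which is why a trained detector such as the one wrapped by webrtcvad is usually preferred in practice.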

Open Source Models

Free, self-hostable voice AI models.

Speech Recognition

Model	Size	Language	Link
Whisper Large v3	1.5B params	99 languages	OpenAI
Whisper Medium	769M params	99 languages	OpenAI
Wav2Vec 2.0 Large	317M params	English	Facebook
NVIDIA Canary	1B params	Multi-language	NVIDIA
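
Parameter count gives a rough lower bound on the memory needed to self-host a model: weights take about parameters × bytes per parameter. The sketch below applies that arithmetic to the sizes listed above. It covers weights only and ignores activations, decoding buffers, and framework overhead, so real usage will be higher. The function and dictionary names are illustrative.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts as listed in the table above.
MODEL_PARAMS = {
    "Whisper Large v3": 1.5e9,
    "Whisper Medium": 769e6,
    "Wav2Vec 2.0 Large": 317e6,
    "NVIDIA Canary": 1e9,
}


def weight_memory_gb(params: float, dtype: str = "fp16") -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return round(params * BYTES_PER_PARAM[dtype] / 1e9, 2)


if __name__ == "__main__":
    for name, params in MODEL_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(params)} GB in fp16")
```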

Text-to-Speech

Model	Quality	Speed	Link
Piper	Good	Real-time	rhasspy
StyleTTS 2	Excellent	Fast	yl4579
XTTS v2	Excellent	Medium	Coqui
Bark	Good	Slow	Suno

Datasets

Training data for voice AI models.

Dataset	Description	Size	Link
Common Voice	Mozilla's multilingual corpus	31k hours	commonvoice.mozilla.org
LibriSpeech	English audiobooks	1000 hours	openslr.org/12
VoxCeleb	Celebrity voices	2000 hours	robots.ox.ac.uk/~vgg/data/voxceleb
LJSpeech	Single speaker English	24 hours	keithito.com/LJ-Speech-Dataset
AISHELL	Mandarin speech	178 hours	aishelltech.com
Indic Voices	Indian languages	Various	AI4Bharat

Learning Resources

Courses, tutorials, and documentation.

Courses

Course	Platform	Level	Link
Deep Learning for NLP	Coursera	Intermediate	coursera.org
Speech Recognition	Fast.ai	Advanced	fast.ai
Voice AI Fundamentals	DeepLearning.AI	Beginner	deeplearning.ai

Books

Book	Author	Level
Speech and Language Processing	Jurafsky & Martin	Advanced
Deep Learning	Goodfellow et al.	Intermediate
Voice Applications for Alexa and Google Assistant	Dustin Coates	Beginner

Communities

Community	Platform	Link
r/MachineLearning	Reddit	reddit.com/r/MachineLearning
Voice AI Discord	Discord	Various
OpenAI Community	Forum	community.openai.com

Deployment & Infrastructure

Hosting and scaling voice AI.

Service	Description	Pricing	Link
Hugging Face Inference	Model hosting	$0.06/hour/GPU	huggingface.co
Replicate	Run ML models	Per prediction	replicate.com
Banana.dev	Serverless GPU	Per second	banana.dev
Modal Labs	Serverless compute	Per usage	modal.com
RunPod	GPU cloud	$0.20/hour	runpod.io
Vast.ai	GPU marketplace	From $0.10/hour	vast.ai

Contributing

Check if the resource exists — avoid duplicates
Ensure it's voice AI related
Submit a PR with the resource in the appropriate section
Follow the existing format

See contributing.md for detailed guidelines.
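
The duplicate check in the first step can be partly automated. The sketch below is one possible approach, not a script shipped with this repository. It assumes links appear in the README as whitespace-, pipe-, or bracket-delimited tokens, and compares them after stripping the scheme, a leading `www.`, and any trailing slash. `normalize_link` and `is_duplicate` are hypothetical helper names.

```python
import re


def normalize_link(link: str) -> str:
    """Lowercase a link and drop the scheme, leading www., and trailing slash."""
    link = link.strip().lower()
    link = re.sub(r"^https?://", "", link)
    link = re.sub(r"^www\.", "", link)
    return link.rstrip("/")


def is_duplicate(candidate: str, readme_text: str) -> bool:
    """Return True if the candidate link already appears in the README text."""
    target = normalize_link(candidate)
    for line in readme_text.splitlines():
        # Split on whitespace, table pipes, and markdown link brackets.
        tokens = re.split(r"[\s|()\[\]]+", line)
        if any(token and normalize_link(token) == target for token in tokens):
            return True
    return False


if __name__ == "__main__":
    sample = "Deepgram\tFast, accurate transcription\t$0.0045/min\tdeepgram.com\n"
    print(is_duplicate("https://www.deepgram.com/", sample))
    print(is_duplicate("example.org", sample))
```

This only catches exact link matches. The same project listed under a different URL still needs a manual check.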

Related Awesome Lists

Awesome OpenClaw Skills — OpenClaw automation skills
Awesome MCP Servers — MCP server collection
Awesome Machine Learning — ML resources

Made with ❤️ by the voice AI community
Curated by AnveVoice: Voice AI for websites
