A web-based AI voice assistant that integrates speech recognition, AI conversation, and speech synthesis. Users can interact with AI through real-time voice conversations.
- Real-time speech recognition (based on Baidu Speech Service)
- AI conversation (powered by Gemini API)
- Text-to-Speech (TTS)
- Real-time audio visualization
- Automatic voice input detection
- Support for interruption and continuous dialogue
- Node.js 16.0 or above
- Modern browser (with WebAudio API support)
- Internet connection (for API calls)
- Python 3.7 or above
- edge-tts library (for speech synthesis)
```bash
# Install pnpm globally (if not installed)
npm install -g pnpm

# Install project dependencies
pnpm install

# Install backend dependencies
cd Backend
pip install -r requirements.txt
```
```bash
# Navigate to the backend directory
cd Backend
# Start the backend service
python edge-tts.py
```
Note: The backend service must run on port 8000 for the speech synthesis feature to work properly.
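For reference, here is a minimal sketch of what such a service can look like with Flask and edge-tts. The `/tts` route, the JSON request fields, and the default voice are assumptions for illustration, not necessarily what `edge-tts.py` actually implements:

```python
# tts_sketch.py - minimal Flask + edge-tts service (illustrative only).
# The /tts route, JSON fields, and default voice are assumptions.
import asyncio
import io

import edge_tts
from flask import Flask, request, send_file

app = Flask(__name__)

@app.route("/tts", methods=["POST"])
def tts():
    payload = request.get_json()
    text = payload.get("text", "")
    voice = payload.get("voice", "zh-CN-XiaoxiaoNeural")  # assumed default voice

    async def synthesize() -> bytes:
        # edge_tts.Communicate streams MP3 audio chunks for the given text.
        buf = io.BytesIO()
        async for chunk in edge_tts.Communicate(text, voice).stream():
            if chunk["type"] == "audio":
                buf.write(chunk["data"])
        return buf.getvalue()

    audio = asyncio.run(synthesize())
    return send_file(io.BytesIO(audio), mimetype="audio/mpeg")

if __name__ == "__main__":
    app.run(port=8000)  # the frontend expects the service on this port
```

Running synthesis per request with `asyncio.run` keeps the sketch simple; a production service would reuse an event loop and stream the audio instead of buffering it.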
- Copy the `.env.example` file and rename it to `.env`:

```bash
cp .env.example .env
```
- Fill in your API keys in the `.env` file:
```env
# Baidu API Configuration
BAIDU_API_KEY=your_baidu_api_key_here
BAIDU_SECRET_KEY=your_baidu_secret_key_here
BAIDU_TOKEN_URL=https://aip.baidubce.com/oauth/2.0/token
BAIDU_ASR_URL=https://vop.baidu.com/server_api

# Gemini API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_API_URL=https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent
```
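Once the `.env` file is filled in, both keys can be sanity-checked from the command line. The sketch below (the file name and the test prompt are illustrative) exchanges the Baidu credentials for an OAuth access token and sends a one-line prompt to Gemini:

```python
# check_keys.py - quick sanity check for the .env credentials (illustrative).
import os

import requests
from dotenv import load_dotenv  # pip install python-dotenv requests

load_dotenv()

# Baidu: exchange the API key and secret key for an OAuth access token.
token = requests.post(
    os.environ["BAIDU_TOKEN_URL"],
    params={
        "grant_type": "client_credentials",
        "client_id": os.environ["BAIDU_API_KEY"],
        "client_secret": os.environ["BAIDU_SECRET_KEY"],
    },
).json()
print("Baidu access_token:", token.get("access_token", token))

# Gemini: send a trivial prompt to the generateContent endpoint.
reply = requests.post(
    os.environ["GEMINI_API_URL"],
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": [{"parts": [{"text": "Say hello"}]}]},
).json()
print("Gemini reply:", reply["candidates"][0]["content"]["parts"][0]["text"])
```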
```bash
# Return to the project root
cd ..

# Install frontend dependencies
pnpm install

# Start the development server
pnpm dev
```
Visit http://localhost:5173 to use the voice assistant.
Tip: We recommend using pnpm as the package manager for faster installation and better disk space utilization.
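With both servers running, the backend can also be exercised directly. Assuming the `/tts` route from the sketch earlier (the actual script may expose a different interface):

```python
# Exercise the (assumed) /tts route of the backend directly.
import requests

resp = requests.post("http://localhost:8000/tts", json={"text": "你好，世界"})
with open("hello.mp3", "wb") as f:
    f.write(resp.content)  # playable MP3 audio if the route matches the sketch
```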
The project is divided into a frontend and a backend:

- Frontend: React + Vite web application
  - Speech recognition (Baidu Speech Service; see the request sketch after this list)
  - AI conversation (Gemini API)
  - Audio visualization
- Backend: Python Flask application
  - Speech synthesis service (based on edge-tts)
  - Runs on port 8000
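The speech recognition call is an ordinary HTTP request to Baidu's short-speech API. It is sketched below in Python purely to illustrate the protocol (the frontend does the equivalent with `fetch`); the audio format and `cuid` value are assumptions:

```python
# asr_sketch.py - shape of a Baidu short-speech recognition request.
# The app issues this from the browser; Python is used here only to
# illustrate the protocol. Audio format and cuid are assumptions.
import base64

import requests

token = "..."  # access token obtained via BAIDU_TOKEN_URL (see the check above)

with open("sample.wav", "rb") as f:  # assumed 16 kHz, 16-bit, mono PCM WAV
    audio = f.read()

resp = requests.post(
    "https://vop.baidu.com/server_api",
    json={
        "format": "wav",
        "rate": 16000,
        "channel": 1,
        "cuid": "voice-assistant-demo",  # any stable device identifier
        "token": token,
        "speech": base64.b64encode(audio).decode(),
        "len": len(audio),  # length of the raw audio in bytes, pre-base64
    },
)
print(resp.json().get("result"))  # e.g. ["你好"] on success
```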
To obtain the Baidu Speech Service keys:

- Visit Baidu Cloud
- Register/Login
- Create a speech application
- Get the API Key and Secret Key

To obtain the Gemini API key:

- Visit Google AI Studio
- Log in with your Google account
- Create an API key
Security notes:

- Never commit a `.env` file containing actual API keys to version control
- Ensure the `.env` file is added to `.gitignore`
- Rotate API keys regularly
- Implement additional security measures for production environments
To use the voice assistant:

- Open the webpage and allow browser microphone access
- Wait for the "Ready" prompt
- Start speaking; the system will automatically detect voice input
- The AI assistant will respond by voice
- You can interrupt the AI's response at any time to ask a follow-up question (see the multi-turn sketch after this list)
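Interruption and continuous dialogue work because each Gemini request can resend the prior turns as context; a minimal sketch of that request shape (the conversation history is illustrative):

```python
# Multi-turn Gemini request: earlier turns are resent as alternating
# user/model messages so the reply stays in context (illustrative history).
import os

import requests

history = [
    {"role": "user", "parts": [{"text": "What is the tallest mountain?"}]},
    {"role": "model", "parts": [{"text": "Mount Everest."}]},
    {"role": "user", "parts": [{"text": "How tall is it?"}]},
]

reply = requests.post(
    os.environ["GEMINI_API_URL"],
    params={"key": os.environ["GEMINI_API_KEY"]},
    json={"contents": history},
).json()
print(reply["candidates"][0]["content"]["parts"][0]["text"])
```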
- If speech recognition is inaccurate, ensure that:
  - The microphone is working properly
  - Environment noise is minimal
  - You are speaking at a moderate pace
- If API calls fail, check that:
  - The API keys are correct
  - The network connection is stable
  - API usage quotas are not exceeded
For issues or suggestions, please submit an Issue or Pull Request.
MIT License
This project references the following excellent open-source projects:
- transformers.js-examples/moonshine-web - HuggingFace's Transformers.js example project
- edge-tts - Microsoft Edge TTS-based online speech synthesis service