An advanced voice interaction mode for Pepper and NAO robots, featuring real-time conversations using OpenAI's GPT-4, built with React, TypeScript, and Vite.
- Real-time conversation with Pepper/NAO robot using OpenAI's or Azure OpenAI's Realtime API, optimized for low latency
- Speech processing:
  - Speech-to-Text: Microsoft Azure Speech Services with broad language support and dialect recognition. Based on our experience, Azure provides superior Swiss German recognition compared to OpenAI's native speech recognition.
  - Text-to-Speech: OpenAI Realtime API voice synthesis
- Multilingual support:
  - Default: Swiss German
  - Configurable for any language supported by Azure Speech Services
- Natural language understanding and generation
- Pepper/NAO robot integration
- Interactive features through function calling
- WebSocket-based real-time communication
The application communicates with the Pepper/NAO robot through a Python bridge:
- Web application connects to the robot via SSH
- Automatically deploys and starts a Python bridge on the robot
- Communicates with the robot through WebSocket connection
- Python bridge uses NAOqi API for robot control
- Secure SSH connection for bridge deployment
- WebSocket communication for real-time control
- Automatic reconnection handling
- Status monitoring of the robot connection
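The automatic reconnection handling mentioned above could be sketched as a retry loop with exponential backoff. This is only an illustration: `reconnectDelay`, `connectWithRetry`, and the default timings are hypothetical and are not taken from the project's code.

```typescript
// Hypothetical helper names and timings; not the project's actual implementation.
interface BackoffOptions {
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number; // upper bound for any single wait
  factor: number; // multiplier applied after each failed attempt
}

const defaultBackoff: BackoffOptions = { baseDelayMs: 500, maxDelayMs: 10000, factor: 2 };

// Returns the wait time before retry number `attempt` (0-based), capped at maxDelayMs.
function reconnectDelay(attempt: number, opts: BackoffOptions = defaultBackoff): number {
  const raw = opts.baseDelayMs * Math.pow(opts.factor, Math.max(0, attempt));
  return Math.min(opts.maxDelayMs, raw);
}

// Calls `connect` until it succeeds or `maxAttempts` is reached, sleeping between tries.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts: number,
  opts: BackoffOptions = defaultBackoff,
): Promise<T> {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        const waitMs = reconnectDelay(attempt, opts);
        await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
      }
    }
  }
  throw lastError;
}
```

A caller would pass a function that opens the WebSocket to the bridge and resolves once the connection is established. Capping the delay keeps the robot from staying unreachable for long stretches after a brief network drop.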
The application uses OpenAI's Realtime API, which is specifically designed for low-latency voice interactions. This enables:
- Near real-time voice responses
- Immediate speech synthesis
- Continuous conversation flow
- Interruption handling
- getWeather: Get the current weather or forecast for a specific location
- searchNews: Search and summarize current news articles, focusing on Aargau by default
- searchInternet: Search the internet for facts, information, or people
- getRandomJoke: Retrieve a random joke from a curated collection
- showQuiz: Display a multiple-choice quiz question on the robot's tablet
- analyzeCurrentView: Analyze and describe what the robot currently sees through its camera
- get_current_datetime: Get the current date and time
- fetchArticle: Get detailed content of a specific news article
- runPepperAnimation: Execute pre-installed animations on the robot (e.g., bowing, nodding, waving)
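Function calls like the ones listed above are typically routed through a small dispatcher: the model supplies a function name and JSON-encoded arguments, and the application looks up a matching handler. The following is a minimal sketch under that assumption. The registry, the helper names `parseToolArgs` and `dispatchToolCall`, and the placeholder handler bodies are hypothetical and do not reflect the project's actual code.

```typescript
type ToolArgs = Record<string, unknown>;
type ToolHandler = (args: ToolArgs) => Promise<string>;

// Hypothetical registry; only two of the listed functions are stubbed here.
const toolHandlers = new Map<string, ToolHandler>([
  ["get_current_datetime", async () => new Date().toISOString()],
  [
    "getRandomJoke",
    async () => {
      const jokes = ["Placeholder joke A", "Placeholder joke B"]; // stand-in data
      return jokes[Math.floor(Math.random() * jokes.length)];
    },
  ],
]);

// Parses the JSON-encoded argument string; returns null unless it is a JSON object.
function parseToolArgs(rawArgs: string): ToolArgs | null {
  if (rawArgs.trim() === "") return {};
  try {
    const parsed: unknown = JSON.parse(rawArgs);
    if (typeof parsed === "object" && parsed !== null && !Array.isArray(parsed)) {
      return parsed as ToolArgs;
    }
    return null;
  } catch {
    return null;
  }
}

// Routes a call to its handler and returns a JSON string to send back to the model.
async function dispatchToolCall(name: string, rawArgs: string): Promise<string> {
  const handler = toolHandlers.get(name);
  if (!handler) return JSON.stringify({ error: `Unknown function: ${name}` });
  const args = parseToolArgs(rawArgs);
  if (args === null) return JSON.stringify({ error: "Arguments must be a JSON object" });
  return JSON.stringify({ result: await handler(args) });
}
```

Returning an error object instead of throwing lets the model explain the failure to the user in conversation rather than ending the session.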
- Node.js (v14 or higher)
- Pepper or NAO robot with network access
- Robot must be in the same network as the computer running this application
- Various API keys (see Configuration section)
- Clone the repository:
    git clone https://github.com/studerus/PepperElderlyDialogue
    cd PepperElderlyDialogue
- Rename .env.example to .env in the server directory:
    mv server/.env.example server/.env
- Insert your API keys in the .env file. You'll need to obtain them from the respective service providers.
You can choose between OpenAI and Azure OpenAI as your Realtime API provider by setting the API_PROVIDER parameter in the .env file:
# Use 'openai' for OpenAI or 'azure' for Azure OpenAI
API_PROVIDER=openai
When using Azure OpenAI, make sure to provide the required Azure OpenAI parameters in your .env file:
AZURE_OPENAI_KEY=your_azure_openai_key
AZURE_OPENAI_ENDPOINT=your_endpoint.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=your_deployment_name
AZURE_OPENAI_API_VERSION=2024-07-01
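A startup check for these settings might look like the sketch below. The variable names come from the .env example above. The function name `missingProviderSettings`, the fallback to `openai` when API_PROVIDER is unset, and the decision to skip the OpenAI key (its variable name is not shown here) are assumptions for illustration only.

```typescript
type Env = Record<string, string | undefined>;

// Names taken from the .env example above.
const AZURE_REQUIRED = [
  "AZURE_OPENAI_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_DEPLOYMENT",
  "AZURE_OPENAI_API_VERSION",
];

// Returns the names of settings that are missing, empty, or invalid.
function missingProviderSettings(env: Env): string[] {
  // Assumption: an unset API_PROVIDER falls back to 'openai'.
  const provider = (env.API_PROVIDER ?? "openai").trim().toLowerCase();
  if (provider !== "openai" && provider !== "azure") {
    return ["API_PROVIDER"];
  }
  if (provider === "openai") {
    // The OpenAI key's variable name is not shown in this section, so it is not checked.
    return [];
  }
  return AZURE_REQUIRED.filter((name) => (env[name] ?? "").trim() === "");
}
```

On the server, the function could be called with `process.env` before any connection is opened, so that a misconfigured provider is reported immediately instead of surfacing later as a failed WebSocket handshake.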
- Install all dependencies for both client and server:
    # From the root directory
    npm run install-all
- Start both client and server:
    # From the root directory
    npm run dev
The application will automatically open in your default browser at http://localhost:5173
- React 18.3
- TypeScript
- Vite
- Socket.IO
- Azure Speech Services
- TailwindCSS
- Chrome (recommended)
- Firefox
- Safari
- Edge
MIT License - Copyright (c) 2025 Erich Studerus