studerus/PepperElderlyDialogue

Pepper/NAO Robot Advanced Voice Mode

An advanced voice interaction mode for Pepper and NAO robots, featuring real-time conversations using OpenAI's GPT-4, built with React, TypeScript, and Vite.

Features

  • Real-time conversation with Pepper/NAO robot using OpenAI's or Azure OpenAI's Realtime API, optimized for low latency
  • Speech processing:
    • Speech-to-Text: Microsoft Azure Speech Services with broad language support and dialect recognition. In our experience, Azure recognizes Swiss German better than OpenAI's native speech recognition.
    • Text-to-Speech: OpenAI Realtime API voice synthesis
  • Multilingual support:
    • Default: Swiss German
    • Configurable for any language supported by Azure Speech Services
    • Natural language understanding and generation
  • Pepper/NAO robot integration
  • Interactive features through function calling
  • WebSocket-based real-time communication

Pepper/NAO Integration

The application communicates with the Pepper/NAO robot through a Python bridge that runs on the robot.

Architecture

  • Web application connects to the robot via SSH
  • Automatically deploys and starts a Python bridge on the robot
  • Communicates with the robot through a WebSocket connection
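
The three steps above can be read as a linear startup sequence. The following TypeScript sketch models it as a small state machine. The stage names and the `advance` helper are hypothetical and are not taken from the repository; they only illustrate the order of operations.

```typescript
// Hypothetical stage names for the startup sequence described above.
type BridgeStage = "disconnected" | "sshConnected" | "bridgeRunning" | "websocketOpen";

// Order in which the stages are expected to be reached.
const STAGE_ORDER: readonly BridgeStage[] = [
  "disconnected",
  "sshConnected",
  "bridgeRunning",
  "websocketOpen",
];

// Move to the next stage when a step succeeds; fall back to "disconnected" when it fails.
function advance(current: BridgeStage, stepSucceeded: boolean): BridgeStage {
  if (!stepSucceeded) return "disconnected";
  const index = STAGE_ORDER.indexOf(current);
  // Stay at the final stage once the WebSocket is open.
  const next = STAGE_ORDER[Math.min(index + 1, STAGE_ORDER.length - 1)];
  return next === undefined ? current : next;
}
```

A failed step resets the sequence here, which is one simple policy; the actual application may retry individual steps instead.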

Technical Implementation

  • Python bridge uses NAOqi API for robot control
  • Secure SSH connection for bridge deployment
  • WebSocket communication for real-time control
  • Automatic reconnection handling
  • Status monitoring of the robot connection
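
The README does not say how the automatic reconnection is scheduled. A common approach is capped exponential backoff, sketched below under that assumption. The function name and the default delays are illustrative only.

```typescript
// Delay in milliseconds before a reconnection attempt (attempt 0 is the first retry).
// The delay doubles with each attempt and is capped at maxMs.
// Both defaults are illustrative values, not taken from the project.
function reconnectDelayMs(attempt: number, baseMs: number = 500, maxMs: number = 10000): number {
  // Guard against negative or fractional attempt counts.
  const safeAttempt = Math.max(0, Math.floor(attempt));
  return Math.min(maxMs, baseMs * Math.pow(2, safeAttempt));
}
```

The cap keeps the wait bounded when the robot stays unreachable for a long time, for example while it reboots.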

Technical Details

OpenAI Realtime API

The application uses OpenAI's Realtime API, which is specifically designed for low-latency voice interactions. This enables:

  • Near real-time voice responses
  • Immediate speech synthesis
  • Continuous conversation flow
  • Interruption handling
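
As a rough illustration of interruption handling, the sketch below decides which client events to send when the user starts speaking. `response.cancel` is a client event of the Realtime API; the `TurnState` shape, the function name, and the choice of trigger are assumptions, since the README does not describe how this project detects an interruption.

```typescript
// Minimal view of the conversation state needed to decide on an interruption (hypothetical).
interface TurnState {
  assistantSpeaking: boolean;
}

// Client events to send when the user starts speaking.
// If the assistant is mid-response, ask the API to cancel it; otherwise send nothing.
function onUserSpeechStarted(state: TurnState): { type: string }[] {
  return state.assistantSpeaking ? [{ type: "response.cancel" }] : [];
}
```

In a real client, the robot's audio playback would also need to be stopped locally, which is outside the scope of this sketch.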

Available Function Calls

  • getWeather: Get the current weather or forecast for a specific location
  • searchNews: Search and summarize current news articles, focusing on Aargau by default
  • searchInternet: Search the internet for facts, information, or people
  • getRandomJoke: Retrieve a random joke from a curated collection
  • showQuiz: Display a multiple-choice quiz question on the robot's tablet
  • analyzeCurrentView: Analyze and describe what the robot currently sees through its camera
  • get_current_datetime: Get the current date and time
  • fetchArticle: Get detailed content of a specific news article
  • runPepperAnimation: Execute pre-installed animations on the robot (e.g., bowing, nodding, waving)
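
To show how such functions are typically exposed to the model, here is a minimal sketch of one tool definition and a dispatcher. The `location` parameter, the placeholder handlers, and `dispatchToolCall` are assumptions made for illustration; the actual schemas and handlers in the repository may differ.

```typescript
// Shape of a function tool as passed in a Realtime API session configuration.
interface FunctionTool {
  type: "function";
  name: string;
  description: string;
  parameters: { type: "object"; properties: Record<string, unknown>; required: string[] };
}

// Hypothetical schema for getWeather; the real parameter names may differ.
const getWeatherTool: FunctionTool = {
  type: "function",
  name: "getWeather",
  description: "Get the current weather or forecast for a specific location",
  parameters: {
    type: "object",
    properties: { location: { type: "string", description: "City or place name" } },
    required: ["location"],
  },
};

type ToolHandler = (args: Record<string, unknown>) => string;

// Placeholder handlers standing in for the real implementations.
const handlers: Record<string, ToolHandler | undefined> = {
  getWeather: (args) => `weather requested for ${String(args["location"])}`,
  get_current_datetime: () => new Date().toISOString(),
};

// Parse the JSON arguments supplied by the model and route to the matching handler.
function dispatchToolCall(name: string, argsJson: string): string {
  const handler = handlers[name];
  if (!handler) return `unknown function: ${name}`;
  const args = JSON.parse(argsJson || "{}") as Record<string, unknown>;
  return handler(args);
}
```

In a Realtime session, definitions like `getWeatherTool` would be sent in the `tools` array of a `session.update` event, and the string returned by the dispatcher would be passed back to the model as the function call output.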

Getting Started

Prerequisites

  • Node.js (v14 or higher), available from nodejs.org
  • Pepper or NAO robot with network access
    • The robot must be on the same network as the computer running this application
  • API keys for the services you use (see API Provider Configuration)

Installation

  1. Clone the repository:
git clone https://github.com/studerus/PepperElderlyDialogue 
cd PepperElderlyDialogue
  2. Rename .env.example to .env in the server directory:
mv server/.env.example server/.env
  3. Insert your API keys in the .env file. You'll need to obtain them from:

API Provider Configuration

You can choose between OpenAI and Azure OpenAI as your Realtime API provider by setting the API_PROVIDER parameter in the .env file:

# Use 'openai' for OpenAI or 'azure' for Azure OpenAI
API_PROVIDER=openai

When using Azure OpenAI, make sure to provide the required Azure OpenAI parameters in your .env file:

AZURE_OPENAI_KEY=your_azure_openai_key
AZURE_OPENAI_ENDPOINT=your_endpoint.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=your_deployment_name
AZURE_OPENAI_API_VERSION=2024-07-01
  4. Install all dependencies for both client and server:
# From the root directory
npm run install-all
  5. Start both client and server:
# From the root directory
npm run dev

The application will automatically open in your default browser at http://localhost:5173
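
If the server fails to start, a missing or mistyped provider setting is a likely cause. The sketch below shows one way the API_PROVIDER choice and the Azure variables from the API Provider Configuration section could be checked at startup. The `checkProvider` helper and the fallback to 'openai' when API_PROVIDER is unset are assumptions, not the project's actual behavior.

```typescript
type Env = Record<string, string | undefined>;

// Azure variables listed in the API Provider Configuration section.
const AZURE_REQUIRED: readonly string[] = [
  "AZURE_OPENAI_KEY",
  "AZURE_OPENAI_ENDPOINT",
  "AZURE_OPENAI_DEPLOYMENT",
  "AZURE_OPENAI_API_VERSION",
];

interface ProviderCheck {
  provider: "openai" | "azure";
  missing: string[];
}

// Validate API_PROVIDER and report which Azure variables are missing.
// Falling back to "openai" when API_PROVIDER is unset is an assumption.
function checkProvider(env: Env): ProviderCheck {
  const configured = env["API_PROVIDER"];
  const raw = (configured === undefined ? "openai" : configured).trim().toLowerCase();
  const provider: "openai" | "azure" | undefined =
    raw === "openai" ? "openai" : raw === "azure" ? "azure" : undefined;
  if (provider === undefined) {
    throw new Error(`API_PROVIDER must be 'openai' or 'azure', got '${raw}'`);
  }
  // Only the Azure provider needs the AZURE_OPENAI_* variables.
  const missing: string[] =
    provider === "azure" ? AZURE_REQUIRED.filter((key) => !env[key]) : [];
  return { provider, missing };
}
```

On the server this could be called with `process.env` after the .env file has been loaded, logging the `missing` list before any connection to the Realtime API is attempted.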

Tech Stack

  • React 18.3
  • TypeScript
  • Vite
  • Socket.IO
  • Azure Speech Services
  • TailwindCSS

Browser Support

  • Chrome (recommended)
  • Firefox
  • Safari
  • Edge

License

MIT License - Copyright (c) 2025 Erich Studerus
