EchoLingua AI is a sophisticated web application engineered for real-time simultaneous interpretation and advanced linguistic analysis. By leveraging the low-latency capabilities of Google's Gemini Live API, the application bridges language barriers instantly while offering a dedicated writing lab for granular text critique.
The user experience is built upon a "Thumb UI" philosophy, anchoring critical controls to the bottom of the viewport for optimal one-handed mobile interaction, enveloped in a high-fidelity neumorphic design system that provides tactile feedback through soft shadows and highlights.
- Simultaneous Interpretation: Connects to the Gemini Live API to process continuous audio streams, enabling instant speech translation between two selected languages.
- Bi-Directional Flow: Eliminates the need for turn-taking toggles; both speakers may converse naturally while the AI handles directionality.
- Live Transcription: Renders a real-time textual log of the conversation to aid visual comprehension.
- High-Fidelity Audio: Processes raw PCM audio (16kHz input / 24kHz output) for clear, low-latency speech playback.
- Granular Analysis: Submits user drafts to Gemini Flash for rigorous checks on grammar, spelling, and vocabulary usage.
- Schema-Enforced Feedback: Returns structured data, including error explanations and International Phonetic Alphabet (IPA) transcriptions, via strict JSON schema enforcement (see the sketch after this list).
- Neural TTS: Integrates high-quality AI voice synthesis to demonstrate correct pronunciation of analyzed text.
- Neumorphic Architecture: A visual design language utilizing realistic lighting physics to create depth and a soft, physical feel.
- Mobile-First Thumb UI: Primary interaction points (microphones, language selectors, analysis triggers) are positioned within the natural reach of the user's thumb.
- System-Aware Dark Mode: Fully responsive theming that adheres to system preferences or manual user overrides.
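As a concrete illustration of the Schema-Enforced Feedback flow, the sketch below shows a `generateContent` call with a `responseSchema` via the `@google/genai` SDK. It is a minimal sketch under stated assumptions, not the application's actual code: the prompt wording and the field names (`correctedText`, `ipa`, `corrections`) are illustrative only.

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Hypothetical helper: submits a draft for review and forces structured JSON output.
export async function analyzeDraft(draft: string, language: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Review this ${language} draft for grammar, spelling and vocabulary:\n${draft}`,
    config: {
      responseMimeType: "application/json",
      // The schema below is illustrative; the real app defines its own fields.
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          correctedText: { type: Type.STRING },
          ipa: { type: Type.STRING },
          corrections: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                original: { type: Type.STRING },
                suggestion: { type: Type.STRING },
                explanation: { type: Type.STRING },
              },
            },
          },
        },
      },
    },
  });
  // The schema-enforced response is plain JSON text.
  return JSON.parse(response.text ?? "{}");
}
```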
| Category | Technology | Details |
|---|---|---|
| Frontend | React 19 | Built with TypeScript for type safety. |
| Styling | Tailwind CSS | Utility-first styling framework. |
| SDK | Google GenAI SDK | @google/genai integration. |
| Audio | Web Audio API | AudioContext & ScriptProcessorNode for PCM stream manipulation. |
- Live Audio: `gemini-2.5-flash-native-audio-preview-09-2025`
- Text Analysis: `gemini-2.5-flash`
- Text-to-Speech: `gemini-2.5-flash-preview-tts`
EchoLingua manually orchestrates audio data to maintain strict compatibility with the Gemini Live API protocols.
Microphone data is captured via getUserMedia, downsampled to 16kHz, and converted into raw PCM 16-bit integer format. This stream is transmitted over WebSocket to the Live API.
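A minimal sketch of that capture path is shown below, using the `ScriptProcessorNode` approach listed in the tech stack. The `onChunk` callback is a stand-in for the Live API send call (e.g. `session.sendRealtimeInput` in `@google/genai`; check the SDK version in use for the exact payload shape), and the 16kHz `AudioContext` relies on the browser resampling the microphone stream.

```ts
// Capture microphone audio, convert Float32 samples to 16-bit PCM,
// and hand base64-encoded chunks to a caller-supplied sender.
export async function startCapture(onChunk: (base64Pcm: string) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // Request a 16kHz context so captured samples match the Live API input rate;
  // some browsers may require manual downsampling instead.
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      // Clamp and scale [-1, 1] floats to signed 16-bit integers.
      const s = Math.max(-1, Math.min(1, float32[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    // Base64-encode the raw PCM bytes for transmission.
    const bytes = new Uint8Array(int16.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    onChunk(btoa(binary));
  };

  source.connect(processor);
  processor.connect(ctx.destination);

  // Return a stop function that tears the capture pipeline down.
  return () => {
    processor.disconnect();
    source.disconnect();
    stream.getTracks().forEach((t) => t.stop());
    ctx.close();
  };
}
```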
The model response includes base64-encoded PCM data. The frontend decodes this into a Float32Array format and schedules playback via the Web Audio API's AudioBufferSourceNode, ensuring a gapless auditory experience.
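A sketch of that playback path, assuming a persistent 24kHz output `AudioContext` and a running `nextStartTime` cursor (both names are illustrative):

```ts
// Decode a base64 PCM16 chunk from the model and queue it for gapless playback.
const outputCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

export function playChunk(base64Pcm: string) {
  // Resume in case the context started suspended (browser autoplay policies).
  void outputCtx.resume();

  // base64 -> bytes -> signed 16-bit samples -> normalized floats
  const binary = atob(base64Pcm);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const int16 = new Int16Array(bytes.buffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 0x8000;

  // Wrap the samples in an AudioBuffer and schedule it right after the previous chunk.
  const buffer = outputCtx.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);
  const sourceNode = outputCtx.createBufferSource();
  sourceNode.buffer = buffer;
  sourceNode.connect(outputCtx.destination);

  nextStartTime = Math.max(nextStartTime, outputCtx.currentTime);
  sourceNode.start(nextStartTime);
  nextStartTime += buffer.duration;
}
```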
- Live Session: Utilizes `useRef` hooks to manage WebSocket connections and audio streams, preventing unnecessary React render cycles during high-frequency data transmission.
- Persistence: User preferences, such as theme settings and volume levels, are serialized to `localStorage` (see the sketch below).
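A minimal hook along those lines (the hook name and storage key are illustrative, not the application's actual code):

```tsx
import { useEffect, useState } from "react";

// Hypothetical hook: keeps a piece of state (theme, volume, ...) in sync with localStorage.
export function usePersistentState<T>(key: string, initial: T) {
  const [value, setValue] = useState<T>(() => {
    const stored = localStorage.getItem(key);
    return stored !== null ? (JSON.parse(stored) as T) : initial;
  });

  useEffect(() => {
    localStorage.setItem(key, JSON.stringify(value));
  }, [key, value]);

  return [value, setValue] as const;
}

// Usage: const [theme, setTheme] = usePersistentState("echolingua:theme", "system");
```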
Ensure the following are installed and configured before deployment:
- Node.js: Version 18 or higher.
- Google Cloud Project: A project with the Gemini API enabled.
- API Key: A valid Gemini API Key.
Clone the repository and install dependencies:
```bash
git clone <repository-url>
cd echolingua-ai
npm install
```

Create a `.env` file in the root directory (or configure your build tool's environment variables) to store your sensitive credentials:

```
# .env
API_KEY=your_google_genai_api_key
```

Launch the local development server:

```bash
npm start
```

- Navigate to the Interpreter tab via the bottom navigation bar.
- Designate the two active languages using the bottom selectors.
- Activate the Microphone control to initialize the WebSocket connection with the Live API.
- Speak freely; the system will auto-detect the language and stream the translated audio response.
- Deactivate the microphone to terminate the session.
- Navigate to the Writing Lab tab.
- Select the target language from the dropdown menu.
- Input text into the drafting area.
- Select Review to trigger the Gemini Flash analysis.
- Review the returned corrections, IPA transcriptions, and tutor notes.
- Select the Speaker icon to audit the correct pronunciation via TTS (a call sketch follows below).
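For reference, the pronunciation step maps to a TTS `generateContent` call. The sketch below follows the published Gemini TTS request shape; the chosen voice and how the returned audio is played back are assumptions, not the application's actual code.

```ts
import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Request spoken audio for the corrected text and return the base64 PCM payload (24kHz).
export async function speak(text: string): Promise<string | undefined> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-tts",
    contents: text,
    config: {
      responseModalities: [Modality.AUDIO],
      speechConfig: {
        // "Kore" is an illustrative prebuilt voice, not necessarily the one the app uses.
        voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } },
      },
    },
  });
  // The audio arrives as inline base64 PCM; it can be fed to the same
  // scheduling helper used for Live API playback.
  return response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
}
```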
This project is distributed under the MIT License. See the LICENSE file for more information.