EchoLingua AI is a sophisticated web application engineered for real-time simultaneous interpretation and advanced linguistic analysis. By leveraging the low-latency capabilities of Google's Gemini Live API, the application bridges language barriers instantly while offering a dedicated writing lab for granular text critique.
The user experience is built upon a "Thumb UI" philosophy, anchoring critical controls to the bottom of the viewport for optimal one-handed mobile interaction, enveloped in a high-fidelity neumorphic design system that provides tactile feedback through soft shadows and highlights.
- Simultaneous Interpretation: Connects to the Gemini Live API to process continuous audio streams, enabling instant speech translation between two selected languages.
- Bi-Directional Flow: Eliminates the need for turn-taking toggles; both speakers may converse naturally while the AI handles directionality.
- Live Transcription: Renders a real-time textual log of the conversation to aid visual comprehension.
- High-Fidelity Audio: Processes raw PCM audio (16kHz input / 24kHz output) for clear, low-latency speech playback.
- Granular Analysis: Submits user drafts to Gemini Flash for rigorous checks on grammar, spelling, and vocabulary usage.
- Schema-Enforced Feedback: Returns structured data, including error explanations and International Phonetic Alphabet (IPA) transcriptions, via strict JSON schema enforcement (see the sketch after this list).
- Neural TTS: Integrates high-quality AI voice synthesis to demonstrate correct pronunciation of analyzed text.
- Neumorphic Architecture: A visual design language utilizing realistic lighting physics to create depth and a soft, physical feel.
- Mobile-First Thumb UI: Primary interaction points (microphones, language selectors, analysis triggers) are positioned within the natural reach of the user's thumb.
- System-Aware Dark Mode: Fully responsive theming that adheres to system preferences or manual user overrides.
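As a concrete illustration of the Schema-Enforced Feedback flow, the sketch below shows a `generateContent` call with a `responseSchema` via the `@google/genai` SDK. It is a minimal sketch under stated assumptions, not the application's actual code: the prompt wording and the field names (`correctedText`, `ipa`, `corrections`) are illustrative only.

```ts
import { GoogleGenAI, Type } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Hypothetical helper: submits a draft for review and forces structured JSON output.
export async function analyzeDraft(draft: string, language: string) {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: `Review this ${language} draft for grammar, spelling and vocabulary:\n${draft}`,
    config: {
      responseMimeType: "application/json",
      // The schema below is illustrative; the real app defines its own fields.
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          correctedText: { type: Type.STRING },
          ipa: { type: Type.STRING },
          corrections: {
            type: Type.ARRAY,
            items: {
              type: Type.OBJECT,
              properties: {
                original: { type: Type.STRING },
                suggestion: { type: Type.STRING },
                explanation: { type: Type.STRING },
              },
            },
          },
        },
      },
    },
  });
  // The schema-enforced response is plain JSON text.
  return JSON.parse(response.text ?? "{}");
}
```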
| Category | Technology | Details |
|---|---|---|
| Frontend | React 19 | Built with TypeScript for type safety. |
| Styling | Tailwind CSS | Utility-first styling framework. |
| SDK | Google GenAI SDK | @google/genai integration. |
| Audio | Web Audio API | AudioContext & ScriptProcessorNode for PCM stream manipulation. |
- Live Audio: `gemini-2.5-flash-native-audio-preview-09-2025`
- Text Analysis: `gemini-2.5-flash`
- Text-to-Speech: `gemini-2.5-flash-preview-tts`
EchoLingua manually orchestrates audio data to maintain strict compatibility with the Gemini Live API protocols.
Microphone data is captured via getUserMedia, downsampled to 16kHz, and converted into raw PCM 16-bit integer format. This stream is transmitted over WebSocket to the Live API.
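A minimal sketch of that capture path is shown below, using the `ScriptProcessorNode` approach listed in the tech stack. The `onChunk` callback is a stand-in for the Live API send call (e.g. `session.sendRealtimeInput` in `@google/genai`; check the SDK version in use for the exact payload shape), and the 16kHz `AudioContext` relies on the browser resampling the microphone stream.

```ts
// Capture microphone audio, convert Float32 samples to 16-bit PCM,
// and hand base64-encoded chunks to a caller-supplied sender.
export async function startCapture(onChunk: (base64Pcm: string) => void) {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  // Request a 16kHz context so captured samples match the Live API input rate;
  // some browsers may require manual downsampling instead.
  const ctx = new AudioContext({ sampleRate: 16000 });
  const source = ctx.createMediaStreamSource(stream);
  const processor = ctx.createScriptProcessor(4096, 1, 1);

  processor.onaudioprocess = (event) => {
    const float32 = event.inputBuffer.getChannelData(0);
    const int16 = new Int16Array(float32.length);
    for (let i = 0; i < float32.length; i++) {
      // Clamp and scale [-1, 1] floats to signed 16-bit integers.
      const s = Math.max(-1, Math.min(1, float32[i]));
      int16[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
    }
    // Base64-encode the raw PCM bytes for transmission.
    const bytes = new Uint8Array(int16.buffer);
    let binary = "";
    for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
    onChunk(btoa(binary));
  };

  source.connect(processor);
  processor.connect(ctx.destination);

  // Return a stop function that tears the capture pipeline down.
  return () => {
    processor.disconnect();
    source.disconnect();
    stream.getTracks().forEach((t) => t.stop());
    ctx.close();
  };
}
```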
The model response includes base64-encoded PCM data. The frontend decodes this into a Float32Array format and schedules playback via the Web Audio API's AudioBufferSourceNode, ensuring a gapless auditory experience.
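A sketch of that playback path, assuming a persistent 24kHz output `AudioContext` and a running `nextStartTime` cursor (both names are illustrative):

```ts
// Decode a base64 PCM16 chunk from the model and queue it for gapless playback.
const outputCtx = new AudioContext({ sampleRate: 24000 });
let nextStartTime = 0;

export function playChunk(base64Pcm: string) {
  // Resume in case the context started suspended (browser autoplay policies).
  void outputCtx.resume();

  // base64 -> bytes -> signed 16-bit samples -> normalized floats
  const binary = atob(base64Pcm);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const int16 = new Int16Array(bytes.buffer);
  const float32 = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) float32[i] = int16[i] / 0x8000;

  // Wrap the samples in an AudioBuffer and schedule it right after the previous chunk.
  const buffer = outputCtx.createBuffer(1, float32.length, 24000);
  buffer.copyToChannel(float32, 0);
  const sourceNode = outputCtx.createBufferSource();
  sourceNode.buffer = buffer;
  sourceNode.connect(outputCtx.destination);

  nextStartTime = Math.max(nextStartTime, outputCtx.currentTime);
  sourceNode.start(nextStartTime);
  nextStartTime += buffer.duration;
}
```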
- Live Session: Utilizes `useRef` hooks to manage WebSocket connections and audio streams, preventing unnecessary React render cycles during high-frequency data transmission.
- Persistence: User preferences, such as theme settings and volume levels, are serialized to `localStorage` (see the sketch below).
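A minimal hook along those lines (the hook name and storage key are illustrative, not the application's actual code):

```tsx
import { useEffect, useState } from "react";

// Hypothetical hook: keeps a piece of state (theme, volume, ...) in sync with localStorage.
export function usePersistentState<T>(key: string, initial: T) {
  const [value, setValue] = useState<T>(() => {
    const stored = localStorage.getItem(key);
    return stored !== null ? (JSON.parse(stored) as T) : initial;
  });

  useEffect(() => {
    localStorage.setItem(key, JSON.stringify(value));
  }, [key, value]);

  return [value, setValue] as const;
}

// Usage: const [theme, setTheme] = usePersistentState("echolingua:theme", "system");
```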
Ensure the following are installed and configured before deployment:
- Node.js: Version 18 or higher.
- Google Cloud Project: A project with the Gemini API enabled.
- API Key: A valid Gemini API Key.
Clone the repository and install dependencies:
```bash
git clone <repository-url>
cd echolingua-ai
npm install
```

Create a `.env` file in the root directory (or configure your build tool's environment variables) to store your sensitive credentials:

```
# .env
API_KEY=your_google_genai_api_key
```

Launch the local development server:

```bash
npm start
```

- Navigate to the Interpreter tab via the bottom navigation bar.
- Designate the two active languages using the bottom selectors.
- Activate the Microphone control to initialize the WebSocket connection with the Live API.
- Speak freely; the system will auto-detect the language and stream the translated audio response.
- Deactivate the microphone to terminate the session.
- Navigate to the Writing Lab tab.
- Select the target language from the dropdown menu.
- Input text into the drafting area.
- Select Review to trigger the Gemini Flash analysis.
- Review the returned corrections, IPA transcriptions, and tutor notes.
- Select the Speaker icon to audit the correct pronunciation via TTS (a call sketch follows below).
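For reference, the pronunciation step maps to a TTS `generateContent` call. The sketch below follows the published Gemini TTS request shape; the chosen voice and how the returned audio is played back are assumptions, not the application's actual code.

```ts
import { GoogleGenAI, Modality } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.API_KEY });

// Request spoken audio for the corrected text and return the base64 PCM payload (24kHz).
export async function speak(text: string): Promise<string | undefined> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash-preview-tts",
    contents: text,
    config: {
      responseModalities: [Modality.AUDIO],
      speechConfig: {
        // "Kore" is an illustrative prebuilt voice, not necessarily the one the app uses.
        voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } },
      },
    },
  });
  // The audio arrives as inline base64 PCM; it can be fed to the same
  // scheduling helper used for Live API playback.
  return response.candidates?.[0]?.content?.parts?.[0]?.inlineData?.data;
}
```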
This project is distributed under the MIT License. See the LICENSE file for more information.