demo2.1.mp4
This project showcases Gemini 2.5's real-time multimodal AI capabilities in an Angular web application. Currently the Live API is only available for Gemini 2.5 Flash Live.
This project demonstrates integration with Google's Gemini AI models through the `@google/genai` library, currently in (Technical) Preview.
This project started as a migration to Angular of the Live API - Web console, which is only available in React at the moment.
[8th July]
- Added MCP support: integrated the Model Context Protocol SDK with access to two servers, weather and multiplication (a client sketch follows this entry).
- Function calling is not available for native audio, so make sure the `affective` and `proactive` flags are disabled. To try it out, use prompts like `What's the temperature in Barcelona?` or `Multiply 2 by 2`. You can inspect the `tool call` and `tool responses` by expanding the left side panel.
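For readers unfamiliar with MCP, here is a minimal, hedged sketch of how a web client can discover and call tools on an MCP server with the TypeScript MCP SDK. The transport URL and the `get-weather` tool name are hypothetical; the project's actual wiring may differ.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

async function demoMcpClient() {
  // Hypothetical local endpoint for the weather server; the repo's servers may differ.
  const transport = new StreamableHTTPClientTransport(new URL("http://localhost:3000/mcp"));
  const client = new Client({ name: "live-api-angular-demo", version: "1.0.0" }, { capabilities: {} });
  await client.connect(transport);

  // Tools discovered here can be surfaced to Gemini as function declarations.
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Invoked when Gemini emits the matching tool call
  // (e.g. for "What's the temperature in Barcelona?").
  const result = await client.callTool({ name: "get-weather", arguments: { city: "Barcelona" } });
  console.log(result.content);
}

demoMcpClient().catch(console.error);
```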
[7th July]
- New model: `Gemini 2.5 Flash Live` replaces `Gemini 2.0 Flash Live`.
- Native audio: 30 voices, 24 languages, accents, and voice effects (whispering, laughing). Tool usage is limited to function calling and search.
- Live configuration options:
  - Native audio: affective dialog and proactive audio options (see the config sketch after this entry).
  - Cascade audio: new language support.
- Previous models are referred to as half-cascade or cascade audio: `gemini-live-2.5-flash-preview` and `gemini-2.0-flash-live-001`. As opposed to the new native audio models, these go through a two-step process: native audio input and text-to-speech output. All tool usage options are available. More details on how to choose your audio architecture here.
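As an illustration of the native audio options above, here is a hedged sketch of a Live connection config for a native audio model; the field names (`enableAffectiveDialog`, `proactivity`) follow recent `@google/genai` releases and may differ in the version you have installed.

```typescript
import { Modality, type LiveConnectConfig } from "@google/genai";

// Native-audio-only options; keep both disabled when you need function calling (see [8th July]).
const nativeAudioConfig: LiveConnectConfig = {
  responseModalities: [Modality.AUDio],
  enableAffectiveDialog: true,           // adapts the response tone to the user's expression
  proactivity: { proactiveAudio: true }, // lets the model decide not to respond to irrelevant audio
};
```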
[10th April]
- New model: `Gemini 2.0 Flash Live` replaces `Gemini 2.0 Flash Experimental`.
- 3 more voices: Leda, Orus, and Zephyr.
- Live configuration options (a config sketch follows this entry):
  - Set up automatic context window compression via `config.contextWindowCompression`.
  - Adjust Gemini's voice output quality: 16kHz (low) and 24kHz (medium) via `config.generationConfig.mediaResolution`.
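A hedged sketch of the two options above; newer `@google/genai` versions expose `contextWindowCompression` and `mediaResolution` directly on the Live config, while older ones nested the latter under `generationConfig`.

```typescript
import { MediaResolution, Modality, type LiveConnectConfig } from "@google/genai";

const config: LiveConnectConfig = {
  responseModalities: [Modality.AUDIO],
  // Automatically compress older turns instead of losing the session when the context window fills up.
  contextWindowCompression: { slidingWindow: {} },
  // Per the note above: LOW maps to 16kHz and MEDIUM to 24kHz output quality.
  mediaResolution: MediaResolution.MEDIA_RESOLUTION_MEDIUM,
};
```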
[26th March]
- Enabled transcripts for both the user and Gemini via a third-party API (Deepgram); a transcription sketch follows this entry.
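A minimal sketch, assuming the `@deepgram/sdk` v3 browser client, of how streamed PCM audio (the user's microphone or Gemini's output) can be piped to Deepgram for live transcription; the model and options shown are illustrative.

```typescript
import { createClient, LiveTranscriptionEvents } from "@deepgram/sdk";

const deepgram = createClient("YOUR_DEEPGRAM_API_KEY");
const connection = deepgram.listen.live({ model: "nova-2", smart_format: true });

connection.on(LiveTranscriptionEvents.Open, () => console.log("Deepgram connection open"));
connection.on(LiveTranscriptionEvents.Transcript, (data) => {
  // Each event carries interim or final transcripts for the streamed audio.
  console.log(data.channel.alternatives[0].transcript);
});

// Feed raw audio chunks (e.g. from a MediaRecorder or AudioWorklet) as they arrive.
function onAudioChunk(chunk: Blob) {
  connection.send(chunk);
}
```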
- Starter kit based on Live API - Web console
- TypeScript GenAI SDK for Gemini 2.5 API
- MCP support: TypeScript MCP SDK
- Real-time streaming voice from and to Gemini 2.5 Live API
- Real-time streaming video from webcam or screen to Gemini 2.5 Live API
- Support for both native and cascade audio models
- Natural language text generation
- Interactive chat functionality
- Google Search integration for current information
- Secure Python code execution in sandbox
- Automated function calling for API integration
- Live transcription for streamed audio (user and model) via Deepgram API (optional)
Gemini Live API enables a new generation of dynamic, multimodal AI real-time experiences.
Gemini Live powers innovative applications across devices and platforms:
- Hands-free AI Assistance: Users interact naturally through voice while cooking, driving, or multitasking
- Real-time Visual Understanding: Get instant AI responses as you show objects, documents, or scenes through your camera
- Smart Home Automation: Control your environment with natural voice commands - from adjusting lights to managing thermostats
- Seamless Shopping: Browse products, compare options, and complete purchases through conversation
- Live Problem Solving: Share your screen to get real-time guidance, troubleshooting, or explanations
- Integration with Google services: leverage existing Google services like Search or Maps to enhance the assistant's capabilities
Project Astra is a research initiative aimed at developing a universal AI assistant with advanced capabilities. It's designed to process multimodal information, including text, speech, images, and video, allowing for a more comprehensive understanding of user needs and context.
- Node.js and npm (latest stable version)
- Angular CLI (globally installed via `npm install -g @angular/cli`)
- Google AI API key from Google AI Studio
- Deepgram API key from Deepgram (optional)
Note that currently `Gemini 2.5 Flash Live` only sends transcript information when using Vertex AI. You can use Deepgram to transcribe both the user's audio and the model's audio from a web client if needed. To enable it, just create an API key and add it to the development environment.
- Set Up Environment Variables

  `ng g environments`

  Create `environment.development.ts` in `src/environments/` with:

  ```typescript
  export const environment = {
    API_KEY: 'YOUR_GOOGLE_AI_API_KEY',
    DEEPGRAM_API_KEY: 'YOUR_DEEPGRAM_API_KEY', // optional
  };
  ```
- Install Dependencies

  `npm install`
- Launch the application and click the `Connect` button under `Connection Status`
- The demo uses the Gemini 2.5 Live API, which requires a WebSocket connection (see the connection sketch below)
- Monitor the browser's Developer Tools Console for connection issues
- Before diving into development, explore Gemini 2.5's Live capabilities (voice interactions, webcam, and screen sharing) using Google AI Studio Live. This interactive playground will help you understand the available features and integration options before implementing them in your project.
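For orientation, here is a hedged sketch of what the WebSocket connection looks like with the `@google/genai` SDK; the model name matches the half-cascade model mentioned above, and error handling is kept minimal.

```typescript
import { GoogleGenAI, Modality } from "@google/genai";

async function connectToLiveApi() {
  const ai = new GoogleGenAI({ apiKey: "YOUR_GOOGLE_AI_API_KEY" });

  const session = await ai.live.connect({
    model: "gemini-live-2.5-flash-preview",
    config: { responseModalities: [Modality.TEXT] },
    callbacks: {
      onopen: () => console.log("Connected"),
      onmessage: (message) => console.log(message), // streamed server messages (text, audio, tool calls)
      onerror: (e) => console.error(e.message),
      onclose: (e) => console.log("Closed:", e.reason),
    },
  });

  // Send a text turn; sendRealtimeInput is used instead for streaming microphone or screen audio.
  session.sendClientContent({ turns: "Hello, Gemini!", turnComplete: true });
}

connectToLiveApi().catch(console.error);
```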
Test the various capabilities using these example prompts (a tools configuration sketch follows the list):
- Google Search Integration
  - "Tell me the scores for the last 3 games of FC Barcelona."
- Code Execution
  - "What's the 50th prime number?"
  - "What's the square root of 342.12?"
- Function Calling
  - "What's the weather in London?" (Note: currently returns mock data of 25 degrees)
The main configuration is handled in `src/app.component`. You can toggle between audio and text modalities:
```typescript
let config: LiveConnectConfig = {
  // For text responses in the chat window
  responseModalities: [Modality.TEXT], // note: "audio" does not send a text response over
  // For audio responses (uncomment to enable)
  // responseModalities: [Modality.AUDIO],
  // speechConfig: {
  //   voiceConfig: { prebuiltVoiceConfig: { voiceName: "Aoede" } },
  // },
};
```
- Daily and session-based limits apply
- Token count restrictions to prevent abuse
- If limits are exceeded, wait until the next day to resume
Start the development server:
ng serve
Access the application at http://localhost:4200/
- Generate New Components

  `ng generate component component-name`

- Build Project

  `ng build`

  Build artifacts will be stored in the `dist/` directory.

- Run Tests
  - Unit Tests: `ng test`
  - E2E Tests: `ng e2e` (Note: select and install your preferred E2E testing framework)
- Built with Angular CLI version 20.0.4
- State management and logging, including Dev Tools, with NgRx version 19.0.1
- TypeScript GenAI SDK version 1.18.0
- TypeScript SDK for the Model Context Protocol, version 1.15.0
- Features automatic reload during development
- Includes production build optimizations
- Angular CLI Documentation
- Google AI Studio
- Browser Developer Tools for debugging