feat(googleai): Add comprehensive model documentation and Veo 3 support #3162

Status: Open. Wants to merge 2 commits into base: main.

124 changes: 118 additions & 6 deletions js/plugins/googleai/README.md
@@ -8,21 +8,133 @@ npm i --save @genkit-ai/googleai

## Using the plugin

### Basic Setup

```ts
import { genkit } from 'genkit';
-import { googleAI, gemini } from '@genkit-ai/googleai';
+import { googleAI } from '@genkit-ai/googleai';

const ai = genkit({
  plugins: [googleAI()],
-  model: gemini('gemini-1.5-flash'),
+  model: googleAI.model('gemini-2.0-flash'),
});
```

### Text Generation

```ts
// Simple text generation
const { text } = await ai.generate('Explain quantum computing');
console.log(text);

// With a specific model
const response = await ai.generate({
  model: googleAI.model('gemini-1.5-pro'),
  prompt: 'Write a haiku about coding',
});
```

### Multimodal Generation

```ts
// With images
const response = await ai.generate({
  model: googleAI.model('gemini-2.0-flash'),
  prompt: [
    { text: 'What is in this image?' },
    { media: { url: 'data:image/jpeg;base64,...' } }
  ],
});
```

-async () => {
-  const { text } = ai.generate('hi Gemini!');
-  console.log(text);
-};

### Image Generation

```ts
// Using Imagen
const imageResponse = await ai.generate({
  // Review comment (Collaborator): probably best to use
  //   const { media } = await ai.generate(...)
  // See (linked snippet): const { media } = await ai.generate({
  model: googleAI.model('imagen-3.0-generate-002'),
  prompt: 'A serene mountain landscape at sunset',
});

// Using Imagen 4 (Preview)
const imagen4Response = await ai.generate({
  model: googleAI.model('imagen-4.0-generate-preview-06-06'),
  prompt: 'A futuristic city with flying cars',
});
```
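
The review comment on this block suggests destructuring `media` from the result. A minimal sketch of a possible next step, assuming `media.url` arrives as a base64 `data:` URL like the one in the multimodal example; `parseDataUrl` and `ParsedDataUrl` are hypothetical names for illustration, not part of Genkit:

```typescript
// Hypothetical helper (not part of Genkit): split a base64 data URL into its parts.
interface ParsedDataUrl {
  contentType: string; // e.g. "image/png"
  base64: string; // the still-encoded payload
}

function parseDataUrl(url: string): ParsedDataUrl {
  // Matches "data:<content type>;base64,<payload>".
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (!match) {
    throw new Error('Expected a base64 data URL');
  }
  return { contentType: match[1], base64: match[2] };
}

// Toy input for illustration only; a real image payload would be much longer.
const parsed = parseDataUrl('data:image/png;base64,iVBORw0KGgo=');
console.log(parsed.contentType); // image/png
```

The payload is left encoded so the sketch stays runtime-neutral; decoding it to bytes depends on the environment you run in.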

### Video Generation

```ts
// Using Veo 2
const videoResponse = await ai.generate({
  // Review comment (Collaborator, @pavelgj, Jul 7, 2025): this is not quite right/complete.
  // Refer to (linked snippet): let { operation } = await ai.generate({
  // More specifically, you need to get the operation:
  //   const { operation } = await ai.generate(...)
  // and then check the operation status:
  //   while (!operation.done) {
  //     sendChunk('check status of operation ' + operation.id);
  //     operation = await ai.checkOperation(operation);
  //     await new Promise((resolve) => setTimeout(resolve, 5000));
  //   }
  // and then download the video (linked snippet):
  //   async function downloadVideo(video: MediaPart, path: string) {
  model: googleAI.model('veo-2.0-generate-001'),
  prompt: 'A time-lapse of clouds moving over a city skyline',
  config: {
    aspectRatio: '16:9',
    durationSeconds: 8,
  }
});

// Using Veo 3 (if available)
const veo3Response = await ai.generate({
  model: googleAI.model('veo-3.0-generate-003'),
  prompt: 'Ocean waves crashing on a beach at sunset',
  config: {
    aspectRatio: '9:16',
    durationSeconds: 5,
  }
});
```
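
The review comment on this block describes polling the returned operation until it is done. A rough sketch of that loop as a reusable helper with an attempt cap, assuming only the `id` and `done` fields seen in the comment; `PendingOperation` and `pollUntilDone` are hypothetical names, and the `check` callback stands in for a call such as `ai.checkOperation(operation)`:

```typescript
// Hypothetical minimal shape; the real operation type may carry more fields.
interface PendingOperation {
  id: string;
  done: boolean;
}

// Calls `check` until the operation reports done, waiting `intervalMs`
// before each check and giving up after `maxAttempts` checks.
async function pollUntilDone<T extends PendingOperation>(
  operation: T,
  check: (op: T) => Promise<T>,
  intervalMs = 5000,
  maxAttempts = 60,
): Promise<T> {
  let current = operation;
  for (let attempt = 0; !current.done && attempt < maxAttempts; attempt++) {
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    current = await check(current);
  }
  if (!current.done) {
    throw new Error(`Operation ${current.id} did not finish after ${maxAttempts} checks`);
  }
  return current;
}
```

Passing the check function in as a parameter keeps the helper independent of any particular client object and makes the loop easy to exercise with a fake in tests. The interval and attempt cap are arbitrary defaults, not values recommended by the plugin.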

### Text-to-Speech

```ts
// Using Gemini TTS
const audioResponse = await ai.generate({
  // Review comment (Collaborator): Refer to example here (linked snippet):
  //   const { media } = await ai.generate({
  // i.e. use
  //   const { media } = await ai.generate(...)
  // but then it requires conversion from PCM to WAV format to be usable (linked snippet):
  //   async function toWav(
  model: googleAI.model('gemini-2.5-flash-preview-tts'),
  prompt: 'Hello, welcome to our presentation.',
});
```
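
The review comment on this block points out that the returned audio is raw PCM and has to be converted to WAV before it is usable. One way that conversion could look is sketched below: it prepends a standard 44-byte RIFF/WAVE header using only typed arrays. The defaults (24 kHz, mono, 16-bit little-endian samples) are assumptions to verify against the actual model output, and `pcmToWav` is a hypothetical helper, not the `toWav` function the reviewer links to:

```typescript
// Hypothetical helper: wrap raw little-endian PCM samples in a WAV container.
// Sample rate, channel count, and bit depth defaults are assumptions.
function pcmToWav(
  pcm: Uint8Array,
  sampleRate = 24000,
  channels = 1,
  bitsPerSample = 16,
): Uint8Array {
  const blockAlign = (channels * bitsPerSample) / 8; // bytes per sample frame
  const byteRate = sampleRate * blockAlign; // bytes per second of audio
  const wav = new Uint8Array(44 + pcm.length);
  const view = new DataView(wav.buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + pcm.length, true); // file size minus the first 8 bytes
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // size of the fmt chunk for PCM
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, byteRate, true);
  view.setUint16(32, blockAlign, true);
  view.setUint16(34, bitsPerSample, true);
  writeAscii(36, 'data');
  view.setUint32(40, pcm.length, true); // size of the sample data
  wav.set(pcm, 44);
  return wav;
}
```

The result can be written to a `.wav` file as-is. If the media arrives base64-encoded, it would need to be decoded to bytes first.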

### Native Audio (Conversational)

Review comment (Collaborator): we don't support this....

Reply (Author): AI thinks we should...

```ts
// Using native audio models
const audioDialogResponse = await ai.generate({
  model: googleAI.model('gemini-2.5-flash-preview-native-audio-dialog'),
  prompt: 'Tell me a story about a brave knight',
});
```

### Embeddings

```ts
// Text embeddings
const embedding = await ai.embed({
  embedder: googleAI.embedder('gemini-embedding-exp'),
  content: 'The quick brown fox jumps over the lazy dog',
});
```
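
Embedding vectors are typically compared with cosine similarity to measure how related two texts are. A small sketch on plain number arrays; the vectors below are toy values rather than real model output, and how the numeric vector is extracted from the `ai.embed` result is not shown because it depends on the Genkit version:

```typescript
// Cosine similarity of two equal-length vectors: 1 means same direction,
// 0 means orthogonal, -1 means opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error('Vectors must be non-empty and of equal length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine similarity is undefined for a zero vector');
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors for illustration only.
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```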

### Using Fine-tuned Models

```ts
// Use your fine-tuned model
const response = await ai.generate({
  model: googleAI.model('tunedModels/your-model-id'),
  prompt: 'Your prompt here',
});
```

## Supported Models

For a comprehensive list of all supported models with their capabilities and specifications, see [SUPPORTED_MODELS.md](./SUPPORTED_MODELS.md).

The plugin uses dynamic model discovery, so new models released through the Gemini API are often supported automatically without requiring plugin updates.

## Documentation

The sources for this package are in the main [Genkit](https://github.com/firebase/genkit) repo. Please file issues and pull requests against that repo.

Usage information and reference details can be found in [Genkit documentation](https://genkit.dev/docs/plugins/google-genai/).
48 changes: 48 additions & 0 deletions js/plugins/googleai/SUPPORTED_MODELS.md
@@ -0,0 +1,48 @@
# Supported Models - Google AI Plugin

The `@genkit-ai/googleai` plugin connects to the Gemini API and uses dynamic model discovery. As a result, Genkit supports nearly any generative or embedding model available through the API, including new and fine-tuned models, often without a plugin update.

The following table lists many of the available models to help you get started. However, as the Gemini API evolves rapidly, this list may not be exhaustive. **For the most current and complete list of models, always refer to the official [Google AI Models documentation](https://ai.google.dev/gemini-api/docs/models).**

## Text, Multimodal, and Live Models

| Model Name | Code Reference | Capabilities | Notes | Status |
| :--- | :--- | :--- | :--- | :--- |
| **Gemini 2.5 Pro** | `googleAI.model('gemini-2.5-pro')` | Text, Vision, Audio, PDF | Enhanced thinking and reasoning. | |
| **Gemini 2.5 Flash** | `googleAI.model('gemini-2.5-flash')` | Text, Vision, Audio | Fast and versatile. | |
| **Gemini 2.5 Flash-Lite** | `googleAI.model('gemini-2.5-flash-lite-preview-06-17')` | Text, Vision, Audio | Cost-efficient, high throughput. | `Preview` |
| **Gemini 2.5 Flash Live** | `googleAI.model('gemini-live-2.5-flash-preview')` | Bidirectional Voice & Video | For low-latency interactive sessions. | `Preview` |
| **Gemini 2.0 Flash** | `googleAI.model('gemini-2.0-flash')` | Text, Vision, Audio | Next-gen features and speed. | |
| **Gemini 2.0 Flash Image Gen** | `googleAI.model('gemini-2.0-flash-preview-image-generation')` | Text, Vision, Image Generation | Conversational image generation. | `Preview` |
| **Gemini 2.0 Flash-Lite** | `googleAI.model('gemini-2.0-flash-lite')` | Text, Vision, Audio | Cost-efficient and low latency. | |
| **Gemini 2.0 Flash Live** | `googleAI.model('gemini-2.0-flash-live-001')` | Bidirectional Voice & Video | For low-latency interactive sessions. | |
| **Gemini 1.5 Pro** | `googleAI.model('gemini-1.5-pro')` | Text, Vision, Audio | Complex reasoning tasks. | |
| **Gemini 1.5 Flash** | `googleAI.model('gemini-1.5-flash')` | Text, Vision, Audio | Fast performance for diverse tasks. | |
| **Gemini 1.5 Flash-8B** | `googleAI.model('gemini-1.5-flash-8b')` | Text, Vision, Audio | High-volume, lower intelligence tasks. | |

## Specialized Models (Audio, Image, Video, Embeddings)

| Model Type | Model Name | Code Reference | Capabilities | Notes | Status |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Native Audio** | Gemini 2.5 Flash Native Audio (Dialog) | `googleAI.model('gemini-2.5-flash-preview-native-audio-dialog')` | Interleaved Text & Audio | Natural conversational audio. | `Preview` |
| **Native Audio** | Gemini 2.5 Flash Native Audio (Thinking) | `googleAI.model('gemini-2.5-flash-exp-native-audio-thinking-dialog')` | Interleaved Text & Audio | Includes thinking audio cues. | `Experimental` |
| **Text-to-Speech** | Gemini 2.5 Flash TTS | `googleAI.model('gemini-2.5-flash-preview-tts')` | Text-to-Speech | Low-latency audio generation. | `Preview` |
| **Text-to-Speech** | Gemini 2.5 Pro TTS | `googleAI.model('gemini-2.5-pro-preview-tts')` | Text-to-Speech | High-quality audio generation. | `Preview` |
| **Image Gen** | Imagen 4 | `googleAI.model('imagen-4.0-generate-preview-06-06')` | Image Generation | Latest image generation. | `Preview` |
| **Image Gen** | Imagen 4 Ultra | `googleAI.model('imagen-4.0-ultra-generate-preview-06-06')` | Image Generation | Highest quality image generation. | `Preview` |
| **Image Gen** | Imagen 3 | `googleAI.model('imagen-3.0-generate-002')` | Image Generation | High-quality images. | |
| **Video Gen** | Veo 2 | `googleAI.model('veo-2.0-generate-001')` | Video Generation | High-quality video generation. | |
| **Video Gen** | Veo 3 | `googleAI.model('veo-3.0-generate-003')` | Video Generation | Latest video generation model. | `If available` |
| **Embedding** | Gemini Embedding | `googleAI.embedder('gemini-embedding-exp')` | Text Embedding | Measures relatedness of text. | `Experimental` |

## Notes

- Models marked as `Preview` or `Experimental` may have limited availability or be subject to changes.
- Video generation models (Veo) return long-running operations that need to be polled for completion.
- Some models have specific configuration options. Check the model's documentation for details.
- The plugin uses dynamic model discovery, so new models are often supported automatically without plugin updates.
- You can also use fine-tuned models by passing their ID: `googleAI.model('tunedModels/your-model-id')`

For usage examples and code samples, see the [README](./README.md).

For the most up-to-date information about model capabilities, limitations, and pricing, please refer to the [official Google AI documentation](https://ai.google.dev/gemini-api/docs/models).
2 changes: 1 addition & 1 deletion js/plugins/googleai/src/veo.ts
@@ -31,7 +31,7 @@ import {
import { getApiKeyFromEnvVar } from './common.js';
import { Operation as ApiOperation, checkOp, predictModel } from './predict.js';

-export type KNOWN_VEO_MODELS = 'veo-2.0-generate-001';
+export type KNOWN_VEO_MODELS = 'veo-2.0-generate-001' | 'veo-3.0-generate-003';

/**
* See https://ai.google.dev/gemini-api/docs/video
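
The widened `KNOWN_VEO_MODELS` union only exists at compile time. If a runtime check on a model name is wanted, a type guard along the following lines is one option. This is only a sketch: `KNOWN_VEO_MODEL_NAMES`, `KnownVeoModel`, and `isKnownVeoModel` are hypothetical names that do not appear in `veo.ts`, and the list simply mirrors the two literals in the diff:

```typescript
// Hypothetical runtime mirror of the union in the diff above.
const KNOWN_VEO_MODEL_NAMES = [
  'veo-2.0-generate-001',
  'veo-3.0-generate-003',
] as const;

// Derive the union type from the array so the two cannot drift apart.
type KnownVeoModel = (typeof KNOWN_VEO_MODEL_NAMES)[number];

// Type guard: narrows an arbitrary string to the known Veo model names.
function isKnownVeoModel(name: string): name is KnownVeoModel {
  return (KNOWN_VEO_MODEL_NAMES as readonly string[]).indexOf(name) !== -1;
}

console.log(isKnownVeoModel('veo-3.0-generate-003')); // true
```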