
Support additional document types through new Gemini Embeddings 2 #340

@cybaea

Description


The new Gemini Embedding model gemini-embedding-2-preview has been released:

https://ai.google.dev/gemini-api/docs/models/gemini-embedding-2-preview

In addition to text, it supports embedding images, video, audio, and PDF documents.

This opens up opportunities for Vault Intelligence to include these documents in our RAG Graphs, which could be valuable to many users. It also raises engineering issues for our 'snippets' preview and for our embedding chunking algorithm.

The biggest win would likely come from a robust way to chunk PDF files, which are (1) common and (2) potentially large. Surely there are libraries that can split a large PDF into individual pages or smaller units? A reasonable approach would be to chunk PDFs by page, perhaps with a user setting to group n pages per chunk (n >= 1).
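For the proposed n-pages setting, the chunk boundaries can be planned independently of whichever PDF library we choose. The sketch below is a hypothetical helper: `planPdfChunks` is not existing plugin code. It groups a document's pages into inclusive, 1-based ranges of `pagesPerChunk` pages, and the last range may be shorter. We would still need a PDF library to extract the pages themselves. pdf-lib is one candidate worth evaluating, because its `PDFDocument.copyPages` can copy selected pages into a new document. That choice is an assumption here and has not been tested in the plugin.

```typescript
// A page range within a PDF: 1-based, inclusive on both ends.
interface PageRange {
  start: number;
  end: number;
}

// Hypothetical helper: split `pageCount` pages into chunks of `pagesPerChunk`
// pages each (the proposed user setting, n >= 1). The last chunk may be shorter.
function planPdfChunks(pageCount: number, pagesPerChunk: number): PageRange[] {
  if (!Number.isInteger(pageCount) || pageCount < 0) {
    throw new RangeError(`pageCount must be a non-negative integer, got ${pageCount}`);
  }
  if (!Number.isInteger(pagesPerChunk) || pagesPerChunk < 1) {
    throw new RangeError(`pagesPerChunk must be an integer >= 1, got ${pagesPerChunk}`);
  }
  const ranges: PageRange[] = [];
  for (let start = 1; start <= pageCount; start += pagesPerChunk) {
    // Clamp the final range to the last page of the document.
    const end = Math.min(start + pagesPerChunk - 1, pageCount);
    ranges.push({ start, end });
  }
  return ranges;
}
```

For example, a 10-page PDF with `pagesPerChunk = 4` yields pages 1-4, 5-8 and 9-10. Each range would then become one embedding input and one snippet target.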

Chunking video and audio would likely require heavyweight libraries that may not belong in a lightweight Obsidian plugin. Note also the hard-coded input limits documented at https://ai.google.dev/gemini-api/docs/embeddings#supported-modalities. We could perhaps call external functions, services, or MCP agents for this, but that feels like over-engineering for a small gain.

For images, I think we can simply rely on the larger context window (see below).

So I propose we add support for (1) PDF and (2) image files.
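Supporting these two types means the indexer must decide which vault files are eligible and which MIME type to send for each. The sketch below is hypothetical: `classifyForEmbedding` and its extension table are illustrative, not existing plugin code. The image formats listed are an assumption and should be checked against the model's supported-modalities documentation. Markdown notes stay on the existing text path. Everything else, including audio and video, is skipped, in line with the proposal above.

```typescript
// ASSUMPTION: illustrative extension -> MIME table; confirm the accepted image
// formats against the model's supported-modalities documentation.
// A Map avoids accidental hits on Object.prototype keys such as "toString".
const EMBEDDABLE_MIME_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["png", "image/png"],
  ["jpg", "image/jpeg"],
  ["jpeg", "image/jpeg"],
]);

type EmbeddingKind = "text" | "pdf" | "image";

interface EmbeddingTarget {
  kind: EmbeddingKind;
  mimeType: string;
}

// Hypothetical helper: decide whether (and how) a vault file should be embedded.
// Returns null for unsupported types (e.g. audio and video).
function classifyForEmbedding(path: string): EmbeddingTarget | null {
  // Look only at the file name so dots in folder names are ignored.
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  if (dot < 0 || dot === name.length - 1) return null;
  const ext = name.slice(dot + 1).toLowerCase();

  // Markdown notes keep using the existing text embedding path.
  if (ext === "md") return { kind: "text", mimeType: "text/markdown" };

  const mimeType = EMBEDDABLE_MIME_TYPES.get(ext);
  if (mimeType === undefined) return null;
  return { kind: mimeType === "application/pdf" ? "pdf" : "image", mimeType };
}
```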

Note that this will likely require the Files API to get around the standard inline REST payload limit (typically around 20 MB).
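Inline file data in a REST request is Base64-encoded, which inflates it by about a third. A file well under 20 MB on disk can therefore still exceed an inline limit of that size. The sketch below shows one way to choose between inline upload and the Files API, under several assumptions. `chooseUploadStrategy` is a hypothetical helper. The 20 MB figure is the estimate from this issue, treated as 20 * 1024 * 1024 bytes. The 64 KiB allowance for the JSON envelope is a guess. The real limits should be confirmed in the Gemini API documentation.

```typescript
// ASSUMPTION: the issue estimates the inline REST payload limit at "20MB or so";
// confirm the real figure in the Gemini API docs before relying on it.
const INLINE_REQUEST_LIMIT_BYTES = 20 * 1024 * 1024;

// ASSUMPTION: allowance for the JSON envelope and any accompanying text parts.
const DEFAULT_OVERHEAD_BYTES = 64 * 1024;

type UploadStrategy = "inline" | "files-api";

// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64Length(byteLength: number): number {
  return 4 * Math.ceil(byteLength / 3);
}

// Hypothetical helper: send small files inline, upload large ones via the Files API.
function chooseUploadStrategy(
  fileBytes: number,
  overheadBytes: number = DEFAULT_OVERHEAD_BYTES,
): UploadStrategy {
  const estimatedRequestBytes = base64Length(fileBytes) + overheadBytes;
  return estimatedRequestBytes <= INLINE_REQUEST_LIMIT_BYTES ? "inline" : "files-api";
}
```

Under these assumptions, a 15 MiB PDF would already go through the Files API. Base64 encoding inflates it to exactly 20 MiB before the request envelope is added.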

2. Larger context window

Separate but related:

Note that this model also increases the context window from 2048 tokens in previous models to 8192 tokens. We should update our code so users can take advantage of this. (We will need to research, and probably provide guidance on, the right balance between large chunks, which carry more context but fill the model's context window faster, and smaller, more focused chunks.)
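One way to expose the larger window without letting users exceed it is to clamp the configured chunk size per model. The sketch below is hypothetical: `resolveChunkBudget` is not existing plugin code, and the id `gemini-embedding-001` for the previous model is an assumption. The four-characters-per-token ratio is only a rough heuristic for turning a token budget into a character-based chunk length. The helper does not settle the large-versus-small trade-off. It only keeps whatever setting we end up recommending within the model's limit.

```typescript
// Input limits in tokens: 8192 for gemini-embedding-2-preview and 2048 for
// earlier models, per this issue. The older model id is an ASSUMPTION.
const MODEL_INPUT_TOKEN_LIMITS = new Map<string, number>([
  ["gemini-embedding-2-preview", 8192],
  ["gemini-embedding-001", 2048],
]);
// Fall back to the conservative, previous-generation limit for unknown models.
const FALLBACK_TOKEN_LIMIT = 2048;
// ASSUMPTION: rough heuristic of ~4 characters per token for English prose.
const APPROX_CHARS_PER_TOKEN = 4;

interface ChunkBudget {
  tokens: number;
  approxChars: number;
}

// Hypothetical helper: clamp the user's requested chunk size to the model's limit.
function resolveChunkBudget(model: string, requestedTokens: number): ChunkBudget {
  if (!Number.isFinite(requestedTokens)) {
    throw new RangeError(`requestedTokens must be finite, got ${requestedTokens}`);
  }
  const limit = MODEL_INPUT_TOKEN_LIMITS.get(model) ?? FALLBACK_TOKEN_LIMIT;
  // At least 1 token, at most the model's input limit.
  const tokens = Math.max(1, Math.min(Math.floor(requestedTokens), limit));
  return { tokens, approxChars: tokens * APPROX_CHARS_PER_TOKEN };
}
```

With this in place, a user can raise the chunk size up to 8192 tokens on the new model. The same setting is silently capped at 2048 tokens if they switch back to an older model.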

The larger context window will likely have to suffice for image embeddings; I can't see a sensible way to chunk images.

Metadata


Assignees: No one assigned

Labels: area: backend (Issues affecting AI models, API calls, or core logic.)

Projects: No projects

Relationships: None yet

Development: No branches or pull requests
