The new Gemini Embedding model gemini-embedding-2-preview has been released:
https://ai.google.dev/gemini-api/docs/models/gemini-embedding-2-preview
It supports embedding additional document types: not just text, but also images, video, audio, and PDF.
This opens opportunities for Vault Intelligence to use these documents in our RAG Graphs, which could be valuable to many users. It also raises engineering questions around our 'snippets' preview and our embedding chunking algorithm.
The biggest win would likely come from researching and developing a robust way to chunk PDF files. These (1) are common and (2) can be large. There are likely libraries that can split large PDFs into individual pages or smaller units. I think a reasonable solution would be to chunk PDFs by page, perhaps with a user setting to chunk every n pages, n >= 1.
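To make the per-page chunking idea concrete, here is a minimal TypeScript sketch of the grouping logic. It only decides which page indices land in which chunk; actually extracting those pages would require a PDF library (pdf-lib or similar), and the function name and signature are hypothetical, not part of any existing Vault Intelligence code.

```typescript
// Hypothetical helper: group a PDF's page indices into chunks of
// pagesPerChunk pages each (the proposed user setting, n >= 1).
// Page extraction itself is left to a PDF library; this only
// computes the chunk boundaries.
function chunkPageIndices(pageCount: number, pagesPerChunk: number): number[][] {
  if (pagesPerChunk < 1) {
    throw new Error("pagesPerChunk must be >= 1");
  }
  const chunks: number[][] = [];
  for (let start = 0; start < pageCount; start += pagesPerChunk) {
    const chunk: number[] = [];
    const end = Math.min(start + pagesPerChunk, pageCount);
    for (let i = start; i < end; i++) {
      chunk.push(i);
    }
    chunks.push(chunk);
  }
  return chunks;
}
```

For example, a 10-page PDF with n = 4 would yield three chunks: pages 0-3, 4-7, and 8-9; the default n = 1 gives one chunk per page.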
Chunking video and audio would likely require heavyweight libraries that may not belong in a lightweight Obsidian plugin. Note the hard-coded limits documented at https://ai.google.dev/gemini-api/docs/embeddings#supported-modalities. Perhaps we could look at a way to call external functions, services, or MCP agents for this, but that feels like over-engineering for a small gain.
For images, I think we just rely on the larger context window (see below).
So I propose we add support for (1) PDF and (2) Image.
Note that this likely requires us to use the Files API to overcome the standard REST payload limits (typically 20MB or so).
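A hedged sketch of the routing decision this implies, in TypeScript: small attachments can go inline in the request body, while anything over the payload limit goes through the Files API first. The 20 MB threshold follows the figure mentioned above and should be confirmed against the current API docs; the names here are illustrative, not an existing API.

```typescript
// Assumed inline REST payload limit (~20 MB, per the note above);
// verify against current Gemini API documentation before shipping.
const INLINE_PAYLOAD_LIMIT_BYTES = 20 * 1024 * 1024;

type UploadRoute = "inline" | "files-api";

// Decide how to ship an attachment: inline base64 in the embed
// request, or a prior upload via the Files API.
function chooseUploadRoute(fileSizeBytes: number): UploadRoute {
  return fileSizeBytes <= INLINE_PAYLOAD_LIMIT_BYTES ? "inline" : "files-api";
}
```

The plugin would call this once per attachment before building the embedding request, falling back to the Files API only when necessary to avoid the extra round trip.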
2. Larger context window
Separate but related:
Note that this model also has an increased context window of 8192 tokens, up from 2048 in the previous models. We should update our code to allow users to take advantage of this. (But note that we need to research, and likely provide some guidance on, selecting the right balance: embedding large chunks gives extra context but consumes the model context window faster, versus smaller, more focused chunks.)
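Whatever guidance we land on, we will at minimum need to clamp a user-configured chunk size to the new limit. A minimal TypeScript sketch, assuming a rough 4-characters-per-token heuristic for English text (an approximation, not an exact tokenizer count):

```typescript
// New model input limit (8192 tokens, up from 2048).
const MODEL_TOKEN_LIMIT = 8192;
// Rough heuristic for English text; real token counts vary by content.
const APPROX_CHARS_PER_TOKEN = 4;

// Clamp a user-configured chunk size (in characters) so that a chunk
// is unlikely to exceed the model's token limit.
function clampChunkSizeChars(requestedChars: number): number {
  const maxChars = MODEL_TOKEN_LIMIT * APPROX_CHARS_PER_TOKEN; // ~32768 chars
  return Math.min(requestedChars, maxChars);
}
```

A settings UI could surface the clamped value so users see why a very large requested chunk size is not honored.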
The larger context will likely have to suffice for image embeddings -- I can't see how we can chunk those.