
fix(ai): attach user media to Gemini requests #1241

Open
ling-senpeng13 wants to merge 2 commits into main from fix/gemini-media-input

Conversation

@ling-senpeng13

Contributor

Problem

Images passed as input via the media path never reached Gemini models — the model received a text-only message. Same bug as #1238 (Anthropic), in the Gemini converter:

case USER -> {
    UserMessage userMsg = (UserMessage) msg;
    contents.add(new GeminiApi.Content(
        "user", List.of(GeminiApi.Part.text(userMsg.getText()))));   // getMedia() dropped
}

LLMHelper.getMessage() builds a Spring AI UserMessage with media (bytes), but GeminiChatModel.convertMessage() only forwarded getText().

Fix

  • GeminiChatModel.convertMessage(): when a UserMessage carries media, build a text part plus one inline-data Part per media item (base64), mirroring the OpenAI path and the Anthropic fix in #1238 (fix(ai): attach user media to Anthropic requests). Text-only messages are unchanged.
  • GeminiApi: add a Part.inlineData(mimeType, base64Data) factory (the InlineData record already existed; it just had no request-side factory).
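For readers without the diff open, the shape of the fix can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code in this PR: the InlineData, Part, Content, and MediaItem records and the toUserContent helper are hypothetical stand-ins for the project's GeminiApi types and Spring AI's media objects, whose real fields and signatures may differ.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class GeminiMediaSketch {

    // Hypothetical stand-ins for the project's GeminiApi types; the real records may differ.
    record InlineData(String mimeType, String data) {}

    record Part(String text, InlineData inlineData) {
        // Text-only part, as used before the fix.
        static Part text(String text) {
            return new Part(text, null);
        }

        // Request-side factory analogous to the Part.inlineData(mimeType, base64Data) described above.
        static Part inlineData(String mimeType, String base64Data) {
            return new Part(null, new InlineData(mimeType, base64Data));
        }
    }

    record Content(String role, List<Part> parts) {}

    // Hypothetical stand-in for one media item attached to a user message.
    record MediaItem(String mimeType, byte[] bytes) {}

    // Builds the user content: the text part first, then one inline-data part per media item.
    static Content toUserContent(String text, List<MediaItem> media) {
        List<Part> parts = new ArrayList<>();
        parts.add(Part.text(text));
        for (MediaItem item : media) {
            String base64 = Base64.getEncoder().encodeToString(item.bytes());
            parts.add(Part.inlineData(item.mimeType(), base64));
        }
        return new Content("user", List.copyOf(parts));
    }

    public static void main(String[] args) {
        Content content = toUserContent(
                "Describe this image",
                List.of(new MediaItem("image/png", new byte[] {1, 2, 3})));
        System.out.println(content.parts().size() + " parts, role=" + content.role());
    }
}
```

With an empty media list the sketch yields a single text part, which matches the "text-only messages are unchanged" behavior described above.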

Test

Adds GeminiChatModelMediaTest, which mocks the API, captures the contents passed to generateContent, and asserts the user content carries an inline image part (inlineData) with mimeType=image/png and the verbatim base64 payload — plus the accompanying text part.
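The capture-and-assert idea can be illustrated without a mocking library. The sketch below is not GeminiChatModelMediaTest itself: it swaps the mock for a hand-rolled recording fake, and the Part, Api, CapturingApi, send, and carriesInlineImage names are all hypothetical simplifications of the real types.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

public class CapturingFakeSketch {

    // Hypothetical flattened part shape; the project's real GeminiApi.Part may differ.
    record Part(String text, String mimeType, String base64Data) {}

    // Hypothetical seam standing in for the API call that the real test mocks.
    interface Api {
        String generateContent(List<Part> userParts);
    }

    // Hand-rolled fake that records the request instead of calling Gemini.
    static class CapturingApi implements Api {
        List<Part> captured;

        @Override
        public String generateContent(List<Part> userParts) {
            captured = userParts;
            return "stub response";
        }
    }

    // Simplified code under test: sends the text part plus one inline PNG part.
    static String send(Api api, String text, byte[] pngBytes) {
        List<Part> parts = new ArrayList<>();
        parts.add(new Part(text, null, null));
        parts.add(new Part(null, "image/png", Base64.getEncoder().encodeToString(pngBytes)));
        return api.generateContent(parts);
    }

    // True when the captured request carries a PNG inline part with the verbatim base64 payload.
    static boolean carriesInlineImage(List<Part> parts, String expectedBase64) {
        return parts.stream()
                .anyMatch(p -> "image/png".equals(p.mimeType())
                        && expectedBase64.equals(p.base64Data()));
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, 3};
        CapturingApi api = new CapturingApi();
        send(api, "What is in this image?", bytes);
        String expected = Base64.getEncoder().encodeToString(bytes);
        if (!carriesInlineImage(api.captured, expected)) {
            throw new AssertionError("inline image part missing: media was dropped");
        }
    }
}
```

If send forwarded only the text part, the check in main would throw, which is the failure mode the real test is meant to catch.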

Validated both directions:

  • Before the fix: FAILS — "inline image part missing — media was dropped".
  • After the fix: PASSES.
./gradlew :conductor-ai:test --tests "org.conductoross.conductor.ai.providers.gemini.GeminiChatModel*"
BUILD SUCCESSFUL

Scope

Gemini provider only; other providers untouched. convertMessage is the single message-building path for Gemini, so all Gemini requests are covered. This is the companion to #1238 (Anthropic); together they fix the two custom converters that dropped media.

🤖 Generated with Claude Code

GeminiChatModel.convertMessage() built the USER content from
userMsg.getText() only and silently dropped userMsg.getMedia(), so
images passed via the media input path never reached Gemini — the model
received a text-only message. Same bug (and same shape of fix) as
conductor-oss#1238 for Anthropic.

Convert user media into Gemini inline data parts (base64) alongside the
text. Adds a GeminiApi.Part.inlineData(mimeType, base64Data) factory (the
InlineData record already existed, just had no request-side factory).

Adds GeminiChatModelMediaTest, which captures the contents passed to
generateContent and asserts the user message carries an inline image part
with the verbatim base64 payload. Verified it fails before the fix
(inline part missing) and passes after.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Fixes spotlessJavaCheck CI failure by wrapping lines per Google Java Format.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>