
fix(ai): attach user media to Anthropic requests#1238

Merged
ling-senpeng13 merged 2 commits into main from fix/anthropic-media-input
Jul 2, 2026

@ling-senpeng13 ling-senpeng13 commented Jul 1, 2026


Problem

Images passed as input via the media path never reached Anthropic (Claude) models. The model received a text-only message and hallucinated a response instead of reading the image.

The pipeline was correct up to the provider conversion: media is stored on the ChatMessage, and LLMHelper.getMessage() downloads the URL to bytes and builds a Spring AI UserMessage with media. But AnthropicChatModel.convertMessage() built the user message from userMsg.getText() only and silently dropped userMsg.getMedia():

case USER -> {
    UserMessage userMsg = (UserMessage) msg;
    messages.add(Message.user(userMsg.getText()));   // getMedia() ignored
}

The OpenAI provider (OpenAIResponsesChatModel.convertMessage()) already forwards media as image content parts; the Anthropic converter never got that treatment. The Anthropic ContentBlock API already modeled type="image" with a base64 Source — the capability existed, it just wasn't wired up.

Fix

  • AnthropicChatModel.convertMessage(): when a UserMessage carries media, build a text block plus one image block per media item (base64-encoding the downloaded bytes), mirroring the OpenAI path. Text-only messages are unchanged.
  • AnthropicMessagesApi: add a ContentBlock.image(mediaType, base64Data) factory.
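
The conversion described above can be sketched roughly as follows. This is a minimal illustration, not the merged code: `Media`, `Source`, `ContentBlock`, and `toUserContent` are hypothetical stand-ins for the Spring AI and conductor-ai types, and the real converter may differ in names and details. It assumes text-only content stays a plain string and media-bearing content becomes a block list.

```java
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Hypothetical stand-ins; these are NOT the real Spring AI / conductor-ai classes.
final class AnthropicMediaSketch {

    // Downloaded media: MIME type plus raw bytes.
    record Media(String mimeType, byte[] data) {}

    // Base64 image source as carried inside an image content block.
    record Source(String type, String mediaType, String data) {}

    // A content block is either text (source == null) or an image (text == null).
    record ContentBlock(String type, String text, Source source) {
        static ContentBlock text(String text) {
            return new ContentBlock("text", text, null);
        }

        static ContentBlock image(String mediaType, String base64Data) {
            return new ContentBlock("image", null, new Source("base64", mediaType, base64Data));
        }
    }

    // Returns a plain String for text-only input, otherwise a list holding
    // one text block followed by one image block per media item.
    static Object toUserContent(String text, List<Media> media) {
        if (media == null || media.isEmpty()) {
            return text; // text-only path stays as it was
        }
        List<ContentBlock> blocks = new ArrayList<>();
        blocks.add(ContentBlock.text(text));
        for (Media m : media) {
            // Base64-encode the already-downloaded bytes for the image source.
            String base64Data = Base64.getEncoder().encodeToString(m.data());
            blocks.add(ContentBlock.image(m.mimeType(), base64Data));
        }
        return blocks;
    }
}
```

Returning `Object` here only mirrors the two shapes the description mentions (bare string versus content-block list); a real API type would likely model this more strictly.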

Test

Adds AnthropicChatModelMediaTest, which mocks the Messages API, captures the outgoing MessagesRequest, and asserts the user message serializes to a content-block list containing an image block with source.type=base64, media_type=image/png, and the verbatim base64 payload — plus the accompanying text block.

Validated both directions:

  • Before the fix: FAILS — user content serializes to a bare String (media dropped).
  • After the fix: PASSES.
./gradlew :conductor-ai:test --tests "org.conductoross.conductor.ai.providers.anthropic.AnthropicChatModel*"
BUILD SUCCESSFUL
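
The capture-and-assert approach can be illustrated without a mocking library. The sketch below is an assumption-laden stand-in for AnthropicChatModelMediaTest, not its actual contents: `MessagesApi`, `CapturingApi`, `userRequest`, and `hasImageBlock` are hypothetical names, and the request is modeled as nested maps rather than the real MessagesRequest type.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; names and request shape are illustrative only.
final class CaptureRequestSketch {

    // Minimal stand-in for the provider API that the chat model calls.
    interface MessagesApi {
        String send(Map<String, Object> request);
    }

    // Hand-rolled fake that records the outgoing request for later assertions.
    static final class CapturingApi implements MessagesApi {
        final AtomicReference<Map<String, Object>> captured = new AtomicReference<>();

        @Override
        public String send(Map<String, Object> request) {
            captured.set(request);
            return "ok";
        }
    }

    // Builds a user message whose content is a text block plus an image block.
    static Map<String, Object> userRequest(String text, String mediaType, String base64Data) {
        Map<String, Object> textBlock = Map.of("type", "text", "text", text);
        Map<String, Object> source = Map.of("type", "base64", "media_type", mediaType, "data", base64Data);
        Map<String, Object> imageBlock = Map.of("type", "image", "source", source);
        List<Object> content = List.of(textBlock, imageBlock);
        return Map.of("role", "user", "content", content);
    }

    // True only if content is a block list containing the expected image block.
    // A bare String (the pre-fix shape) yields false.
    static boolean hasImageBlock(Object content, String mediaType, String base64Data) {
        if (!(content instanceof List<?> blocks)) {
            return false;
        }
        for (Object block : blocks) {
            if (block instanceof Map<?, ?> m
                    && "image".equals(m.get("type"))
                    && m.get("source") instanceof Map<?, ?> s
                    && "base64".equals(s.get("type"))
                    && mediaType.equals(s.get("media_type"))
                    && base64Data.equals(s.get("data"))) {
                return true;
            }
        }
        return false;
    }
}
```

A check written this way fails on the old behavior because the captured content is a string, and passes once the content is a block list carrying the verbatim base64 payload.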

Scope

Anthropic provider only; OpenAI/Gemini media paths are untouched. convertMessage is the single message-building path for Anthropic (used by the normal call path), so all Anthropic requests are covered.

End-to-end validation (full agentspan SDK e2e suite)

Built a server carrying this fix (the 0.3.0 agentspan server baseline + this exact conductor-ai change) and ran the full agentspan Python SDK e2e suite against it.

Result: 123 passed, 8 failed, 19 skipped/xfailed (150 tests, 23m).

The media-input suite (Suite 25) passes in full, including the Anthropic case — which is the direct target of this fix and was previously failing/skipped:

test_vision_reads_text_from_image[openai]       PASSED
test_vision_reads_text_from_image[anthropic]    PASSED   ← fixed by this PR (was: hallucinated, no image)
test_without_media_token_is_absent[openai]      PASSED
test_without_media_token_is_absent[anthropic]   PASSED

A direct probe confirms it: anthropic/claude-sonnet-4-5 + an image of the text MELON7391 now returns MELON7391 (before the fix the model received no image and hallucinated unrelated text).

The 8 failures are unrelated to this PR. All stem from a pre-existing version mismatch between the newer standalone agentspan Go CLI and the older 0.3.0 server's REST API, not from the Anthropic provider:

  • TestSuite16CliSkills (5) — CLI skill --version flag unknown / skill load HTTP 400 against the 0.3.0 server.
  • TestSuite2ToolCalling, TestSuite4McpTools, TestSuite5HttpTools (1 each) — credentials delete → HTTP 404 No static resource api/credentials/... (endpoint absent on the 0.3.0 server).

None touch the LLM provider path; they fail identically with or without this change.

Exact-version confirmation: the same result was reproduced by building conductor-ai from the v3.32.0-rc.3 tag with this fix cherry-picked (not just the 3.30.2 baseline) and running it inside the matching agentspan server. At that source, AnthropicChatModelMediaTest passes and Suite 25 passes end-to-end (4 passed, Anthropic included).

🤖 Generated with Claude Code

@ling-senpeng13 ling-senpeng13 force-pushed the fix/anthropic-media-input branch from 0383755 to 3cec6fa Compare July 1, 2026 19:21
AnthropicChatModel.convertMessage() built the USER message from
userMsg.getText() only and silently dropped userMsg.getMedia(), so
images passed via the media input path never reached Claude — the model
received a text-only message and hallucinated. The OpenAI provider
already forwards media; Anthropic did not.

Convert user media into Anthropic image content blocks (base64 source)
alongside the text, mirroring OpenAIResponsesChatModel. Adds a
ContentBlock.image(mediaType, base64Data) factory (the image block type
and Source were already modeled, just unused).

Adds AnthropicChatModelMediaTest, which captures the outgoing
MessagesRequest and asserts the user message carries an image content
block with the verbatim base64 payload. Verified it fails before the fix
(media dropped -> bare string content) and passes after.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ling-senpeng13 ling-senpeng13 force-pushed the fix/anthropic-media-input branch from 3cec6fa to 00ab8b3 Compare July 1, 2026 19:31
@ling-senpeng13 ling-senpeng13 requested a review from v1r3n July 1, 2026 19:57
@ling-senpeng13 ling-senpeng13 self-assigned this Jul 1, 2026