feat: multimodal support in AmazonBedrockChatGenerator #307

Merged
merged 14 commits into from
May 23, 2025

Conversation

anakin87 (Member) commented May 21, 2025

Related Issues

Proposed Changes:

Add an experimental version of AmazonBedrockChatGenerator, which can handle user messages with text + images.

  • It only alters the _format_messages utility function
  • Reuses the existing implementation where possible
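
As a rough illustration of what formatting a user message with text and images for Bedrock can involve, here is a minimal sketch. It is not the code from this PR: `format_user_message` and its `(mime_type, base64_string)` image tuples are hypothetical names chosen for the example. Only the output shape (`text` and `image` content blocks, with `format` and `source.bytes`) follows the Converse API message format used later in this conversation.

```python
import base64

# Image formats accepted in Converse API image blocks.
SUPPORTED_FORMATS = {"png", "jpeg", "gif", "webp"}


def format_user_message(text, images):
    """Hypothetical helper: build one Converse-style user message.

    `images` is a list of (mime_type, base64_string) tuples. This input shape
    is an assumption made for the example, not Haystack's actual data model.
    """
    content = []
    if text:
        content.append({"text": text})
    for mime_type, b64_data in images:
        # "image/jpeg" -> "jpeg"
        image_format = mime_type.split("/")[-1]
        if image_format not in SUPPORTED_FORMATS:
            raise ValueError(f"Unsupported image format: {image_format}")
        content.append({
            "image": {
                "format": image_format,
                # boto3 takes raw bytes here, so the base64 string is decoded.
                "source": {"bytes": base64.b64decode(b64_data)},
            }
        })
    return {"role": "user", "content": content}
```

Rejecting unknown formats before the request is sent is a design choice of this sketch; it gives a clearer error than a failed API call.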

How did you test it?

Copied the existing tests; added new unit tests (for the utility function) + an integration test

Notes for the reviewer

  • Don't get scared by the size of this PR: the core logic is in haystack_experimental/components/generators/chat/bedrock.py and is about 150 lines.
  • Let's discuss integration tests: I copied all of them to show that everything works as before, but we can also agree to remove some of them (or test with fewer models) to save time and money. Let me know what you think...

Checklist

coveralls commented May 21, 2025

Pull Request Test Coverage Report for Build 15203979000

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.4%) to 73.708%

Files with Coverage Reduction | New Missed Lines | %
components/generators/chat/__init__.py | 2 | 75.0%

Totals
Change from base Build 15203742899: 0.4%
Covered Lines: 1326
Relevant Lines: 1799

💛 - Coveralls

anakin87 changed the title from "feat: multimodal Bedrock [WIP]" to "feat: multimodal support in AmazonBedrockChatGenerator" on May 22, 2025
anakin87 marked this pull request as ready for review on May 22, 2025 09:59
anakin87 requested a review from a team as a code owner on May 22, 2025 09:59
anakin87 requested a review from sjrl and removed the request for a team on May 22, 2025 09:59
sjrl commented May 22, 2025

I think it'd be good to be consistent with naming and change the name of the file from bedrock.py to amazon_bedrock.py but also fine if you'd rather leave alone.

    elif msg.tool_calls:
        bedrock_formatted_messages.append(_format_tool_call_message(msg))
    elif msg.tool_call_results:
        bedrock_formatted_messages.append(_format_tool_result_message(msg))

I think out of scope in this PR, but do you happen to know if bedrock supports images in the tool result message? Thinking again on the request of a Tool being able to return an image.

anakin87 (Member, Author) commented May 23, 2025

Surprisingly yes: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ToolResultContentBlock.html

The linked page notes: "This field is only supported by Anthropic Claude 3 models."

At first this seemed a bit strange to me, because I thought Claude's own API did not allow images in tool results. I was wrong.

Anyway, I want to see if this works with the Converse API. I'll keep you posted.

sjrl (Contributor) left a comment

Looks good! Just a few minor comments.

anakin87 (Member, Author) commented

> I think it'd be good to be consistent with naming and change the name of the file from bedrock.py to amazon_bedrock.py but also fine if you'd rather leave alone.

done

anakin87 commented May 23, 2025

Converse API - Tool result with an image

import os
from haystack_integrations.common.amazon_bedrock.utils import get_aws_session


session = get_aws_session(
    aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
    aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
    aws_session_token=os.getenv("AWS_SESSION_TOKEN"),
    aws_region_name="us-east-1",
    aws_profile_name=os.getenv("AWS_PROFILE"),
)

client = session.client("bedrock-runtime")

# Read the image with a context manager so the file handle is closed promptly.
with open("test/test_files/images/apple.jpg", "rb") as f:
    image_bytes = f.read()

messages = [
    {
        "role": "user",
        "content": [{"text": "Download the image at this url and describe it in max 5 words. URL: www.example.com/image.png"}]
    },
    {
        "role": "assistant",
        "content": [
            {"text": "I need to use the download tool."},
            {"toolUse": {"toolUseId": "tooluse_a2XtsIwsRse-gKI8YkFyfQ", "name": "download", "input": {"url": "www.example.com/image.png"}}}
        ]
    },
    {
        "role": "user",
        "content": [{
            "toolResult": {
                "toolUseId": "tooluse_a2XtsIwsRse-gKI8YkFyfQ",
                # apple.jpg is a JPEG file, so the declared format should be "jpeg".
                "content": [{"image": {"format": "jpeg", "source": {"bytes": image_bytes}}}]
            }
        }]
    }
]

toolConfig = {
    "tools": [{
        "toolSpec": {
            "name": "download",
            "description": "Download an image from a URL",
            "inputSchema": {"json": {"type": "object", "properties": {"url": {"type": "string"}}}}
        }
    }]
}

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    messages=messages,
    toolConfig=toolConfig
)

print(response)
# {
#     'ResponseMetadata': {...},
#     'output': {
#         'message': {
#             'role': 'assistant',
#             'content': [{'text': "Here's a description of the image in 5 words:\n\nRipe apple on straw background."}]
#         }
#     },
#     'stopReason': 'end_turn',
#     'usage': {'inputTokens': 951, 'outputTokens': 27, 'totalTokens': 978},
#     'metrics': {'latencyMs': 1247}
# }

Even if the tool call is simulated, this works.
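
For readers who only need the model's answer from a response like the one printed above, a small helper along these lines may be enough. This is a sketch: `extract_text` is a hypothetical name, and `sample_response` is a trimmed copy of the output shown above rather than a fresh API call.

```python
def extract_text(response):
    """Join all text blocks of the assistant message in a Converse response."""
    blocks = response["output"]["message"]["content"]
    # Skip non-text blocks such as toolUse.
    return "".join(block["text"] for block in blocks if "text" in block)


# Trimmed copy of the response printed above.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Here's a description of the image in 5 words:\n\nRipe apple on straw background."}
            ],
        }
    },
    "stopReason": "end_turn",
}

print(extract_text(sample_response))
```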

Supporting this feature out of the box would mean changing our ToolCallResult dataclass to include ImageContent. I would postpone this until more model providers support the use case: OpenAI, Gemini, Ollama, etc. do not allow it.

Basically, Anthropic/Bedrock allow this use case because the tool message is a user message.
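
To make that point concrete, here is a minimal sketch of a helper that wraps image bytes in a tool result sent with the `user` role. `image_tool_result_message` is a hypothetical name used only for illustration; the dictionary layout mirrors the `toolResult` block in the script above, and nothing here reflects how Haystack would implement the feature.

```python
def image_tool_result_message(tool_use_id, image_bytes, image_format):
    """Hypothetical helper: wrap image bytes in a Converse toolResult block."""
    return {
        # In the Converse API, tool results travel in a message with the "user" role.
        "role": "user",
        "content": [{
            "toolResult": {
                # Must match the toolUseId of the assistant's toolUse block.
                "toolUseId": tool_use_id,
                "content": [
                    {"image": {"format": image_format, "source": {"bytes": image_bytes}}}
                ],
            }
        }],
    }


# Placeholder bytes stand in for a real image file.
message = image_tool_result_message("tooluse_a2XtsIwsRse-gKI8YkFyfQ", b"<image bytes>", "jpeg")
```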

I opened an issue to track this idea: deepset-ai/haystack#9432.

anakin87 merged commit f379e45 into main on May 23, 2025
10 checks passed
anakin87 deleted the bedrock-multimodal branch on May 23, 2025 10:33
Development

Successfully merging this pull request may close these issues.

Multimodal support in another ChatGenerator
3 participants