feat: multimodal support in AmazonBedrockChatGenerator
#307
Pull Request Test Coverage Report for Build 15203979000
💛 - Coveralls
AmazonBedrockChatGenerator
I think it'd be good to be consistent with naming and change the name of the file from |
    elif msg.tool_calls:
        bedrock_formatted_messages.append(_format_tool_call_message(msg))
    elif msg.tool_call_results:
        bedrock_formatted_messages.append(_format_tool_result_message(msg))
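For readers without the full diff: the branches above cover tool calls and tool results, while the multimodal part of this PR concerns user messages that carry text plus images. A minimal sketch of what such a conversion to Converse API content blocks could look like follows. The function name, the `(format, bytes)` tuple input, and the validation set are assumptions made for illustration and are not taken from the PR's `_format_messages` code; only the `{"text": ...}` and `{"image": {"format": ..., "source": {"bytes": ...}}}` block shapes come from the Converse request shown later in this conversation.

```python
from typing import Any, Dict, List, Tuple

# Image formats accepted by the Converse API image block (assumption: this
# list is maintained by hand here and may lag behind the service).
SUPPORTED_IMAGE_FORMATS = {"png", "jpeg", "gif", "webp"}


def format_user_message_sketch(text: str, images: List[Tuple[str, bytes]]) -> Dict[str, Any]:
    """Hypothetical helper: build a Converse-style user message from text and images.

    `images` is a list of (format, raw_bytes) pairs, e.g. ("png", b"...").
    """
    content: List[Dict[str, Any]] = []
    if text:
        content.append({"text": text})
    for image_format, image_bytes in images:
        if image_format not in SUPPORTED_IMAGE_FORMATS:
            raise ValueError(f"Unsupported image format: {image_format!r}")
        # boto3 takes raw bytes here and handles the encoding itself.
        content.append({"image": {"format": image_format, "source": {"bytes": image_bytes}}})
    return {"role": "user", "content": content}
```

The resulting dictionary can be placed directly in the `messages` list passed to `client.converse(...)`.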
I think this is out of scope for this PR, but do you happen to know if Bedrock supports images in the tool result message? I'm thinking again about the request for a Tool to be able to return an image.
Surprisingly yes: https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ToolResultContentBlock.html
The docs note: "This field is only supported by Anthropic Claude 3 models."
I first found this a bit strange, because I thought Claude's own API did not allow images in tool results, but I was wrong.
Anyway, I want to see if this works with the Converse API. I'll keep you posted.
Looks good! Just a few minor comments.
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
…ystack-experimental into bedrock-multimodal
done
Converse API - Tool result with an image

    import os

    from haystack_integrations.common.amazon_bedrock.utils import get_aws_session

    # Build a boto3 session from environment variables.
    session = get_aws_session(
        aws_access_key_id=os.getenv("AWS_ACCESS_KEY_ID"),
        aws_secret_access_key=os.getenv("AWS_SECRET_ACCESS_KEY"),
        aws_session_token=os.getenv("AWS_SESSION_TOKEN"),
        aws_region_name="us-east-1",
        aws_profile_name=os.getenv("AWS_PROFILE"),
    )
    client = session.client("bedrock-runtime")

    # Read the image that the simulated tool "returns".
    with open("test/test_files/images/apple.jpg", "rb") as f:
        image_bytes = f.read()

    messages = [
        {
            "role": "user",
            "content": [{"text": "Download the image at this url and describe it in max 5 words. URL: www.example.com/image.png"}]
        },
        {
            # Simulated assistant turn that requests the tool call.
            "role": "assistant",
            "content": [
                {"text": "I need to use the download tool."},
                {"toolUse": {"toolUseId": "tooluse_a2XtsIwsRse-gKI8YkFyfQ", "name": "download", "input": {"url": "www.example.com/image.png"}}}
            ]
        },
        {
            # The tool result is sent as a user message and carries the image.
            "role": "user",
            "content": [{
                "toolResult": {
                    "toolUseId": "tooluse_a2XtsIwsRse-gKI8YkFyfQ",
                    # The format should match the actual file type (apple.jpg is a JPEG).
                    "content": [{"image": {"format": "jpeg", "source": {"bytes": image_bytes}}}]
                }
            }]
        }
    ]

    toolConfig = {
        "tools": [{
            "toolSpec": {
                "name": "download",
                "description": "Download an image from a URL",
                "inputSchema": {"json": {"type": "object", "properties": {"url": {"type": "string"}}}}
            }
        }]
    }

    response = client.converse(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        messages=messages,
        toolConfig=toolConfig
    )
    print(response)
    # {
    #     'ResponseMetadata': {...},
    #     'output': {
    #         'message': {
    #             'role': 'assistant',
    #             'content': [{'text': "Here's a description of the image in 5 words:\n\nRipe apple on straw background."}]
    #         }
    #     },
    #     'stopReason': 'end_turn',
    #     'usage': {'inputTokens': 951, 'outputTokens': 27, 'totalTokens': 978},
    #     'metrics': {'latencyMs': 1247}
    # }

Even if the tool call is simulated, this works.
Supporting this feature out of the box would mean changing our […]
Basically, Anthropic/Bedrock allow this use case because the tool message is a user message. I opened an issue to track this idea: deepset-ai/haystack#9432.
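Since the tool message is just a user message with a `toolResult` block, the image-carrying part of the request above can be factored into a small helper. The following is a rough sketch under stated assumptions: the helper name and the extension-to-format table are hypothetical, and nothing here is part of Haystack or of this PR. It only reproduces the message shape used in the Converse call above.

```python
from pathlib import Path
from typing import Any, Dict

# Hypothetical mapping from file extension to the Converse image "format" value.
_EXTENSION_TO_FORMAT = {
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".gif": "gif",
    ".webp": "webp",
}


def image_tool_result_message(tool_use_id: str, image_bytes: bytes, filename: str) -> Dict[str, Any]:
    """Hypothetical helper: wrap image bytes in a Converse tool-result user message."""
    suffix = Path(filename).suffix.lower()
    if suffix not in _EXTENSION_TO_FORMAT:
        raise ValueError(f"Cannot infer image format from {filename!r}")
    image_block = {
        "image": {
            "format": _EXTENSION_TO_FORMAT[suffix],
            "source": {"bytes": image_bytes},
        }
    }
    # Tool results travel in a message with role "user", which is why an
    # image is accepted here just as in an ordinary user message.
    return {
        "role": "user",
        "content": [{"toolResult": {"toolUseId": tool_use_id, "content": [image_block]}}],
    }
```

Inferring the format from the file name avoids hard-coding a `format` value that may not match the bytes being sent.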
Related Issues
Proposed Changes:
Add an experimental version of AmazonBedrockChatGenerator, which can handle user messages with text + images.
_format_messages utility function
How did you test it?
Copied the existing tests; added new unit tests (for the utility function) + an integration test
Notes for the reviewer
haystack_experimental/components/generators/chat/bedrock.py and is about 150 lines.
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.