Describe the bug
To use the latest Claude model, I need to use an inference profile as the model ID: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html
This works with the Converse operation, but does not seem to work with https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html
So right now, if I use an inference profile as a model ID and do streaming, I get this error:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.
But if I use the foundation model ID directly, I get this:
botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.
It seems the latest Claude model only supports invocation through an inference profile. As a result, I see no way to run the latest Sonnet model with streaming, or am I missing something?
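For context on the two ID formats: the cross-region inference profile ID above appears to be the foundation model ID with a region-group prefix such as `us.` or `eu.`. A small helper like the one below (illustrative only; `base_model_id` is a name I made up, and the prefix list is an assumption based on the IDs I have seen) could recover the base model ID that GetFoundationModel accepts:

```python
def base_model_id(model_id: str) -> str:
    """Strip a cross-region inference profile prefix (assumed to be one of
    "us.", "eu.", "apac.") to recover the underlying foundation model ID."""
    for prefix in ("us.", "eu.", "apac."):
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id

# The inference profile ID from this report maps back to the plain model ID:
print(base_model_id("us.anthropic.claude-3-7-sonnet-20250219-v1:0"))
# anthropic.claude-3-7-sonnet-20250219-v1:0
```

If the connector resolved the model ID this way before calling GetFoundationModel, both errors might be avoidable; I have not verified the connector internals, so this is only a sketch of the idea.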
To Reproduce
I can reproduce the issue with the script below.
If I swap MODEL_ID for INFERENCE_PROFILE_ID, the error switches between the two errors posted above.
```python
import asyncio

import boto3
from django.conf import settings
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.bedrock import BedrockChatCompletion
from semantic_kernel.contents import (
    ChatHistory,
    ChatMessageContent,
    AuthorRole,
    TextContent,
)

AWS_AI_REGION = "us-east-1"
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

bedrock_client = boto3.client(
    "bedrock",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
bedrock_runtime_client = boto3.client(
    "bedrock-runtime",
    region_name=AWS_AI_REGION,  # was settings.AWS_AI_REGION; use the constant defined above
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)


async def main() -> None:
    sk_client = BedrockChatCompletion(
        model_id=INFERENCE_PROFILE_ID,
        client=bedrock_client,
        runtime_client=bedrock_runtime_client,
    )
    llm_settings = AnthropicChatPromptExecutionSettings(
        temperature=0.2,
    )
    history = ChatHistory(
        messages=[
            ChatMessageContent(role=AuthorRole.USER, items=[TextContent(text="hi")])
        ]
    )
    async for item in sk_client.get_streaming_chat_message_contents(
        history, llm_settings
    ):
        print(item)


if __name__ == "__main__":
    asyncio.run(main())
```
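As a sanity check outside Semantic Kernel, the inference profile ID can be passed to the raw ConverseStream API, which may help confirm that the failure is in the connector's GetFoundationModel call rather than in Bedrock itself. The sketch below only builds the request dict; the actual call is commented out because it requires valid AWS credentials:

```python
# Sketch of a raw Bedrock ConverseStream request using the inference
# profile ID from this report; only the request shape is built here.
request = {
    "modelId": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [{"role": "user", "content": [{"text": "hi"}]}],
    "inferenceConfig": {"temperature": 0.2},
}

# With credentials configured, the streaming call would look like:
# import boto3
# client = boto3.client("bedrock-runtime", region_name="us-east-1")
# for event in client.converse_stream(**request)["stream"]:
#     if "contentBlockDelta" in event:
#         print(event["contentBlockDelta"]["delta"].get("text", ""), end="")
```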
Expected behavior
The script should stream messages.
Platform
- Language: Python
- Source: semantic-kernel==1.24.0
- AI model: Bedrock anthropic.claude-3-7-sonnet-20250219-v1:0 (via inference profile)
- OS: Mac
Note
My understanding of AWS is not that deep; I hope what I wrote here is correct and makes sense.