
Python: Bug: Cannot stream with Claude 3.7 Sonnet #10941

Open
@philippHorn

Description

Describe the bug
To use the latest Claude model, I need to pass an inference profile as the model ID: https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-use.html

This works with the converse operation, but does not seem to work with https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetFoundationModel.html

So right now, if I use an inference profile ID as the model ID and do streaming, I get this error:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the GetFoundationModel operation: The provided model identifier is invalid.

But if I use the plain foundation model ID instead, I get this:

botocore.errorfactory.ValidationException: An error occurred (ValidationException) when calling the Converse operation: Invocation of model ID anthropic.claude-3-7-sonnet-20250219-v1:0 with on-demand throughput isn't supported. Retry your request with the ID or ARN of an inference profile that contains this model.

It seems like the latest Claude model only supports inference profiles.
As a result, I see no way to run the latest Sonnet model with streaming. Am I missing something?
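For context, the Bedrock runtime's converse_stream operation accepts an inference profile ID in place of a foundation model ID, so streaming itself should work once the GetFoundationModel lookup is out of the picture. A minimal sketch of the request shape (the build_converse_stream_kwargs helper is my own naming, and the actual API call is commented out because it needs AWS credentials):

```python
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"


def build_converse_stream_kwargs(model_id: str, text: str) -> dict:
    # Request shape for bedrock-runtime's converse_stream operation.
    return {
        "modelId": model_id,  # an inference profile ID is accepted here
        "messages": [{"role": "user", "content": [{"text": text}]}],
        "inferenceConfig": {"temperature": 0.2},
    }


kwargs = build_converse_stream_kwargs(INFERENCE_PROFILE_ID, "hi")
# runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
# for event in runtime.converse_stream(**kwargs)["stream"]:
#     print(event)
```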

To Reproduce
I can reproduce the issue with the script below.
Swapping MODEL_ID and INFERENCE_PROFILE_ID toggles between the two errors posted above.

import asyncio

import boto3
from django.conf import settings
from semantic_kernel.connectors.ai.anthropic import (
    AnthropicChatPromptExecutionSettings,
)
from semantic_kernel.connectors.ai.bedrock import BedrockChatCompletion
from semantic_kernel.contents import (
    ChatHistory,
    ChatMessageContent,
    AuthorRole,
    TextContent,
)

AWS_AI_REGION = "us-east-1"
MODEL_ID = "anthropic.claude-3-7-sonnet-20250219-v1:0"
INFERENCE_PROFILE_ID = "us.anthropic.claude-3-7-sonnet-20250219-v1:0"

bedrock_client = boto3.client(
    "bedrock",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)
bedrock_runtime_client = boto3.client(
    "bedrock-runtime",
    region_name=AWS_AI_REGION,
    aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
    aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
)


async def main() -> None:
    sk_client = BedrockChatCompletion(
        model_id=INFERENCE_PROFILE_ID,
        client=bedrock_client,
        runtime_client=bedrock_runtime_client,
    )
    llm_settings = AnthropicChatPromptExecutionSettings(
        temperature=0.2,
    )
    history = ChatHistory(
        messages=[
            ChatMessageContent(role=AuthorRole.USER, items=[TextContent(text="hi")])
        ]
    )
    async for item in sk_client.get_streaming_chat_message_contents(
        history, llm_settings
    ):
        print(item)


if __name__ == "__main__":
    asyncio.run(main())
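The failure apparently comes from the connector calling GetFoundationModel (which only accepts foundation model IDs) before streaming. A hypothetical helper (not part of semantic-kernel), assuming cross-region inference profile IDs always carry a region prefix, could detect them so the connector skips that lookup:

```python
# Region prefixes used by Bedrock cross-region inference profiles
# (assumed list; may not be exhaustive).
_PROFILE_PREFIXES = ("us.", "eu.", "apac.")


def is_inference_profile_id(model_id: str) -> bool:
    # Foundation model IDs start with the provider name (e.g. "anthropic."),
    # while cross-region inference profile IDs prepend a region prefix.
    return model_id.startswith(_PROFILE_PREFIXES)


print(is_inference_profile_id("us.anthropic.claude-3-7-sonnet-20250219-v1:0"))  # True
print(is_inference_profile_id("anthropic.claude-3-7-sonnet-20250219-v1:0"))  # False
```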

Expected behavior
The script should stream messages.

Platform

  • Language: Python
  • Source: semantic-kernel==1.24.0
  • AI model: Bedrock: anthropic.claude-3-7-sonnet-20250219-v1:0
  • OS: Mac

Note
My understanding of AWS is not that deep; I hope what I wrote above is correct and makes sense.

Labels: bug (Something isn't working), python (Pull requests for the Python Semantic Kernel)
