Description
Please read this first
- Have you read the custom model provider docs, including the 'Common issues' section? Model provider docs
- Have you searched for related issues? Others may have faced similar issues.
Describe the question
I don't believe prompt caching is supported when Bedrock is used as the model provider. I tested the same prompt against the Bedrock Converse API directly, which reported cached prompt tokens in its usage, and through the Agents SDK, which never returned more than 0 cached tokens even when called again immediately afterwards.
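For comparison, a minimal sketch of the kind of direct Converse call I mean (the region, model ID, and prompt below are placeholders, not my exact values; `cachePoint` is the Converse API's way of marking a cacheable prefix):

```python
# Hypothetical direct check against Bedrock (not the SDK path):
# the Converse API with a cachePoint block reports cacheReadInputTokens /
# cacheWriteInputTokens in its usage when the prompt prefix is cached.
import boto3

client = boto3.client("bedrock-runtime", region_name="us-west-2")  # placeholder region

LONG_SYSTEM_PROMPT = "some prompt that exceeds the minimum cacheable token count"

response = client.converse(
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",  # placeholder Bedrock model ID
    system=[
        {"text": LONG_SYSTEM_PROMPT},
        {"cachePoint": {"type": "default"}},  # mark the system prompt as a cacheable prefix
    ],
    messages=[{"role": "user", "content": [{"text": "Hello"}]}],
)

# On a second identical call shortly afterwards, usage includes
# cacheReadInputTokens > 0; the SDK repro below never shows the equivalent.
print(response["usage"])
```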
Debug information
- Agents SDK version: (e.g. v0.0.3)
- Python version: (e.g. Python 3.10)
Repro steps
Ideally provide a minimal python script that can be run to reproduce the issue.
```python
from agents import Agent, Runner
from agents.extensions.models.litellm_model import LitellmModel

# BedrockModelIdentifier is a local enum of Bedrock model IDs; `prompt` is defined elsewhere.
agent = Agent(
    name="big prompt agent",
    instructions="some prompt that needs prompt caching > token requirement",
    model=LitellmModel(
        model=f"bedrock/{BedrockModelIdentifier.CLAUDE35_HAIKU}",
    ),
)

result = Runner.run_sync(agent, prompt)

# Method 1: Get total usage from context wrapper
total_usage = result.context_wrapper.usage
print("First request usage:")
print(
    total_usage.input_tokens_details,
    total_usage.output_tokens_details,
    total_usage.input_tokens,
    total_usage.output_tokens,
)

# Run the same prompt again; this call should be served from the prompt cache.
result2 = Runner.run_sync(agent, prompt)
print("\nSecond request usage (should show cached tokens):")
total_usage2 = result2.context_wrapper.usage
print(
    total_usage2.input_tokens_details,
    total_usage2.output_tokens_details,
    total_usage2.input_tokens,
    total_usage2.output_tokens,
)
```
Expected behavior
`total_usage2.input_tokens_details` should report `cached_tokens > 0` on the second run.
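Roughly, the check I expect to pass (assuming the SDK surfaces the cache count as `input_tokens_details.cached_tokens`, as the OpenAI usage types do):

```python
# Expected outcome on the second, identical run: a non-zero cached token count.
cached = result2.context_wrapper.usage.input_tokens_details.cached_tokens
assert cached > 0, f"expected cached_tokens > 0, got {cached}"
```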