Add prompt caching for AmazonBedrockGenerator and AmazonBedrockChatGenerator #2776

@deep-rloebbert

Description

Is your feature request related to a problem? Please describe.
Yes. When using AmazonBedrockGenerator or AmazonBedrockChatGenerator with large, static contexts (such as long system prompts or few-shot examples), the model re-processes the input tokens for every request. This results in unnecessarily high costs and increased latency.

Amazon Bedrock supports Prompt Caching to reuse processed tokens, but there is currently no mechanism in the Haystack components to pass the required caching parameters to the underlying Bedrock API.
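
For reference, a minimal sketch of what prompt caching looks like in a raw boto3 Converse call, outside Haystack; the model ID, region, and prompt text are placeholders, and only certain models support caching:

```python
import boto3

# Sketch of prompt caching via the Bedrock Converse API. The cachePoint
# block asks Bedrock to cache all prompt content up to that point, so
# subsequent requests with the same prefix reuse the processed tokens.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    system=[
        {"text": "<long static system prompt / few-shot examples>"},
        {"cachePoint": {"type": "default"}},  # cache everything above
    ],
    messages=[
        {"role": "user", "content": [{"text": "What is prompt caching?"}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```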

Describe the solution you'd like
I would like the ability to specify prompt caching parameters on the Haystack Bedrock components; in the best case, a simple boolean toggle.

Ideally, this would be handled via metadata on ChatMessage (for the Chat Generator) or via call parameters. When the component constructs the API payload, it should detect these flags and inject the cache-control structure Bedrock requires into the request body (e.g., adding cache_control or cachePoint fields to the message blocks); see the sketch below.
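
To make the proposal concrete, here is a hypothetical usage sketch. The cache_point meta key does not exist today; it only illustrates the requested API, assuming the ChatMessage factory methods accept a meta dict:

```python
from haystack.dataclasses import ChatMessage
from haystack_integrations.components.generators.amazon_bedrock import (
    AmazonBedrockChatGenerator,
)

generator = AmazonBedrockChatGenerator(
    model="anthropic.claude-3-5-sonnet-20241022-v2:0"
)

messages = [
    # Hypothetical meta flag: when building the Converse payload, the
    # component would detect it and append a
    # {"cachePoint": {"type": "default"}} block after this message.
    ChatMessage.from_system(
        "<long static system prompt / few-shot examples>",
        meta={"cache_point": True},
    ),
    ChatMessage.from_user("What is prompt caching?"),
]
result = generator.run(messages=messages)
print(result["replies"][0].text)
```

A message-level flag like this would keep the cache boundary explicit and per-message, while a component-level boolean toggle could simply cache the entire system prompt by default.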

Describe alternatives you've considered
Spending too much money.

Additional context
