Description
Is your feature request related to a problem? Please describe.
Yes. When using AmazonBedrockGenerator or AmazonBedrockChatGenerator with large, static contexts (such as long system prompts or few-shot examples), the model re-processes the input tokens for every request. This results in unnecessarily high costs and increased latency.
Amazon Bedrock supports Prompt Caching to reuse processed tokens, but currently, there is no mechanism in the Haystack components to pass the required caching parameters to the underlying Bedrock API.
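For reference, this is roughly what the caching parameters look like at the raw API level, which the Haystack components currently cannot emit. A minimal boto3 sketch using the Converse API's `cachePoint` content block (the model ID and prompt text are placeholders; prompt caching is only available on supported models):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

LONG_SYSTEM_PROMPT = "..."  # large static instructions / few-shot examples

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    # The cachePoint block tells Bedrock to cache everything up to this
    # point in the prompt and reuse it on subsequent requests.
    system=[
        {"text": LONG_SYSTEM_PROMPT},
        {"cachePoint": {"type": "default"}},
    ],
    messages=[
        {"role": "user", "content": [{"text": "A short, varying question."}]},
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```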
Describe the solution you'd like
I would like the ability to specify prompt caching parameters on the Haystack Bedrock components; in the best case, a simple boolean toggle.
Ideally, this would be handled via metadata on ChatMessage (for the Chat Generator) or via call parameters. When the component constructs the API payload, it should detect these flags and inject the cache control structure Bedrock requires into the request body (e.g., adding cache_control or cachePoint fields to the message blocks), as sketched below.
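A rough sketch of what the component could do internally. This is hypothetical, not existing Haystack API: the `cache_control` meta key and the helper function are illustrative names only.

```python
from typing import Any, Dict, List

from haystack.dataclasses import ChatMessage


def to_converse_messages(messages: List[ChatMessage]) -> List[Dict[str, Any]]:
    """Convert ChatMessages to Converse API message blocks, appending a
    cachePoint block wherever a message's meta requests caching.

    Hypothetical helper: the 'cache_control' meta key is illustrative.
    (In practice, system messages would be routed to the Converse API's
    separate 'system' parameter rather than the messages list.)
    """
    converted: List[Dict[str, Any]] = []
    for msg in messages:
        content: List[Dict[str, Any]] = [{"text": msg.text}]
        if msg.meta.get("cache_control"):
            content.append({"cachePoint": {"type": "default"}})
        converted.append({"role": msg.role.value, "content": content})
    return converted
```

A user would then opt in per message, e.g. `ChatMessage.from_system(long_prompt, meta={"cache_control": True})` (assuming the factory methods accept a meta dict, as recent Haystack releases do), and the generator would place a cache point right after that message.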
Describe alternatives you've considered
Spending too much money.
Additional context
- AWS Documentation: [Amazon Bedrock Prompt Caching](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-simplified)
- Benefit: caching can significantly reduce inference costs (up to 90% for cached tokens) and latency for supported models.