Feature Request: Enable Prompt Caching in PromptTemplate for ChatAnthropic #29747
DonghaeSuh started this conversation in Ideas
Replies: 1 comment

- Just what I needed! Let's code it so it works with Gemini too.
Feature request
When using `PromptTemplate` to load a string-based prompt template, the template must be converted into messages before being passed to a `ChatModel`. This feature introduces a simple `prompt_caching` argument that lets users specify how `{placeholders}` should define KV caching boundaries. During `invoke()`, the template is automatically segmented based on the provided placeholders, ensuring that Anthropic's caching breakpoints (`{"cache_control": {"type": "ephemeral"}}`) are inserted at the intended positions.

By applying this caching logic, we can make better use of `PromptTemplate | ChatModel` chains without manual prompt restructuring (splitting → converting to messages → adding `cache_control` arguments). This significantly improves usability in real-world applications where predefined `.txt` prompt files are dynamically loaded into `PromptTemplate`.
Motivation
Currently, when using `langchain_anthropic.ChatAnthropic` in a production service, we store multiple prompts as `.txt` files and dynamically load them into `PromptTemplate`. However, to leverage prompt caching (KV caching), we must manually split these whole string prompts into the messages format and explicitly insert `{"cache_control": {"type": "ephemeral"}}` at the intended positions.

Since the primary purpose of KV caching is to cache the static portion of a prompt, an automated way to determine the breakpoint based on `{placeholders}` would significantly reduce manual overhead. This feature would allow users to specify whether a `{placeholder}` should be included in the cached portion, simplifying prompt caching with `ChatAnthropic` without additional manual prompt restructuring.
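For reference, the manual restructuring described above looks roughly like the sketch below. This is only an illustration (the model name, file path, and prompt contents are placeholders); it relies on the existing support in `langchain_anthropic` for `cache_control` inside message content blocks:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage, SystemMessage

llm = ChatAnthropic(model="claude-3-5-sonnet-latest")

# Large, static portion of the prompt loaded from a .txt file (illustrative path).
static_instructions = open("prompts/rag_system.txt").read()

messages = [
    SystemMessage(
        content=[
            {
                "type": "text",
                "text": static_instructions,
                # The caching breakpoint has to be inserted by hand today.
                "cache_control": {"type": "ephemeral"},
            }
        ]
    ),
    HumanMessage(content="What does the contract say about termination?"),
]

response = llm.invoke(messages)
```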
Proposal (If applicable)

Implementation Details
1. Pass a `prompt_caching` argument as an optional keyword argument (`kwargs`) when invoking `PromptTemplate`.
2. When the `prompt_caching` argument is provided, store the following three attributes inside the returned `StringPromptValue`:
   - `template`: the original prompt string with `{placeholders}` intact.
   - `input_dict`: the dictionary of `{"input variable": "value"}` pairs passed during invocation.
   - `prompt_caching`: a dictionary mapping `{placeholder}` names to `"front"` or `"back"`, or the boolean `True`.
     - `"front"`: excludes the placeholder from caching.
     - `"back"`: includes the placeholder in caching.
     - `True`: caches the entire prompt (useful for static prompts with no placeholders).

   The `prompt_caching` argument is designed to support up to 4 key-value pairs, matching the maximum number of breakpoints supported by Anthropic's prompt caching. For each breakpoint, the prompt is split into chunks and cached accordingly.
3. Modify the `to_messages()` method of `StringPromptValue` to automatically split the message based on the stored attributes, ensuring that `{"cache_control": {"type": "ephemeral"}}` is inserted at the appropriate breakpoints, as sketched below.
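A minimal sketch of what that splitting step could look like, assuming a standalone helper (the name `split_with_cache_control` and its exact signature are hypothetical, not an existing LangChain API):

```python
import re
from typing import Union

def split_with_cache_control(
    template: str,
    input_dict: dict,
    prompt_caching: Union[dict, bool],
) -> list:
    """Render `template` with `input_dict` and return Anthropic-style content
    blocks, closing a cached chunk at every breakpoint named in `prompt_caching`."""
    if prompt_caching is True:
        # Cache the fully rendered prompt as a single block.
        return [{
            "type": "text",
            "text": template.format(**input_dict),
            "cache_control": {"type": "ephemeral"},
        }]

    blocks, buffer = [], ""
    # Split the template while keeping the {placeholder} tokens.
    for part in re.split(r"(\{[^{}]+\})", template):
        match = re.fullmatch(r"\{([^{}]+)\}", part)
        if match is None:
            buffer += part
            continue
        name = match.group(1)
        value = str(input_dict.get(name, ""))
        if prompt_caching.get(name) == "back":
            # Breakpoint after the placeholder: its value is part of the cached chunk.
            blocks.append({
                "type": "text",
                "text": buffer + value,
                "cache_control": {"type": "ephemeral"},
            })
            buffer = ""
        elif prompt_caching.get(name) == "front":
            # Breakpoint before the placeholder: cache everything up to it.
            if buffer:
                blocks.append({
                    "type": "text",
                    "text": buffer,
                    "cache_control": {"type": "ephemeral"},
                })
            buffer = value
        else:
            # Placeholders not listed in prompt_caching do not create breakpoints.
            buffer += value
    if buffer:
        # Trailing, non-cached remainder of the prompt.
        blocks.append({"type": "text", "text": buffer})
    return blocks
```

`to_messages()` could then wrap the returned blocks in a single `HumanMessage`, which is what a `StringPromptValue` produces today.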
Example Usage
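The original snippet did not survive extraction, so the following is a reconstruction of the intended usage under the proposed API (file path, model name, and inputs are illustrative):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.prompts import PromptTemplate

# Illustrative prompt file; in practice it holds long static instructions,
# a {documents} placeholder, and a trailing {question} placeholder.
template = PromptTemplate.from_file("prompts/rag_prompt.txt")

llm = ChatAnthropic(model="claude-3-5-sonnet-latest")

retrieved_docs = "...large block of retrieved context..."    # placeholder
user_question = "What does section 4 say about renewals?"    # placeholder

# Proposed argument: "back" places the caching breakpoint after {documents},
# so the rendered documents and everything before them form the cached chunk.
prompt_value = template.invoke(
    {"documents": retrieved_docs, "question": user_question},
    prompt_caching={"documents": "back"},
)

response = llm.invoke(prompt_value.to_messages())
```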
In this example, the `{documents}` section and everything before it will be KV cached (provided it exceeds 1024 or 2048 tokens, depending on the model: link), while the rest will not be cached.

I have also confirmed demand for this feature in #27340 and #26701.
I appreciate any feedback or suggestions to refine this proposal. If there are specific requirements or best practices to consider, please let me know!
I'm prepared to submit a pull request or open an issue for this feature—guidance on the preferred contribution process would be helpful.