enable prompt caching #36

Open
tsuz wants to merge 4 commits into main from feat/cache-control

Conversation

@tsuz
Owner

@tsuz commented May 25, 2026

Fixes #29

@tsuz changed the title from "enable cache control" to "enable prompt caching" on May 25, 2026

Copilot AI left a comment

Pull request overview

Enables Anthropic prompt caching by introducing a PROMPT_CACHING flag and attaching a cache_control breakpoint to the Claude system prompt, with accompanying documentation and tests. The Java Think Consumer also updates cost calculation to account for cached-token billing reported by Anthropic.

Changes:

  • Add PROMPT_CACHING env/config flag and emit system as content blocks with optional cache_control: { type: "ephemeral" }.
  • Update Java Claude response parsing/cost calculation to include cache read/write token usage.
  • Add/update SDK runners and documentation to expose the prompt caching capability.
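
A minimal Python sketch of the request shape described in the first bullet, assuming the flag has already been read into a boolean. The system prompt is always sent as a single `text` content block, and the ephemeral `cache_control` marker is attached only when caching is on. `build_system_blocks` and `build_request` are hypothetical names chosen to echo the Java `buildSystemBlocks`; the model name and `max_tokens` value are placeholders.

```python
from typing import Any


def build_system_blocks(system_prompt: str, prompt_caching: bool) -> list[dict[str, Any]]:
    """Wrap the system prompt in one text content block.

    When prompt_caching is True, an ephemeral cache_control marker is added,
    which marks the end of the prefix the API may cache on later calls.
    """
    block: dict[str, Any] = {"type": "text", "text": system_prompt}
    if prompt_caching:
        block["cache_control"] = {"type": "ephemeral"}
    return [block]


def build_request(
    system_prompt: str, messages: list[dict[str, Any]], prompt_caching: bool
) -> dict[str, Any]:
    """Hypothetical Messages API body; model and max_tokens are placeholders."""
    return {
        "model": "MODEL_NAME",
        "max_tokens": 1024,
        "system": build_system_blocks(system_prompt, prompt_caching),
        "messages": messages,
    }
```

Always emitting a block list (rather than switching between a string and a list) keeps the omission case easy to test: only the presence of `cache_control` changes.
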

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| think/think-consumer/src/test/java/io/flightdeck/think/service/ClaudeApiServiceTest.java | Adds unit tests validating cache_control inclusion/omission in system blocks. |
| think/think-consumer/src/main/java/io/flightdeck/think/service/ClaudeApiService.java | Adds buildSystemBlocks, uses it in requests, and includes cached-token billing in cost calculation/logging. |
| think/think-consumer/src/main/java/io/flightdeck/think/config/AppConfig.java | Introduces PROMPT_CACHING configuration flag. |
| sdk/typescript/src/think-consumer-runner.ts | Adds TS runner support for prompt caching (but needs cached-token cost handling). |
| sdk/python/flightdeck_sdk/think_consumer_runner.py | Adds Python runner support for prompt caching (but needs cached-token cost handling). |
| README.md | Documents PROMPT_CACHING env var behavior and constraints. |
| memoir/update-memoir-consumer/src/main/java/io/flightdeck/memoir/service/ClaudeMemoirService.java | Adds prompt-caching-aware system request construction. |
| memoir/update-memoir-consumer/src/main/java/io/flightdeck/memoir/config/AppConfig.java | Introduces PROMPT_CACHING configuration flag for memoir consumer. |
| architecture/models.md | Adds architecture/schema documentation including the new env var. |

Comment on lines +34 to +40
```typescript
// Token pricing from environment variables (USD per million tokens;
// the cost calculation divides token counts by 1_000_000)
const INPUT_TOKEN_PRICE = process.env.INPUT_TOKEN_PRICE
  ? parseFloat(process.env.INPUT_TOKEN_PRICE)
  : null;
const OUTPUT_TOKEN_PRICE = process.env.OUTPUT_TOKEN_PRICE
  ? parseFloat(process.env.OUTPUT_TOKEN_PRICE)
  : null;
```
Comment on lines +428 to +435
```typescript
const usage = (response.usage as Record<string, number>) || {};
const inputTokens = usage.input_tokens || 0;
const outputTokens = usage.output_tokens || 0;
const cost =
  INPUT_TOKEN_PRICE != null && OUTPUT_TOKEN_PRICE != null
    ? (inputTokens / 1_000_000) * INPUT_TOKEN_PRICE + (outputTokens / 1_000_000) * OUTPUT_TOKEN_PRICE
    : null;
```

Comment on lines 334 to 351
```python
def _call_claude(self, system_prompt: str, messages: list[dict], *, include_tools: bool = True) -> dict:
    system: Any = system_prompt
    if self._config.prompt_caching:
        # Add a cache_control breakpoint so the static prefix can be cached.
        system = [
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"},
            }
        ]

    body: dict[str, Any] = {
        "model": self._config.claude_model,
        "max_tokens": self._config.claude_max_tokens,
        "system": system,
        "messages": messages,
    }
```

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enable prompt read caching

2 participants