Optimize for OpenAI Prompt Caching: Restructure entity extraction prompts for 50% cost reduction #2426
base: main
Conversation
Restructure entity extraction prompts:
- Remove input_text from entity_extraction_system_prompt to enable caching
- Move input_text to entity_extraction_user_prompt for per-chunk variability
- Update operate.py to format the system prompt once without input_text
- Format user prompts with input_text for each chunk

This enables OpenAI's automatic prompt caching (50% discount on cached tokens):
- ~1,300-token system message cached and reused for all chunks
- Only the ~150-token user message varies per chunk
- Expected ~45% cost reduction on prompt tokens during indexing
- 2-3x faster response times from cached prompts

Fixes HKUDS#2355
Pull request overview
This PR restructures entity extraction prompts to leverage OpenAI's automatic prompt caching by separating static instructions from variable content. The system prompt (containing role, instructions, examples, and entity types) is now formatted once per indexing run and cached across all chunks, while the user prompt contains the variable input_text for each chunk. This enables a 45-50% cost reduction and 2-3x faster indexing for supported GPT models.
- Moved {input_text} from the system prompt to the user prompt template
- Updated prompt formatting logic to format the system prompt once without input_text
- Added clarifying comments explaining caching behavior
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| lightrag/prompt.py | Restructured prompts by removing {input_text} from entity_extraction_system_prompt and moving it to entity_extraction_user_prompt to enable prompt caching |
| lightrag/operate.py | Updated prompt formatting to format system prompt once without input_text and format user prompts with input_text per chunk, with added comments explaining OpenAI caching behavior |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@codex review
Codex Review: Didn't find any major issues. More of your lovely PRs please.
PR Title: Optimize for OpenAI Prompt Caching: Restructure entity extraction prompts for 50% cost reduction
Branch: fix/openai-prompt-caching-optimization-2355
Issue: #2355
===== DESCRIPTION =====
Summary
Restructures entity extraction prompts to leverage OpenAI's automatic prompt caching, enabling a 45-50% cost reduction and 2-3x faster indexing for GPT-4o, GPT-4o-mini, o1-preview, and o1-mini models.
Problem
LightRAG's current prompt structure prevents effective OpenAI prompt caching during indexing:
OpenAI's prompt caching works by reusing the longest shared prefix across requests (for prompts of at least 1,024 tokens). Because input_text appeared at the end of the system prompt, every chunk produced a completely different system prompt, preventing any caching.
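To make the mechanics concrete, here is a minimal illustrative sketch of the before/after message layout; the template text and helper names are hypothetical, not LightRAG's actual prompts:

```python
# Hypothetical illustration of the restructuring; not LightRAG code.
# OpenAI reuses the longest shared prefix of a request, so the reusable part
# must be byte-identical across requests.

INSTRUCTIONS = "You are an entity extraction assistant. ..."  # stands in for ~1,300 tokens of static text

def old_messages(chunk: str) -> list[dict]:
    # Before: input_text was embedded at the end of the system prompt, so the
    # system message differed for every chunk.
    return [{"role": "system", "content": f"{INSTRUCTIONS}\nText: {chunk}"}]

def new_messages(chunk: str) -> list[dict]:
    # After: the system message is identical for every chunk and can be served
    # from the prompt cache; only the short user message varies.
    return [
        {"role": "system", "content": INSTRUCTIONS},
        {"role": "user", "content": f"Text: {chunk}"},
    ]
```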
Solution
Separated static instructions from variable content (see the sketch below):
- System prompt (static, ~1,300 tokens, cacheable): role, instructions, examples, and entity types
- User prompt (variable, ~150 tokens per chunk): the chunk's input_text
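A rough sketch of what the restructured templates in lightrag/prompt.py could look like; the key names entity_extraction_system_prompt and entity_extraction_user_prompt come from the PR, but the template wording and the {entity_types}/{examples} placeholders shown here are assumptions:

```python
# Sketch only; the real templates in lightrag/prompt.py are much longer.
PROMPTS: dict[str, str] = {}

# Static part: contains no {input_text}, so once rendered it is identical for
# every chunk in an indexing run and is eligible for prompt caching.
PROMPTS["entity_extraction_system_prompt"] = (
    "---Role---\n"
    "You extract entities and relationships from the given text.\n"
    "---Instructions---\n"
    "Identify entities of these types: {entity_types}\n"
    "---Examples---\n"
    "{examples}\n"
)

# Variable part: only the per-chunk text lives here.
PROMPTS["entity_extraction_user_prompt"] = (
    "---Input Text---\n"
    "{input_text}\n"
)
```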
Changes Made
- lightrag/prompt.py: removed {input_text} from entity_extraction_system_prompt and moved it to entity_extraction_user_prompt
- lightrag/operate.py (around line 2835): formats the system prompt once without input_text, formats the user prompt with input_text for each chunk, and adds comments explaining the caching behavior (see the sketch below)
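A minimal sketch, reusing the hypothetical PROMPTS dict above, of how the formatting logic in lightrag/operate.py might be organized after the change; function and parameter names here are illustrative assumptions, not the actual code:

```python
def build_entity_extraction_prompts(
    chunks: list[str],
    entity_types: list[str],
    examples: str,
) -> tuple[str, list[str]]:
    """Return the shared system prompt and one user prompt per chunk."""
    # Format the static system prompt ONCE per indexing run. Because it no
    # longer contains input_text, the identical string accompanies every
    # request, giving OpenAI a cacheable shared prefix.
    system_prompt = PROMPTS["entity_extraction_system_prompt"].format(
        entity_types=", ".join(entity_types),
        examples=examples,
    )
    # Only the short user prompt is re-formatted per chunk.
    user_prompts = [
        PROMPTS["entity_extraction_user_prompt"].format(input_text=chunk)
        for chunk in chunks
    ]
    return system_prompt, user_prompts
```

Each resulting pair would then be sent as separate system and user messages, as in the earlier sketch, so the system message stays byte-identical across the whole run.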
Impact & Benefits
- Cost savings (for an 8,000-chunk indexing run): cached tokens are billed at a 50% discount, which works out to roughly a 45% reduction in prompt-token cost (see the worked estimate below)
- Performance improvements: 2-3x faster response times when the shared prefix is served from the cache
- Automatic activation: no configuration required; OpenAI applies prompt caching automatically on supported models (GPT-4o, GPT-4o-mini, o1-preview, o1-mini)
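A back-of-envelope check of the ~45% figure, assuming the token counts quoted above (~1,300 cached system tokens, ~150 uncached user tokens) and OpenAI's 50% discount on cached input tokens:

```python
# Rough estimate; real savings depend on exact token counts and cache hit rate.
system_tokens = 1300      # static, served from cache after the first request
user_tokens = 150         # varies per chunk, never cached
cached_discount = 0.5     # cached input tokens are billed at half price

uncached_cost = system_tokens + user_tokens                   # 1450
cached_cost = system_tokens * cached_discount + user_tokens   # 800

savings = 1 - cached_cost / uncached_cost
print(f"prompt-token savings ≈ {savings:.1%}")   # ≈ 44.8%
```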
Technical Details
Testing
Pull Request Readiness Checklist
References