feat(llmobs): track prompt caching for openai chat completions #13755
Conversation
Bootstrap import analysis

Comparison of import times between this PR and base.

Summary
- The average import time from this PR is: 275 ± 2 ms.
- The average import time from base is: 277 ± 2 ms.
- The import time difference between this PR and base is: -1.95 ± 0.08 ms.

Import time breakdown

The following import paths have shrunk:
Benchmarks

Benchmark execution time: 2025-07-09 15:29:00

Comparing candidate commit 749f06a in PR branch.

Found 0 performance improvements and 1 performance regression! Performance is the same for 523 metrics, 2 unstable metrics.

scenario:iastaspectsospath-ospathnormcase_aspect
Nicely done! Small nits but otherwise lgtm
…-trace-py into evan.li/openai-prompt-caching
Tracks the number of tokens read from the prompt cache for OpenAI chat completions.

OpenAI does prompt caching by default and returns a `cached_tokens` field in `prompt_tokens_details`: https://platform.openai.com/docs/api-reference/chat/create
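For reference, a minimal sketch of the `usage` block a chat completion response can carry (the token counts here are made up for illustration):

```python
# Illustrative shape of the usage payload returned by the Chat Completions API.
usage = {
    "prompt_tokens": 2006,
    "completion_tokens": 300,
    "total_tokens": 2306,
    "prompt_tokens_details": {
        "cached_tokens": 1920,  # tokens served from the prompt cache
        "audio_tokens": 0,
    },
}

cached_tokens = usage["prompt_tokens_details"]["cached_tokens"]
```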
We rely on two keys in metrics for prompt caching:

- `cache_read_input_tokens`
- `cache_write_input_tokens`

We have both of these fields since Bedrock/Anthropic return info on cache reads and writes. `cached_tokens` maps to `cache_read_input_tokens`, as sketched below.
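A minimal sketch of that mapping, assuming a dict-like `usage` payload; the helper name and everything other than the two metric keys are hypothetical, not the integration's actual code:

```python
CACHE_READ_INPUT_TOKENS = "cache_read_input_tokens"


def cache_metrics_from_usage(usage: dict) -> dict:
    """Map OpenAI's cached_tokens onto the shared cache-read metric key."""
    metrics = {}
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens")
    if cached is not None:
        # OpenAI only reports cache reads, so cache_write_input_tokens is not set here.
        metrics[CACHE_READ_INPUT_TOKENS] = cached
    return metrics


# e.g. cache_metrics_from_usage(usage) -> {"cache_read_input_tokens": 1920}
```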
Checklist
Reviewer Checklist