Fix/gemini prompt caching usage feedback #11095
Open
Title
Fix async callback cache hit reporting and improve token details handling
Relevant issues
Fixes #11058
Pre-Submission checklist
Please complete all items before asking a LiteLLM maintainer to review your PR
- I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement)
- My PR passes all unit tests run with make test-unit
Type
🐛 Bug Fix
✅ Test
Changes
Async callback cache hit
- Set the cache_hit flag in model_call_details before invoking async success handlers (caching_handler.py)
- Invoke __call__ in async_log_event, passing cache_hit through to callbacks (custom_logger.py); a usage sketch follows below
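To illustrate what the callback change enables, here is a minimal sketch of a custom async logger that reads the flag. The CustomLogger class and async_log_success_event hook are LiteLLM's public callback API; the CacheHitLogger itself is hypothetical and not part of this PR.

```python
import litellm
from litellm.integrations.custom_logger import CustomLogger


class CacheHitLogger(CustomLogger):
    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        # With this fix, kwargs["cache_hit"] is already populated when the
        # async success handler runs, instead of arriving as None.
        print(f"cache_hit={kwargs.get('cache_hit')}")


litellm.callbacks = [CacheHitLogger()]
```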
Token details handling
- Merge chunk.usage into model_response when available
- Convert PromptTokensDetails to PromptTokensDetailsWrapper and streamline attribute setting (streaming_chunk_builder_utils.py)
- Add an extract_cached_tokens helper and use it in _calculate_usage for Gemini integrations, including prompt token detail propagation (vertex_and_google_ai_studio_gemini.py)
- Preserve prompt_tokens_details when converting dict responses to streaming model objects (convert_dict_to_response.py)
- Update the Usage model with explicit prompt_tokens_details and completion_tokens_details fields (types/utils.py); see the sketch after this list
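To make the token-detail plumbing concrete, here is a minimal sketch of mapping a Gemini usage payload onto LiteLLM's Usage and PromptTokensDetailsWrapper types. The helper below is illustrative only, not the PR's extract_cached_tokens implementation, and it assumes Gemini's cachedContentTokenCount field carries the cache-read count.

```python
from typing import Optional

from litellm.types.utils import PromptTokensDetailsWrapper, Usage


def extract_cached_tokens(usage_metadata: dict) -> Optional[int]:
    # Gemini reports prompt-cache reads as cachedContentTokenCount in raw
    # REST responses (snake_case in some SDK objects); return None if absent.
    for key in ("cachedContentTokenCount", "cached_content_token_count"):
        if key in usage_metadata:
            return usage_metadata[key]
    return None


# Example usage metadata as returned by a Gemini generateContent call.
usage_metadata = {
    "promptTokenCount": 1200,
    "candidatesTokenCount": 85,
    "totalTokenCount": 1285,
    "cachedContentTokenCount": 1024,
}

usage = Usage(
    prompt_tokens=usage_metadata["promptTokenCount"],
    completion_tokens=usage_metadata["candidatesTokenCount"],
    total_tokens=usage_metadata["totalTokenCount"],
    prompt_tokens_details=PromptTokensDetailsWrapper(
        cached_tokens=extract_cached_tokens(usage_metadata)
    ),
)
print(usage.prompt_tokens_details.cached_tokens)  # 1024
```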
Testing
- New tests added in tests/litellm/llms/vertex_ai/gemini/gemini_token_details_test_utils.py
- Verify that async callbacks receive cache_hit=True on cache hits
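A rough sketch of the kind of end-to-end assertion such a test can make, assuming an in-memory LiteLLM cache and a hypothetical recording callback; the model name is a placeholder and this is not the PR's test code (the Cache import path may vary across LiteLLM versions).

```python
import asyncio

import litellm
from litellm.caching import Cache  # import path may differ by LiteLLM version
from litellm.integrations.custom_logger import CustomLogger


class RecordingLogger(CustomLogger):
    def __init__(self):
        self.cache_hits = []

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        self.cache_hits.append(kwargs.get("cache_hit"))


async def main():
    logger = RecordingLogger()
    litellm.callbacks = [logger]
    litellm.cache = Cache()  # default in-memory cache

    for _ in range(2):
        await litellm.acompletion(
            model="gemini/gemini-1.5-flash",
            messages=[{"role": "user", "content": "hello"}],
            caching=True,
        )

    # The second call is served from cache; with this fix the async
    # callback now observes cache_hit=True instead of None.
    assert logger.cache_hits[-1] is True


asyncio.run(main())
```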