Fix/gemini prompt caching usage feedback #11095

Open
wants to merge 7 commits into
base: main
from

daarko10
Contributor

@daarko10 daarko10 commented May 23, 2025

Title

Fix async callback cache hit reporting and improve token details handling

Relevant issues

Fixes #11058

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible

Type

🐛 Bug Fix
✅ Test

Changes

  • Async callback cache hit

    • Ensure cache_hit flag is set in model_call_details before invoking async success handlers (caching_handler.py)
    • Detect and await callable classes with async __call__ in async_log_event, passing through cache_hit to callbacks (custom_logger.py)
  • Token details handling

    • Simplify token-count logic
    • Copy chunk.usage into model_response when available
    • Switch prompt token detail types from PromptTokensDetails to PromptTokensDetailsWrapper and streamline attribute setting (streaming_chunk_builder_utils.py)
    • Introduce extract_cached_tokens helper and use it in _calculate_usage for Gemini integrations, including prompt token detail propagation (vertex_and_google_ai_studio_gemini.py)
    • Preserve prompt_tokens_details when converting dict responses to streaming model objects (convert_dict_to_response.py)
    • Extend Usage model with explicit prompt_tokens_details and completion_tokens_details fields (types/utils.py)
  • Testing

    • New test utility file for Gemini token detail scenarios (tests/litellm/llms/vertex_ai/gemini/gemini_token_details_test_utils.py)
    • Updated existing tests to use helpers
    • Verified that:
      1. No more “invalid imports” warnings
      2. Async callbacks receive cache_hit=True on cache hits
      3. Token detail wrappers are correctly populated in all response paths
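
The async callback change described above can be illustrated with a minimal sketch. This is not the code from caching_handler.py or custom_logger.py; `is_async_callable`, `dispatch_success`, and `RecordingCallback` are hypothetical names chosen for illustration. It assumes the core idea is to set `cache_hit` in the call details first, then await the callback when it is either a coroutine function or an instance whose `__call__` is async.

```python
import asyncio
import inspect
from typing import Any, Dict, List, Tuple


def is_async_callable(obj: Any) -> bool:
    """True for coroutine functions and for instances whose __call__ is async."""
    if inspect.iscoroutinefunction(obj):
        return True
    call = getattr(obj, "__call__", None)
    return call is not None and inspect.iscoroutinefunction(call)


async def dispatch_success(
    callback: Any, model_call_details: Dict[str, Any], cache_hit: bool
) -> Any:
    # Hypothetical dispatcher: record the flag before any handler runs,
    # so the handler sees it both in the details dict and as a keyword.
    model_call_details["cache_hit"] = cache_hit
    if is_async_callable(callback):
        return await callback(model_call_details, cache_hit=cache_hit)
    return callback(model_call_details, cache_hit=cache_hit)


class RecordingCallback:
    """Callable class with an async __call__ (illustrative only)."""

    def __init__(self) -> None:
        self.seen: List[Tuple[Any, bool]] = []

    async def __call__(self, details: Dict[str, Any], cache_hit: bool = False) -> None:
        self.seen.append((details.get("cache_hit"), cache_hit))


def demo() -> List[Tuple[Any, bool]]:
    cb = RecordingCallback()
    asyncio.run(dispatch_success(cb, {"model": "example-model"}, cache_hit=True))
    return cb.seen
```

Checking only `inspect.iscoroutinefunction(callback)` would miss `RecordingCallback()`, because the instance itself is not a coroutine function; inspecting its bound `__call__` covers that case.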
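
For the token details handling, a rough sketch of how a cached-token helper could feed the usage calculation follows. The helper name `extract_cached_tokens` comes from the PR description, but its signature and body here are assumptions, as are the stand-in classes `PromptDetails` and `UsageSketch` (the PR itself uses `PromptTokensDetailsWrapper` and `Usage`). The sketch also assumes the Gemini response carries a `usageMetadata` mapping with `cachedContentTokenCount`.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class PromptDetails:
    """Hypothetical stand-in for PromptTokensDetailsWrapper."""
    cached_tokens: Optional[int] = None


@dataclass
class UsageSketch:
    """Hypothetical stand-in for the Usage model."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: Optional[PromptDetails] = None


def extract_cached_tokens(usage_metadata: Mapping[str, Any]) -> Optional[int]:
    # Assumed behavior: return the cached token count when present, else None.
    value = usage_metadata.get("cachedContentTokenCount")
    return int(value) if value is not None else None


def calculate_usage(usage_metadata: Mapping[str, Any]) -> UsageSketch:
    cached = extract_cached_tokens(usage_metadata)
    prompt = int(usage_metadata.get("promptTokenCount", 0))
    completion = int(usage_metadata.get("candidatesTokenCount", 0))
    return UsageSketch(
        prompt_tokens=prompt,
        completion_tokens=completion,
        total_tokens=int(usage_metadata.get("totalTokenCount", prompt + completion)),
        # Only attach details when the provider reported cached tokens.
        prompt_tokens_details=PromptDetails(cached) if cached is not None else None,
    )
```

Keeping the extraction in one helper means the streaming and non-streaming paths can share the same rule for when cached tokens are reported.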

vercel bot commented May 23, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name | Status | Preview | Comments | Updated (UTC)
litellm | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | May 23, 2025 3:17pm

@daarko10
Contributor Author

@krrishdholakia can you take a look at why the test fails? I didn't touch anything related to it, and it passes when I run it locally.

Development

Successfully merging this pull request may close these issues.

[Bug]: Gemini doesn't report back it's cached tokens hits correctly on acompletion