Description
Describe the bug
I noticed a discrepancy between the token counts from runner.run_async -> event.usage_metadata and what was being logged in Langfuse (more tokens in Langfuse than in the raw app data).
After much confusion and debugging, I have concluded it depends on tools being enabled together with StreamingMode.SSE in the RunConfig. With StreamingMode.NONE, or with no tools passed to the agent, the token counts match as expected (but that is obviously not a workaround when SSE streaming is needed).
The bug only occurs when tools are passed to the Agent, so it appears the tool metadata is not being counted when StreamingMode.SSE is set.
It does not happen with model="gemini-" or with model=LiteLlm(model="gemini-"); it seems to occur only with model=LiteLlm(model="gpt-4o") or model=LiteLlm(model="gpt-4o-mini").
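To illustrate the suspected mechanism, here is a toy sketch (all names and numbers are made up, not ADK or LiteLLM internals; the token estimator is a crude stand-in for a real tokenizer): if the streaming path reports prompt tokens for the messages only, while the non-streaming path also counts the serialized tool definitions, the two totals diverge by roughly the tool-schema overhead.

```python
import json

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer, for illustration only.
    return len(text.split())

# Hypothetical prompt content for the repro below (instruction + user turn).
messages_text = "Chat to the user. user: Hi"

# An OpenAI-style function declaration, as LiteLLM would serialize a tool.
tool_schema = json.dumps({
    "type": "function",
    "function": {
        "name": "sample_tool",
        "description": "Just a sample",
        "parameters": {"type": "object", "properties": {}},
    },
})

prompt_without_tools = estimate_tokens(messages_text)
prompt_with_tools = prompt_without_tools + estimate_tokens(tool_schema)

# The gap between the two is the tool-definition overhead that appears to be
# missing from the SSE-mode usage_metadata.
print(prompt_without_tools, prompt_with_tools)
```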
To Reproduce
Minimal reproducible example:
import asyncio
import os
from pprint import pprint

from google.adk.agents import Agent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types

APP_NAME = "adk-app"
USER_ID = "adk-user"
SESSION_ID = "adk-session"


def sample_tool():
    """Just a sample"""
    return None


async def main(with_sse_run_config: bool) -> None:
    agent = Agent(
        name="chat_agent",
        model=LiteLlm(
            model="gpt-4o",
            base_url=os.environ["LITELLM_HOST_URL"],
            api_key=os.environ["LITELLM_MASTER_KEY"],
            stream_options={"include_usage": True},
        ),
        description="Just a chatter.",
        instruction="Chat to the user.",
        tools=[sample_tool],
    )
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name=APP_NAME,
        user_id=USER_ID,
        session_id=SESSION_ID,
    )
    runner = Runner(
        agent=agent,
        app_name=APP_NAME,
        session_service=session_service,
    )
    query = "Hi"
    print(f"\n>>> User Query ({with_sse_run_config=}): {query}")
    content = types.Content(role='user', parts=[types.Part(text=query)])
    final_response_text = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content,
        run_config=(
            RunConfig(streaming_mode=StreamingMode.SSE)
            if with_sse_run_config
            else RunConfig()
        ),
    ):
        if event.usage_metadata:
            pprint(event.usage_metadata.model_dump())
        if event.is_final_response():
            if event.content and event.content.parts:
                final_response_text = event.content.parts[0].text
            print(f"<<< Agent Response ({with_sse_run_config=}): {final_response_text}")


if __name__ == "__main__":
    asyncio.run(main(with_sse_run_config=True))
    asyncio.run(main(with_sse_run_config=False))

Application logs (note the difference between the *_token_count fields for the exact same input prompt, tools, and output response):
>>> User Query (with_sse_run_config=True): Hi
{'cache_tokens_details': None,
'cached_content_token_count': None,
'candidates_token_count': 9,
'candidates_tokens_details': None,
'prompt_token_count': 40,
'prompt_tokens_details': None,
'thoughts_token_count': None,
'tool_use_prompt_token_count': None,
'tool_use_prompt_tokens_details': None,
'total_token_count': 49,
'traffic_type': None}
<<< Agent Response (with_sse_run_config=True): Hello! How can I assist you today?
>>> User Query (with_sse_run_config=False): Hi
{'cache_tokens_details': None,
'cached_content_token_count': None,
'candidates_token_count': 10,
'candidates_tokens_details': None,
'prompt_token_count': 64,
'prompt_tokens_details': None,
'thoughts_token_count': None,
'tool_use_prompt_token_count': None,
'tool_use_prompt_tokens_details': None,
'total_token_count': 74,
'traffic_type': None}
<<< Agent Response (with_sse_run_config=False): Hello! How can I assist you today?
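For clarity, the per-field gap between the two runs can be computed directly from the logs above: the SSE run under-reports by 24 prompt tokens, 1 candidate token, and 25 total tokens.

```python
# Usage values copied from the two runs logged above.
sse = {"prompt_token_count": 40, "candidates_token_count": 9, "total_token_count": 49}
no_sse = {"prompt_token_count": 64, "candidates_token_count": 10, "total_token_count": 74}

# Tokens missing from the SSE run, per field.
delta = {field: no_sse[field] - sse[field] for field in sse}
print(delta)
# {'prompt_token_count': 24, 'candidates_token_count': 1, 'total_token_count': 25}
```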
Expected behavior
Token usage reported by the ADK SDK is accurate regardless of whether SSE streaming and tools are enabled.
Desktop (please complete the following information):
- OS: Windows 11, but this bug also happens running in Docker
- Python version (python -V): 3.13
- ADK version (pip show google-adk): 1.6.1
Model Information:
GPT-4o and GPT-4o-mini via Azure, behind a LiteLLM Proxy