Description
Describe the bug
I noticed a discrepancy between the token counts from runner.run_async -> event.usage_metadata and what was being logged in Langfuse (more tokens in Langfuse than in the raw app data).
After much confusion and debugging, I have concluded it depends on tools being enabled together with StreamingMode.SSE in the RunConfig. With StreamingMode.NONE, or with no tools passed to the agent, the token counts match as expected (but that is obviously not a workaround when SSE streaming is needed).
The bug only occurs when tools are passed to the Agent, so it appears the tool metadata is not being counted when StreamingMode.SSE is set.
It does not happen with model="gemini-" or with model=LiteLlm(model="gemini-"); it seems to occur only with model=LiteLlm(model="gpt-4o") or model=LiteLlm(model="gpt-4o-mini").
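To illustrate the suspected mechanism, here is a toy sketch (all names and numbers are made up, not ADK or LiteLLM internals; the token estimator is a crude stand-in for a real tokenizer): if the streaming path reports prompt tokens for the messages only, while the non-streaming path also counts the serialized tool definitions, the two totals diverge by roughly the tool-schema overhead.

```python
import json

def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer, for illustration only.
    return len(text.split())

# Hypothetical prompt content for the repro below (instruction + user turn).
messages_text = "Chat to the user. user: Hi"

# An OpenAI-style function declaration, as LiteLLM would serialize a tool.
tool_schema = json.dumps({
    "type": "function",
    "function": {
        "name": "sample_tool",
        "description": "Just a sample",
        "parameters": {"type": "object", "properties": {}},
    },
})

prompt_without_tools = estimate_tokens(messages_text)
prompt_with_tools = prompt_without_tools + estimate_tokens(tool_schema)

# The gap between the two is the tool-definition overhead that appears to be
# missing from the SSE-mode usage_metadata.
print(prompt_without_tools, prompt_with_tools)
```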
To Reproduce
Minimal reproducible example:
import asyncio
import os
from pprint import pprint

from google.adk.agents import Agent
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.models.lite_llm import LiteLlm
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types

APP_NAME = "adk-app"
USER_ID = "adk-user"
SESSION_ID = "adk-session"


def sample_tool():
    """Just a sample"""
    return None


async def main(with_sse_run_config: bool) -> None:
    agent = Agent(
        name="chat_agent",
        model=LiteLlm(
            model="gpt-4o",
            base_url=os.environ["LITELLM_HOST_URL"],
            api_key=os.environ["LITELLM_MASTER_KEY"],
            stream_options={"include_usage": True},
        ),
        description="Just a chatter.",
        instruction="Chat to the user.",
        tools=[sample_tool],
    )
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name=APP_NAME,
        user_id=USER_ID,
        session_id=SESSION_ID,
    )
    runner = Runner(
        agent=agent,
        app_name=APP_NAME,
        session_service=session_service,
    )
    query = "Hi"
    print(f"\n>>> User Query ({with_sse_run_config=}): {query}")
    content = types.Content(role='user', parts=[types.Part(text=query)])
    final_response_text = "Agent did not produce a final response."
    async for event in runner.run_async(
        user_id=USER_ID,
        session_id=SESSION_ID,
        new_message=content,
        run_config=(
            RunConfig(streaming_mode=StreamingMode.SSE)
            if with_sse_run_config
            else RunConfig()
        ),
    ):
        if event.usage_metadata:
            pprint(event.usage_metadata.model_dump())
        if event.is_final_response():
            if event.content and event.content.parts:
                final_response_text = event.content.parts[0].text
            print(f"<<< Agent Response ({with_sse_run_config=}): {final_response_text}")


if __name__ == "__main__":
    asyncio.run(main(with_sse_run_config=True))
    asyncio.run(main(with_sse_run_config=False))

Application logs (note the difference between the *_token_count fields for the exact same input prompt, tools, and output response):
>>> User Query (with_sse_run_config=True): Hi
{'cache_tokens_details': None,
'cached_content_token_count': None,
'candidates_token_count': 9,
'candidates_tokens_details': None,
'prompt_token_count': 40,
'prompt_tokens_details': None,
'thoughts_token_count': None,
'tool_use_prompt_token_count': None,
'tool_use_prompt_tokens_details': None,
'total_token_count': 49,
'traffic_type': None}
<<< Agent Response (with_sse_run_config=True): Hello! How can I assist you today?
>>> User Query (with_sse_run_config=False): Hi
{'cache_tokens_details': None,
'cached_content_token_count': None,
'candidates_token_count': 10,
'candidates_tokens_details': None,
'prompt_token_count': 64,
'prompt_tokens_details': None,
'thoughts_token_count': None,
'tool_use_prompt_token_count': None,
'tool_use_prompt_tokens_details': None,
'total_token_count': 74,
'traffic_type': None}
<<< Agent Response (with_sse_run_config=False): Hello! How can I assist you today?
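For clarity, the per-field gap between the two runs can be computed directly from the logs above: the SSE run under-reports by 24 prompt tokens, 1 candidate token, and 25 total tokens.

```python
# Usage values copied from the two runs logged above.
sse = {"prompt_token_count": 40, "candidates_token_count": 9, "total_token_count": 49}
no_sse = {"prompt_token_count": 64, "candidates_token_count": 10, "total_token_count": 74}

# Tokens missing from the SSE run, per field.
delta = {field: no_sse[field] - sse[field] for field in sse}
print(delta)
# {'prompt_token_count': 24, 'candidates_token_count': 1, 'total_token_count': 25}
```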
Expected behavior
Token usage reported by the ADK SDK is accurate regardless of whether SSE streaming and tools are enabled.
Desktop (please complete the following information):
- OS: Windows 11, but this bug also happens running in Docker
- Python version (python -V): 3.13
- ADK version (pip show google-adk): 1.6.1
Model Information:
GPT-4o and GPT-4o-mini via Azure, behind a LiteLLM Proxy