
Add opt-in token usage tracking for LLM queries#15

Open
patrickvossler18 wants to merge 1 commit into main from feature/token-usage-tracking

Conversation

@patrickvossler18 (Collaborator)

Adds opt-in token usage tracking to LLMApi. When enabled, token counts are printed and logged after each query.

api = LLMApi(
    cache=cache,
    seed=42,
    model_type=model,
    error_handler=error_handler,
    logging=logger,
    track_usage=True,  # off by default, existing code unaffected
)

Example output

# Single query
Token usage: 1,234 input / 567 output

# With cached/reasoning tokens (shown automatically when present)
Token usage: 1,234 input (800 cached) / 567 output (120 reasoning)

# Cached query
Token usage: 0 input / 0 output (cached)

# Batch query
Batch token usage: 5,000 input / 2,300 output (15 queries)
  • Doesn't change the response object, so it's backward compatible
  • Reports token counts only for the OpenAI and Versa providers (no Bedrock to start, because of issues getting the token info in requests)
  • Not reporting cost for now, unless we're OK using fixed pricing info
  • Cache hits report zero tokens with a (cached) indicator
  • Output goes to both print() and the logger
  • Cached input tokens and reasoning tokens are shown when the provider reports them
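
The example output above implies a formatting rule: always show input/output with comma separators, and append the cached/reasoning breakdowns only when they are nonzero. A minimal sketch of that logic (`format_usage` is a hypothetical helper name, not the PR's actual code):

```python
def format_usage(input_tokens, output_tokens, cached=False,
                 cached_tokens=0, reasoning_tokens=0):
    """Build a usage line like the PR's example output.

    Hypothetical helper: the breakdowns in parentheses appear only
    when the provider reported nonzero counts, and cache hits get a
    trailing "(cached)" indicator.
    """
    msg = f"Token usage: {input_tokens:,} input"
    if cached_tokens:
        msg += f" ({cached_tokens:,} cached)"
    msg += f" / {output_tokens:,} output"
    if reasoning_tokens:
        msg += f" ({reasoning_tokens:,} reasoning)"
    if cached:
        msg += " (cached)"
    return msg
```

Feeding in the numbers from the examples above reproduces the three single-query lines shown.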

Introduce a track_usage parameter on LLMApi that prints and logs token
counts (input/output) for each query. Uses a LangChain callback handler
to capture usage metadata, supports verbosity-aware output (detailed
breakdowns at high verbosity), and reports zero tokens for cache hits.

Key implementation details:
- UsageCallbackHandler captures usage from on_llm_end callback
- Shared parse_usage_metadata() used by both single and batch paths
- Comma-formatted output, consistent format across single/batch calls
- Thread-safety documented (not safe for concurrent get_output calls)

Also fixes a pre-existing TypeError in the batch test mocks, where MagicMock
responses lacked an explicit usage_metadata=None.
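
The callback-based capture described above can be sketched roughly as follows. This is a standalone approximation, not the PR's implementation: a real handler would subclass LangChain's `BaseCallbackHandler`, and the exact shape of the usage metadata here (nested `input_token_details`/`output_token_details` dicts) is an assumption:

```python
def parse_usage_metadata(usage_metadata):
    """Normalize provider usage metadata into a flat dict.

    Sketch of the shared helper used by single and batch paths;
    the nested-dict field names are assumptions.
    """
    usage_metadata = usage_metadata or {}
    result = {
        "input_tokens": usage_metadata.get("input_tokens", 0),
        "output_tokens": usage_metadata.get("output_tokens", 0),
    }
    cached = usage_metadata.get("input_token_details", {}).get("cache_read", 0)
    reasoning = usage_metadata.get("output_token_details", {}).get("reasoning", 0)
    if cached:
        result["cached_tokens"] = cached
    if reasoning:
        result["reasoning_tokens"] = reasoning
    return result


class UsageCallbackHandler:
    """Captures usage metadata from the on_llm_end callback.

    As documented in the PR, a single instance is not safe to share
    across concurrent get_output calls.
    """

    def __init__(self):
        self.last_usage = None

    def on_llm_end(self, response, **kwargs):
        # Assumes the response object carries a usage_metadata mapping,
        # as LangChain chat results do for supporting providers.
        self.last_usage = parse_usage_metadata(
            getattr(response, "usage_metadata", None)
        )
```

When the provider reports no metadata (e.g. a cache hit), `last_usage` falls back to zero counts, matching the "0 input / 0 output (cached)" behavior.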
@venkatesh-sivaraman (Collaborator) left a comment


Overall looks good, I added a couple minor suggestions.

if cached:
result["cached_tokens"] = cached
if reasoning:
result["reasoning_tokens"] = reasoning

Should total_tokens include the count of reasoning_tokens?

msg += f" {suffix}"
return msg

def _report_usage(self, usage: dict, cached: bool = False):

It would be great if there was also a way to print/return the total usage after a bunch of LLM calls, rather than just at the time of each call.
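
One way to support that suggestion would be a small accumulator that the API feeds after each call and that can be queried at any point. This is a sketch of the idea, not code from the PR; `UsageTracker`, `record`, and `summary` are hypothetical names:

```python
class UsageTracker:
    """Accumulates token counts across LLM calls (hypothetical helper).

    record() takes the per-call usage dict; summary() reports the
    running totals in the same comma-formatted style as the PR's
    batch output.
    """

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.num_queries = 0

    def record(self, usage):
        self.total_input += usage.get("input_tokens", 0)
        self.total_output += usage.get("output_tokens", 0)
        self.num_queries += 1

    def summary(self):
        return (f"Total token usage: {self.total_input:,} input / "
                f"{self.total_output:,} output ({self.num_queries} queries)")
```

_report_usage could then call record() on each query, and a get_total_usage()-style method on LLMApi could expose summary() at the end of a run.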
