
Add opt-in token usage tracking for LLM queries#15

Open
patrickvossler18 wants to merge 1 commit into main from feature/token-usage-tracking

Conversation

@patrickvossler18 (Collaborator)

Adds opt-in token usage tracking to LLMApi. When enabled, token counts are printed and logged after each query.

api = LLMApi(
    cache=cache,
    seed=42,
    model_type=model,
    error_handler=error_handler,
    logging=logger,
    track_usage=True,  # off by default, existing code unaffected
)

Example output

# Single query
Token usage: 1,234 input / 567 output

# With cached/reasoning tokens (shown automatically when present)
Token usage: 1,234 input (800 cached) / 567 output (120 reasoning)

# Cached query
Token usage: 0 input / 0 output (cached)

# Batch query
Batch token usage: 5,000 input / 2,300 output (15 queries)
  • Doesn't change the response object, so it's backward compatible
  • Reports token counts only for the OpenAI and Versa providers (no Bedrock to start, because of issues getting the token info in requests)
  • Not reporting cost for now, unless we're OK using fixed pricing info
  • Cache hits report zero tokens with a (cached) indicator
  • Output goes to both print() and the logger
  • Cached input tokens and reasoning tokens are shown when the provider reports them
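
The example output above implies a formatting rule: always show input/output with comma separators, and append the cached/reasoning breakdowns only when they are nonzero. A minimal sketch of that logic (`format_usage` is a hypothetical helper name, not the PR's actual code):

```python
def format_usage(input_tokens, output_tokens, cached=False,
                 cached_tokens=0, reasoning_tokens=0):
    """Build a usage line like the PR's example output.

    Hypothetical helper: the breakdowns in parentheses appear only
    when the provider reported nonzero counts, and cache hits get a
    trailing "(cached)" indicator.
    """
    msg = f"Token usage: {input_tokens:,} input"
    if cached_tokens:
        msg += f" ({cached_tokens:,} cached)"
    msg += f" / {output_tokens:,} output"
    if reasoning_tokens:
        msg += f" ({reasoning_tokens:,} reasoning)"
    if cached:
        msg += " (cached)"
    return msg
```

Feeding in the numbers from the examples above reproduces the three single-query lines shown.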

Introduce a track_usage parameter on LLMApi that prints and logs token
counts (input/output) for each query. Uses a LangChain callback handler
to capture usage metadata, supports verbosity-aware output (detailed
breakdowns at high verbosity), and reports zero tokens for cache hits.

Key implementation details:
- UsageCallbackHandler captures usage from on_llm_end callback
- Shared parse_usage_metadata() used by both single and batch paths
- Comma-formatted output, consistent format across single/batch calls
- Thread-safety documented (not safe for concurrent get_output calls)

Also fixes a pre-existing TypeError in the batch test mocks, where MagicMock
responses lacked an explicit usage_metadata=None.
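
The callback-based capture described above can be sketched roughly as follows. This is a standalone approximation, not the PR's implementation: a real handler would subclass LangChain's `BaseCallbackHandler`, and the exact shape of the usage metadata here (nested `input_token_details`/`output_token_details` dicts) is an assumption:

```python
def parse_usage_metadata(usage_metadata):
    """Normalize provider usage metadata into a flat dict.

    Sketch of the shared helper used by single and batch paths;
    the nested-dict field names are assumptions.
    """
    usage_metadata = usage_metadata or {}
    result = {
        "input_tokens": usage_metadata.get("input_tokens", 0),
        "output_tokens": usage_metadata.get("output_tokens", 0),
    }
    cached = usage_metadata.get("input_token_details", {}).get("cache_read", 0)
    reasoning = usage_metadata.get("output_token_details", {}).get("reasoning", 0)
    if cached:
        result["cached_tokens"] = cached
    if reasoning:
        result["reasoning_tokens"] = reasoning
    return result


class UsageCallbackHandler:
    """Captures usage metadata from the on_llm_end callback.

    As documented in the PR, a single instance is not safe to share
    across concurrent get_output calls.
    """

    def __init__(self):
        self.last_usage = None

    def on_llm_end(self, response, **kwargs):
        # Assumes the response object carries a usage_metadata mapping,
        # as LangChain chat results do for supporting providers.
        self.last_usage = parse_usage_metadata(
            getattr(response, "usage_metadata", None)
        )
```

When the provider reports no metadata (e.g. a cache hit), `last_usage` falls back to zero counts, matching the "0 input / 0 output (cached)" behavior.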
@venkatesh-sivaraman (Collaborator) left a comment


Overall looks good, I added a couple minor suggestions.

if cached:
result["cached_tokens"] = cached
if reasoning:
result["reasoning_tokens"] = reasoning

Should total_tokens include the count of reasoning_tokens?

msg += f" {suffix}"
return msg

def _report_usage(self, usage: dict, cached: bool = False):

It would be great if there was also a way to print/return the total usage after a bunch of LLM calls, rather than just at the time of each call.
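
One way to support that suggestion would be a small accumulator that the API feeds after each call and that can be queried at any point. This is a sketch of the idea, not code from the PR; `UsageTracker`, `record`, and `summary` are hypothetical names:

```python
class UsageTracker:
    """Accumulates token counts across LLM calls (hypothetical helper).

    record() takes the per-call usage dict; summary() reports the
    running totals in the same comma-formatted style as the PR's
    batch output.
    """

    def __init__(self):
        self.total_input = 0
        self.total_output = 0
        self.num_queries = 0

    def record(self, usage):
        self.total_input += usage.get("input_tokens", 0)
        self.total_output += usage.get("output_tokens", 0)
        self.num_queries += 1

    def summary(self):
        return (f"Total token usage: {self.total_input:,} input / "
                f"{self.total_output:,} output ({self.num_queries} queries)")
```

_report_usage could then call record() on each query, and a get_total_usage()-style method on LLMApi could expose summary() at the end of a run.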
