LCORE-411: add token usage metrics #489
Conversation
Walkthrough
Propagates provider_id through query handling, extends the retrieve_response signature, and adds per-turn token metrics updates using a new utility. The streaming handler now updates token metrics on turn completion. Tests mock the new metrics calls and add a unit test for the utility.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant QueryHandler as query_endpoint_handler
participant Retriever as retrieve_response
participant LLM as LLM Service
participant Metrics as Metrics Utils
Client->>QueryHandler: POST /query (model hint)
QueryHandler->>QueryHandler: select_model_and_provider_id()
QueryHandler->>Retriever: retrieve_response(model_id, provider_id, request, token, ...)
Retriever->>LLM: Invoke model
LLM-->>Retriever: LLM response
Retriever->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
Retriever-->>QueryHandler: (summary, transcript)
QueryHandler-->>Client: 200 OK (response)
sequenceDiagram
autonumber
participant Client
participant StreamHandler as streaming_query_endpoint_handler
participant Stream as LLM Stream
participant Metrics as Metrics Utils
Client->>StreamHandler: POST /streaming_query
StreamHandler->>Stream: start streaming
loop token/turns
Stream-->>StreamHandler: partials
alt turn_complete
StreamHandler->>Metrics: update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
note right of Metrics: Guarded with try/except and logging
end
end
StreamHandler-->>Client: stream closed
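As a reading aid (not part of the diff): a minimal sketch of the guarded per-turn update that the diagram note refers to. `update_llm_token_count_from_turn` is the utility introduced in this PR (src/metrics/utils.py); the wrapper function name, logger setup, and call shape below are illustrative assumptions.

```python
# Illustrative sketch only; the wrapper name and logging setup are assumptions,
# update_llm_token_count_from_turn is the utility added in this PR.
import logging

from metrics.utils import update_llm_token_count_from_turn

logger = logging.getLogger(__name__)


def record_turn_token_metrics(turn, model_id: str, provider_id: str, system_prompt: str) -> None:
    """Best-effort token accounting: failures are logged and never propagate to the stream."""
    try:
        update_llm_token_count_from_turn(turn, model_id, provider_id, system_prompt)
    except Exception:  # pylint: disable=broad-except
        logger.exception("Failed to update token usage metrics for turn")
```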
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/app/endpoints/query.py (2)
217-224: Pass raw model_id to retrieve_response; construct provider/model inside the callee.
Currently you pass llama_stack_model_id (provider/model) as model_id. This later becomes the model label in metrics, causing label inconsistency vs llm_calls_total (which uses raw model_id). Pass model_id instead and let retrieve_response build provider/model for agent calls.
Apply:
-    summary, conversation_id = await retrieve_response(
-        client,
-        llama_stack_model_id,
-        query_request,
-        token,
-        mcp_headers=mcp_headers,
-        provider_id=provider_id,
-    )
+    summary, conversation_id = await retrieve_response(
+        client,
+        model_id,  # raw model id for metrics labels
+        query_request,
+        token,
+        mcp_headers=mcp_headers,
+        provider_id=provider_id,
+    )

Follow-up in retrieve_response below to build the provider/model identifier for get_agent.
453-461: Always send provider/model to get_agent by constructing it locally.
After passing raw model_id into retrieve_response, build llama_stack_model_id here.
Apply:
-    agent, conversation_id, session_id = await get_agent(
-        client,
-        model_id,
+    llama_stack_model_id = f"{provider_id}/{model_id}" if provider_id else model_id
+    agent, conversation_id, session_id = await get_agent(
+        client,
+        llama_stack_model_id,
         system_prompt,
         available_input_shields,
         available_output_shields,
         query_request.conversation_id,
         query_request.no_tools or False,
     )
🧹 Nitpick comments (9)
src/metrics/utils.py (1)
62-64: Minor: avoid repeatedly constructing ChatFormat.
Cache a single formatter instance.
+from functools import lru_cache
+
+@lru_cache(maxsize=1)
+def _get_formatter() -> ChatFormat:
+    return ChatFormat(Tokenizer.get_instance())
@@
-    tokenizer = Tokenizer.get_instance()
-    formatter = ChatFormat(tokenizer)
+    formatter = _get_formatter()

src/metrics/__init__.py (1)
45-55: Clean up stale TODOs now that token metrics exist.
Remove or update the LCORE-411 TODO comments to prevent confusion.
tests/unit/app/endpoints/test_query.py (1)
454-455: Deduplicate metrics patching with an autouse fixture.
Replace repeated mock_metrics(...) calls with one autouse fixture to reduce noise.
Example (place at top of this file or in conftest.py):
import pytest

@pytest.fixture(autouse=True)
def _mock_llm_token_metrics(mocker):
    mocker.patch(
        "app.endpoints.query.metrics.update_llm_token_count_from_turn",
        return_value=None,
    )

Also applies to: 486-487, 519-520, 559-559, 609-609, 661-661, 773-773, 829-829, 884-884, 955-955, 1017-1017, 1113-1113, 1351-1351, 1402-1402
tests/unit/metrics/test_utis.py (4)
1-1: Typo in filename: rename to test_utils.py for consistency and discoverability.
Tests targeting metrics/utils should live under a correctly spelled filename to avoid future confusion.
Apply this git mv:
- tests/unit/metrics/test_utis.py
+ tests/unit/metrics/test_utils.py
6-17: Simplify mocking of Async client construction.
You patch get_client twice and rebind mock_client. Create a single AsyncMock and patch once to reduce brittleness.
Apply:
-    mock_client = mocker.patch(
-        "client.AsyncLlamaStackClientHolder.get_client"
-    ).return_value
-    # Make sure the client is an AsyncMock for async methods
-    mock_client = mocker.AsyncMock()
-    mocker.patch(
-        "client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client
-    )
+    mock_client = mocker.AsyncMock()
+    mocker.patch("client.AsyncLlamaStackClientHolder.get_client", return_value=mock_client)
65-76: Make call assertions resilient by checking chained calls on child mocks explicitly.
While assert_has_calls on the parent may pass via mock_calls, asserting on children is clearer and avoids false positives if internal chaining changes.
Example:
-    mock_metric.assert_has_calls(
-        [
-            mocker.call.labels("test_provider-0", "test_model-0"),
-            mocker.call.labels().set(0),
-            mocker.call.labels("default_provider", "default_model"),
-            mocker.call.labels().set(1),
-            mocker.call.labels("test_provider-1", "test_model-1"),
-            mocker.call.labels().set(0),
-        ],
-        any_order=False,  # Order matters here
-    )
+    mock_metric.labels.assert_has_calls(
+        [
+            mocker.call("test_provider-0", "test_model-0"),
+            mocker.call("default_provider", "default_model"),
+            mocker.call("test_provider-1", "test_model-1"),
+        ]
+    )
+    mock_metric.labels.return_value.set.assert_has_calls(
+        [mocker.call(0), mocker.call(1), mocker.call(0)]
+    )
79-124: Great targeted test for token metrics; add minimal assertions on formatter usage.
Validates both sent/received counters. Add light checks to ensure encode_dialog_prompt is called twice with expected shapes.
Optional add:
+    assert mock_formatter.encode_dialog_prompt.call_count == 2
+    out_call, in_call = mock_formatter.encode_dialog_prompt.call_args_list
+    assert isinstance(out_call.args[0], list) and len(out_call.args[0]) == 1
+    assert isinstance(in_call.args[0], list) and len(in_call.args[0]) >= 1

src/app/endpoints/query.py (2)
331-349: Validation may mismatch on identifier shape; compare against both forms.
ModelListResponse.identifier might be either "model" or "provider/model" depending on upstream. The current check requires identifier == provider/model, which can falsely reject. Consider accepting either.
Suggested change:
-    llama_stack_model_id = f"{provider_id}/{model_id}"
+    llama_stack_model_id = f"{provider_id}/{model_id}"
     # Validate that the model_id and provider_id are in the available models
     logger.debug("Searching for model: %s, provider: %s", model_id, provider_id)
-    if not any(
-        m.identifier == llama_stack_model_id and m.provider_id == provider_id
-        for m in models
-    ):
+    if not any(
+        (m.identifier == llama_stack_model_id)
+        or (m.identifier == model_id and m.provider_id == provider_id)
+        for m in models
+    ):

Also, in the "first available LLM" branch, return a consistent first element:
-    model_id = model.identifier
-    provider_id = model.provider_id
-    logger.info("Selected model: %s", model)
-    return model_id, model_id, provider_id
+    model_id = model.identifier
+    provider_id = model.provider_id
+    llama_stack_model_id = f"{provider_id}/{model_id}"
+    logger.info("Selected model: %s", llama_stack_model_id)
+    return llama_stack_model_id, model_id, provider_id
225-236: Consider incrementing llm_calls_total only after a successful agent turn.
You count success before create_turn; failures will also increment failures_total, producing double counting. Increment success after a successful response.
Proposed move:
-    # Update metrics for the LLM call
-    metrics.llm_calls_total.labels(provider_id, model_id).inc()
     ...
-    response = await agent.create_turn(
+    response = await agent.create_turn(
         ...
     )
+    # Update metrics for the successful LLM call
+    metrics.llm_calls_total.labels(provider_id, model_id).inc()

Please confirm your intended semantics (attempt vs. success) and adjust dashboards accordingly.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- src/app/endpoints/query.py (4 hunks)
- src/app/endpoints/streaming_query.py (1 hunks)
- src/metrics/__init__.py (1 hunks)
- src/metrics/utils.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (16 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
- tests/unit/metrics/test_utis.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (6)
src/app/endpoints/streaming_query.py (2)
src/utils/endpoints.py (1)
get_system_prompt(53-84)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
tests/unit/app/endpoints/test_streaming_query.py (1)
tests/unit/app/endpoints/test_query.py (1)
mock_metrics(49-54)
tests/unit/app/endpoints/test_query.py (1)
tests/unit/app/endpoints/test_streaming_query.py (1)
mock_metrics(61-66)
src/metrics/__init__.py (1)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
tests/unit/metrics/test_utis.py (1)
src/metrics/utils.py (2)
setup_model_metrics(18-55)
update_llm_token_count_from_turn(58-75)
src/app/endpoints/query.py (1)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(58-75)
🪛 GitHub Actions: Ruff
src/metrics/__init__.py
[error] 9-9: Ruff: F401 'update_llm_token_count_from_turn' imported but unused. Remove the import or re-export the symbol.
🪛 GitHub Actions: Python linter
src/metrics/utils.py
[error] 9-9: Pylint: C0411 - wrong-import-order: third-party import llama_stack_client.types.agents.turn.Turn should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 10-10: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.llama3.tokenizer.Tokenizer should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 11-11: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.llama3.chat_format.ChatFormat should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
[error] 12-12: Pylint: C0411 - wrong-import-order: third-party import llama_stack.models.llama.datatypes.RawMessage should be placed before first-party imports (configuration.configuration, client.AsyncLlamaStackClientHolder, log.get_logger, metrics, utils.common.run_once_async). [Command: uv run pylint src tests]
src/app/endpoints/query.py
[error] 391-391: Pylint: R0913: Too many arguments (6/5) (too-many-arguments). [Command: uv run pylint src tests]
[error] 391-391: Pylint: R0917: Too many positional arguments (6/5) (too-many-positional-arguments). [Command: uv run pylint src tests]
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (3)
src/metrics/utils.py (1)
62-64: Plumb model-specific tokenizers or defer to provider/Llama Stack token-count API
Current use of Tokenizer.get_instance() likely defaults to a Llama 3 tokenizer and may miscount tokens for other providers. Plumb a tokenizer per (provider, model) or defer to the provider's token-count API to ensure accurate billing metrics.
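One way the per-(provider, model) plumbing could look, as a sketch only: the registry and fallback behavior below are hypothetical, while Tokenizer and ChatFormat are the llama_stack classes this PR already uses.

```python
# Hypothetical sketch of a per-(provider, model) formatter lookup with a
# Llama 3 fallback; only Tokenizer and ChatFormat come from llama_stack.
from llama_stack.models.llama.llama3.chat_format import ChatFormat
from llama_stack.models.llama.llama3.tokenizer import Tokenizer

_FORMATTERS: dict[tuple[str, str], ChatFormat] = {}


def get_formatter(provider: str, model: str) -> ChatFormat:
    """Return a cached formatter for (provider, model), defaulting to the Llama 3 tokenizer."""
    key = (provider, model)
    if key not in _FORMATTERS:
        # A real implementation would consult a provider/model-specific tokenizer
        # registry (or the provider's token-count API) here instead of defaulting.
        _FORMATTERS[key] = ChatFormat(Tokenizer.get_instance())
    return _FORMATTERS[key]
```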
tests/unit/app/endpoints/test_streaming_query.py (1)
61-67: LGTM: metrics side effects are properly patched for streaming tests.
Keeps tests focused on endpoint behavior.
tests/unit/app/endpoints/test_query.py (1)
49-55: LGTM: central helper to patch metrics for query tests.
Consistent with streaming tests.
src/metrics/__init__.py
Outdated
    Histogram,
)

from .utils import update_llm_token_count_from_turn
Fix Ruff F401: mark the re-export via __all__.
Declare __all__ so the import is considered an intentional re-export.
-from .utils import update_llm_token_count_from_turn
+from .utils import update_llm_token_count_from_turn
+
+# Explicit re-exports
+__all__ = (
+ "update_llm_token_count_from_turn",
+ "rest_api_calls_total",
+ "response_duration_seconds",
+ "provider_model_configuration",
+ "llm_calls_total",
+ "llm_calls_failures_total",
+ "llm_calls_validation_errors_total",
+ "llm_token_sent_total",
+ "llm_token_received_total",
+)
def update_llm_token_count_from_turn(
    turn: Turn, model: str, provider: str, system_prompt: str = ""
) -> None:
    """Update the LLM calls metrics from a turn."""
    tokenizer = Tokenizer.get_instance()
    formatter = ChatFormat(tokenizer)

    raw_message = cast(RawMessage, turn.output_message)
    encoded_output = formatter.encode_dialog_prompt([raw_message])
    token_count = len(encoded_output.tokens) if encoded_output.tokens else 0
    metrics.llm_token_received_total.labels(provider, model).inc(token_count)

    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
        list[RawMessage], turn.input_messages
    )
    encoded_input = formatter.encode_dialog_prompt(input_messages)
    token_count = len(encoded_input.tokens) if encoded_input.tokens else 0
    metrics.llm_token_sent_total.labels(provider, model).inc(token_count)
🛠️ Refactor suggestion
Runtime type mismatch and incorrect system-prompt role/content in token counting.
- Casting client messages to RawMessage does not convert shapes; ChatFormat.encode_dialog_prompt will likely raise at runtime.
- The system prompt should use role="system" and content must be content items, not a bare string.
Apply this refactor to safely convert messages and avoid crashes.
-def update_llm_token_count_from_turn(
-    turn: Turn, model: str, provider: str, system_prompt: str = ""
-) -> None:
-    """Update the LLM calls metrics from a turn."""
-    tokenizer = Tokenizer.get_instance()
-    formatter = ChatFormat(tokenizer)
-
-    raw_message = cast(RawMessage, turn.output_message)
-    encoded_output = formatter.encode_dialog_prompt([raw_message])
-    token_count = len(encoded_output.tokens) if encoded_output.tokens else 0
-    metrics.llm_token_received_total.labels(provider, model).inc(token_count)
-
-    input_messages = [RawMessage(role="user", content=system_prompt)] + cast(
-        list[RawMessage], turn.input_messages
-    )
-    encoded_input = formatter.encode_dialog_prompt(input_messages)
-    token_count = len(encoded_input.tokens) if encoded_input.tokens else 0
-    metrics.llm_token_sent_total.labels(provider, model).inc(token_count)
+def update_llm_token_count_from_turn(
+    turn: Turn, model: str, provider: str, system_prompt: str = ""
+) -> None:
+    """Update token metrics for a completed turn."""
+    tokenizer = Tokenizer.get_instance()
+    formatter = ChatFormat(tokenizer)
+
+    # ----- Output tokens (assistant) -----
+    output_tokens = 0
+    if getattr(turn, "output_message", None) is not None:
+        om = turn.output_message  # type: ignore[attr-defined]
+        if isinstance(om.content, str):
+            om_content = [TextContentItem(text=om.content, type="text")]
+        else:
+            om_content = om.content
+        encoded_output = formatter.encode_dialog_prompt(
+            [RawMessage(role=getattr(om, "role", "assistant"), content=om_content)]
+        )
+        output_tokens = len(encoded_output.tokens) if encoded_output.tokens else 0
+    metrics.llm_token_received_total.labels(provider, model).inc(output_tokens)
+
+    # ----- Input tokens (system + user/tool messages) -----
+    input_raw: list[RawMessage] = []
+    if system_prompt:
+        input_raw.append(
+            RawMessage(
+                role="system",
+                content=[TextContentItem(text=system_prompt, type="text")],
+            )
+        )
+    for m in cast(list, getattr(turn, "input_messages", [])):
+        msg_content = m.content
+        if isinstance(msg_content, str):
+            content_items = [TextContentItem(text=msg_content, type="text")]
+        else:
+            content_items = msg_content
+        input_raw.append(RawMessage(role=m.role, content=content_items))
+
+    input_tokens = 0
+    if input_raw:
+        encoded_input = formatter.encode_dialog_prompt(input_raw)
+        input_tokens = len(encoded_input.tokens) if encoded_input.tokens else 0
+    metrics.llm_token_sent_total.labels(provider, model).inc(input_tokens)
Signed-off-by: Haoyu Sun <hasun@redhat.com>
Actionable comments posted: 0
♻️ Duplicate comments (1)
src/app/endpoints/query.py (1)
392-400: Resolved: trailing params now keyword-only.
This addresses the earlier R0913/R0917 concern.
🧹 Nitpick comments (2)
src/app/endpoints/query.py (2)
416-421: Docstring clarity: specify model_id format.
Model here is the combined provider/model used by Llama Stack; make that explicit to avoid confusion with the bare model label used in metrics.

@@
-    Parameters:
-        model_id (str): The identifier of the LLM model to use.
-        provider_id (str): The identifier of the LLM provider to use.
+    Parameters:
+        model_id (str): Llama Stack model identifier in "provider/model" form (e.g., "openai/gpt-4.1").
+        provider_id (str): Provider identifier (e.g., "openai"). Keyword-only.
518-521: Keep label normalization; guard metrics to be non-fatal.
Nice job normalizing the model label. Wrap the metrics call so tokenizer/formatter issues never fail the endpoint.

-    # Update token count metrics for the LLM call
-    model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
-    update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+    # Update token count metrics for the LLM call (best-effort)
+    try:
+        model_label = model_id.split("/", 1)[1] if "/" in model_id else model_id
+        update_llm_token_count_from_turn(response, model_label, provider_id, system_prompt)
+    except Exception as e:  # noqa: BLE001
+        logger.warning("Token metrics update failed: %s", e, exc_info=True)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- src/app/endpoints/query.py (5 hunks)
- src/app/endpoints/streaming_query.py (2 hunks)
- src/metrics/utils.py (2 hunks)
- tests/unit/app/endpoints/test_query.py (16 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (2 hunks)
- tests/unit/metrics/test_utis.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/metrics/test_utis.py
- src/app/endpoints/streaming_query.py
- tests/unit/app/endpoints/test_query.py
- src/metrics/utils.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/app/endpoints/query.py (2)
src/metrics/utils.py (1)
update_llm_token_count_from_turn(60-77)
src/models/requests.py (1)
QueryRequest(72-222)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: build-pr
- GitHub Check: e2e_tests
🔇 Additional comments (2)
src/app/endpoints/query.py (2)
27-27: LGTM: metrics utility import is correct.
Direct import keeps call sites explicit and avoids circular deps.
216-223: Good: provider_id plumbed and passed keyword-only.
Call site matches the updated retrieve_response signature.
E2E test failure is due to timeout (10s) on querying.
E2E test passed again.
tisnik
left a comment
LGTM on my side, thank you
Description
Record metrics of tokens sent to the LLM provider and tokens received from the LLM provider.
Type of change
Related Tickets & Documents
Checklist before requesting a review
Testing
Call the query or streaming_query endpoint and check the metrics llm_token_received_total and llm_token_sent_total.
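A quick way to eyeball the new counters after hitting the endpoints; the metrics URL and port below are assumptions and should be adjusted to the local deployment.

```python
# Assumes the service exposes Prometheus metrics at this URL; adjust as needed.
import requests

METRICS_URL = "http://localhost:8080/metrics"

resp = requests.get(METRICS_URL, timeout=10)
resp.raise_for_status()
for line in resp.text.splitlines():
    if line.startswith(("llm_token_sent_total", "llm_token_received_total")):
        print(line)
```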
Summary by CodeRabbit
Chores
Tests
Note