
Parse new LCORE tool_call format #150

Merged
asamal4 merged 1 commit into lightspeed-core:main from bparees:tool_parser
Feb 5, 2026

Conversation

@bparees
Contributor

@bparees bparees commented Jan 30, 2026

Description

Adds support for the current LCORE tool_call response format, e.g.:

data: {"event": "tool_call", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "name": "mcp_list_tools", "args": {"server_label": "obs"}, "type": "mcp_list_tools"}}

data: {"event": "tool_result", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "status": "success", "content": "{\"server_label\": \"obs\", \"tools\": [{\"name\": \"execute_range_query\", \"description\": \"Execute a PromQL range query with flexible time specification.\\n\\nFor current time data queries, use only the 'duration' parameter to specify how far back\\nto look from now (e.g., '1h' for last hour, '30m' for last 30 minutes). In that case\\nSET 'end' to 'NOW' and leave 'start' empty.\\n\\nFor historical data queries, use explicit 'start' and 'end' times.\\n\", \"input_schema\": {\"type\": \"object\", \"properties\": {\"duration\": {\"description\": \"Duration to look back from now (e.g., '1h', '30m', '1d', '2w') (optional)\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}, \"end\": {\"description\": \"End time as RFC3339 or Unix timestamp (optional). Use `NOW` for current time.\", \"type\": \"string\"}, \"query\": {\"description\": \"PromQL query string\", \"type\": \"string\"}, \"start\": {\"description\": \"Start time as RFC3339 or Unix timestamp (optional)\", \"type\": \"string\"}, \"step\": {\"description\": \"Query resolution step width (e.g., '15s', '1m', '1h')\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}}, \"required\": [\"query\", \"step\"]}}, {\"name\": \"list_metrics\", \"description\": \"List all available metrics in Prometheus\", \"input_schema\": {\"type\": \"object\", \"properties\": {}}}]}", "type": "mcp_list_tools", "round": 1}}

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Verified by running evals against a current LCORE deployment (v0.4+)

Summary by CodeRabbit

  • New Features
    • Streaming now accepts explicit tool-result events, maps tool calls by ID, and links results back to their originating call; supports both legacy and new tool-call formats so arguments and results propagate correctly.
  • Bug Fixes
    • Improved parsing robustness and error logging for mixed-format tool-call and result events.
  • Tests
    • Unit tests expanded to cover new-format tool calls, result association across multiple calls, parsing variations, and performance metric preservation.

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Walkthrough

The streaming parser adds support for tool_result events, stores tool calls by ID via a new tool_calls_by_id field, and normalizes both legacy (tool_name/arguments) and new (name/args) tool-call formats. Results are applied to stored calls when an ID is present.

Changes

  • Streaming Parser Enhancement (src/lightspeed_evaluation/core/api/streaming_parser.py): Added tool_result to CONTENT_EVENTS. Introduced tool_calls_by_id on StreamingContext. _process_event normalizes legacy/new tool-call formats, appends parsed calls to tool_calls, stores calls by id when present, and routes tool_result events to _process_tool_result. _parse_tool_call now accepts generic event_data and returns {tool_name, arguments, result?}. Added _process_tool_result to attach results to stored tool calls with validation and logging.
  • Parser Tests (tests/unit/core/api/test_streaming_parser.py): Replaced Any with Mock in test signatures and imports, converted legacy embedded tool_call fixtures to inline JSON, and added/updated tests for new-format tool calls (id, name, args), tool_result handling, multiple-call aggregation, and result association to call IDs. Adjusted expectations and performance-metrics assertions.
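
To make the walkthrough concrete, here is a rough sketch of the normalization and result routing it describes; StreamingContext is reduced to a stand-in, and the helper bodies are inferred from the summary above rather than copied from the PR:

import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)

@dataclass
class StreamingContext:
    # Stand-in for the real context; only the fields discussed above.
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_calls_by_id: dict[str, dict[str, Any]] = field(default_factory=dict)

def parse_tool_call(event_data: dict[str, Any]) -> dict[str, Any]:
    # Prefer the new keys (name/args) by presence; fall back to legacy ones.
    tool_name = event_data["name"] if "name" in event_data else event_data.get("tool_name")
    arguments = event_data["args"] if "args" in event_data else event_data.get("arguments")
    return {"tool_name": tool_name, "arguments": arguments}

def process_tool_call(ctx: StreamingContext, event_data: dict[str, Any]) -> None:
    call = parse_tool_call(event_data)
    ctx.tool_calls.append(call)
    if "id" in event_data:  # remember the call so a later result can find it
        ctx.tool_calls_by_id[event_data["id"]] = call

def process_tool_result(ctx: StreamingContext, event_data: dict[str, Any]) -> None:
    # Route a tool_result back to its originating call via the id store.
    call = ctx.tool_calls_by_id.get(event_data.get("id", ""))
    if call is None:
        logger.warning("tool_result with no matching tool_call id: %s", event_data.get("id"))
        return
    call["result"] = event_data.get("content")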

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client (SSE producer)
participant Parser as StreamingParser
participant Context as StreamingContext
participant Store as tool_calls_by_id (store)

rect rgba(200,200,255,0.5)
Client->>Parser: send `tool_call` (new or legacy)
Parser->>Parser: _process_event -> _parse_tool_call
Parser->>Context: append tool_call to `tool_calls`
alt id present
    Parser->>Store: store tool_call by id
end
end

rect rgba(200,255,200,0.5)
Client->>Parser: send `tool_result` (with id)
Parser->>Parser: _process_event -> _process_tool_result
Parser->>Store: lookup tool_call by id
alt found
    Parser->>Context: attach result to stored tool_call
else not found
    Parser->>Parser: log missing id / ignore
end
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly and specifically describes the main change: adding support for parsing the new LCORE tool_call format with both tool_call and tool_result events.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

@bparees
Contributor Author

bparees commented Jan 30, 2026

@asamal4 w/o this change I was not getting tool_call data in the eval results when using a current LCORE deployment

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around lines 206-219: The parser incorrectly treats an empty new-format args
dict as missing because it uses "or" when assigning tool_name and arguments.
Change the logic to prefer the "name"/"args" keys when present by checking key
presence on event_data (e.g., tool_name = event_data["name"] if "name" in
event_data else event_data.get("tool_name"), and similarly for
"args"/"arguments"), and treat arguments as missing only if neither "args" nor
"arguments" is present. Update the code paths around event_data, tool_name, and
arguments in streaming_parser.py so empty dicts for args are accepted as valid.
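
In short, `or` falls through on an empty dict because {} is falsy, while a key-presence check keeps it. A minimal illustration (values are placeholders):

event_data = {"name": "mcp_list_tools", "args": {}}  # new format, empty args

# Buggy: `or` discards the valid empty dict.
arguments = event_data.get("args") or event_data.get("arguments")
assert arguments is None  # empty args wrongly treated as missing

# Fixed: decide by key presence, not truthiness.
if "args" in event_data:
    arguments = event_data["args"]
else:
    arguments = event_data.get("arguments")
assert arguments == {}  # empty args preserved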

Collaborator

@asamal4 asamal4 left a comment

Thank you @bparees.
Not sure when this got changed; probably to make it compatible with OLS.
We had this logic only for the query endpoint to make it compatible with existing OLS.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/core/api/test_streaming_parser.py (1)

3-22: ⚠️ Potential issue | 🟠 Major

Use pytest-mock types instead of unittest.mock.
Line 4 imports Mock from unittest.mock, which violates the tests/**/*.py guideline. Replace with Any for type hints and remove the unittest.mock import entirely.

🔧 Suggested changes
-from typing import Generator
-from unittest.mock import Mock
+from typing import Any, Generator
...
-@pytest.fixture
-def mock_response(mocker: MockerFixture) -> Generator[Mock, None, None]:
+@pytest.fixture
+def mock_response(mocker: MockerFixture) -> Generator[Any, None, None]:

Update all test function parameters from mock_response: Mock to mock_response: Any (lines 27, 45, 61, 71, 81, 91, 107, 122, 138, 157, 319, 333, 349, 367, 381, 396, 411).

As per coding guidelines: tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock; Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/core/api/test_streaming_parser.py (2)

3-22: ⚠️ Potential issue | 🟠 Major

Remove the local mock_response fixture and drop the unittest.mock import to comply with testing guidelines.

The local fixture shadows the shared one from tests/unit/core/api/conftest.py and the unittest.mock.Mock import violates the requirement to use pytest mocking only. Update type annotations from Mock to Any and reuse the shared fixture.

Proposed changes
-from typing import Generator
-from unittest.mock import Mock
+from typing import Any

-import pytest
-from pytest_mock import MockerFixture
+import pytest

 from lightspeed_evaluation.core.api.streaming_parser import (
     parse_streaming_response,
     _parse_sse_line,
     _parse_tool_call,
     _format_tool_sequences,
 )

-
-@pytest.fixture
-def mock_response(mocker: MockerFixture) -> Generator[Mock, None, None]:
-    """Create a mock streaming response."""
-    response = mocker.Mock()
-    yield response
-

 class TestParseStreamingResponse:
-    def test_parse_complete_response(self, mock_response: Mock) -> None:
+    def test_parse_complete_response(self, mock_response: Any) -> None:

274-280: ⚠️ Potential issue | 🟠 Major

Add Any type annotation and import to the invalid-structure test.

The test passes a string to _parse_tool_call() which expects dict[str, Any]. Annotate token as Any to make the call type-safe, and add Any to the typing imports.

🛠️ Proposed fix
-from typing import Generator
+from typing import Any, Generator
-        token = "not a dict"
+        token: Any = "not a dict"
🤖 Fix all issues with AI agents
In `@tests/unit/core/api/test_streaming_parser.py`:
- Around lines 45-50: The long JSON string literals in the test cases (e.g.,
the lines list inside test_parse_response_with_tool_calls) exceed the
line-length limit. Split those literals using implicit string concatenation, or
build the JSON via helper variables (construct the dict, then json.dumps it),
so each source line stays under the linter limit. Apply the same fix to the
other affected test blocks (around lines 122-127, 138-167, 396-401, 411-417);
a short sketch follows this list.
- Around lines 210-211: The test unpacks result into event and data but never
uses data, triggering Pylint W0612. Either unpack into a throwaway name (e.g.,
event, _ = result) or explicitly assert something about data if it was meant to
be checked, keeping the existing assertion assert event == "".
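
As a sketch of the first prompt's suggestion, a long `data: {...}` literal can be split with implicit string concatenation, or built from a dict and serialized (payload values here are placeholders):

import json

# Adjacent string literals are concatenated at compile time, so each
# source line stays under the length limit.
lines = [
    'data: {"event": "tool_call", "data": {"id": "call-1", '
    '"name": "mcp_list_tools", "args": {"server_label": "obs"}}}'
]

# Alternatively, build the payload and let json.dumps produce the line.
payload = {"event": "tool_call", "data": {"id": "call-1", "name": "mcp_list_tools"}}
lines = ["data: " + json.dumps(payload)]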

Comment on lines 210 to 211
event, data = result
assert event == "" # Default empty string
Contributor

⚠️ Potential issue | 🟡 Minor

Avoid unused data binding.

Pylint W0612 — use _ or assert on the value.

🛠️ Proposed fix
-        event, data = result
+        event, _ = result
🧰 Tools
🪛 GitHub Actions: Python linter

[error] 210-210: W0612: Unused variable 'data' (unused-variable)


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unit/core/api/test_streaming_parser.py`:
- Around lines 3-5: Remove the import "from unittest.mock import Mock" and
replace every type annotation that uses Mock in the test functions with
typing.Any, so the pytest-mock "mocker" fixture is used without referencing
unittest.mock. Update all affected test function signatures (the ones noted in
the review) to use Any, and keep "from typing import Any" as the only typing
import.

Comment on lines 3 to +5
from typing import Any
from unittest.mock import Mock

Contributor

@coderabbitai coderabbitai bot Feb 4, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n tests/unit/core/api/test_streaming_parser.py | head -30

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 1212


🏁 Script executed:

grep -n "def test_" tests/unit/core/api/test_streaming_parser.py | head -20

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 1501


🏁 Script executed:

find tests -name "conftest.py" -exec cat -n {} \;

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 27967


🏁 Script executed:

sed -n '133,137p' tests/unit/core/api/test_streaming_parser.py
sed -n '156,160p' tests/unit/core/api/test_streaming_parser.py

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 495


Remove unittest.mock import; use pytest-mock fixture typing instead.

The test file imports Mock from unittest.mock at line 4, but tests use the mocker fixture (which creates mocks via pytest-mock). Replace all Mock type hints with Any and remove the unittest.mock import.

🔧 Example adjustment (apply to all test signatures)
-from unittest.mock import Mock
+from typing import Any
...
-    def test_parse_complete_response(self, mock_response: Mock) -> None:
+    def test_parse_complete_response(self, mock_response: Any) -> None:

As per coding guidelines: Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock.

Applies to all affected test methods: lines 19, 37, 54, 64, 74, 84, 100, 115, 133, 156.


Collaborator

will fix this separately.

Contributor

@asamal4, sounds good! Would you like me to create a follow-up issue to track this refactoring task?


🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock

Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Use pytest mocking with mocker fixture instead of unittest.mock

Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Mock LLM calls in tests using pytest mocker fixture

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.

@bparees
Contributor Author

bparees commented Feb 4, 2026

@asamal4 I think this is ready now

@asamal4 asamal4 merged commit f848cd7 into lightspeed-core:main Feb 5, 2026
15 checks passed