
Parse new LCORE tool_call format #150

Merged
asamal4 merged 1 commit into lightspeed-core:main from bparees:tool_parser
Feb 5, 2026

Conversation

@bparees
Contributor

@bparees bparees commented Jan 30, 2026

Description

Adds support for the current LCORE tool_call response format, e.g.:

data: {"event": "tool_call", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "name": "mcp_list_tools", "args": {"server_label": "obs"}, "type": "mcp_list_tools"}}

data: {"event": "tool_result", "data": {"id": "mcp_list_cb6c7811-b2de-4ab7-9109-63a3fac1a5ee", "status": "success", "content": "{\"server_label\": \"obs\", \"tools\": [{\"name\": \"execute_range_query\", \"description\": \"Execute a PromQL range query with flexible time specification.\\n\\nFor current time data queries, use only the 'duration' parameter to specify how far back\\nto look from now (e.g., '1h' for last hour, '30m' for last 30 minutes). In that case\\nSET 'end' to 'NOW' and leave 'start' empty.\\n\\nFor historical data queries, use explicit 'start' and 'end' times.\\n\", \"input_schema\": {\"type\": \"object\", \"properties\": {\"duration\": {\"description\": \"Duration to look back from now (e.g., '1h', '30m', '1d', '2w') (optional)\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}, \"end\": {\"description\": \"End time as RFC3339 or Unix timestamp (optional). Use `NOW` for current time.\", \"type\": \"string\"}, \"query\": {\"description\": \"PromQL query string\", \"type\": \"string\"}, \"start\": {\"description\": \"Start time as RFC3339 or Unix timestamp (optional)\", \"type\": \"string\"}, \"step\": {\"description\": \"Query resolution step width (e.g., '15s', '1m', '1h')\", \"pattern\": \"^\\\\d+[smhdwy]$\", \"type\": \"string\"}}, \"required\": [\"query\", \"step\"]}}, {\"name\": \"list_metrics\", \"description\": \"List all available metrics in Prometheus\", \"input_schema\": {\"type\": \"object\", \"properties\": {}}}]}", "type": "mcp_list_tools", "round": 1}}

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Verified by running evals against a current LCORE deployment (v0.4+)

Summary by CodeRabbit

  • New Features
    • Streaming now accepts explicit tool-result events, maps tool calls by ID, and links results back to their originating call; supports both legacy and new tool-call formats so arguments and results propagate correctly.
  • Bug Fixes
    • Improved parsing robustness and error logging for mixed-format tool-call and result events.
  • Tests
    • Unit tests expanded to cover new-format tool calls, result association across multiple calls, parsing variations, and performance metric preservation.

@coderabbitai
Contributor

coderabbitai bot commented Jan 30, 2026

Walkthrough

The streaming parser adds support for tool_result events, stores tool calls by ID via a new tool_calls_by_id field, and normalizes both legacy (tool_name/arguments) and new (name/args) tool-call formats. Results are applied to stored calls when an ID is present.

Changes

  • Streaming Parser Enhancement (src/lightspeed_evaluation/core/api/streaming_parser.py): Added tool_result to CONTENT_EVENTS. Introduced tool_calls_by_id on StreamingContext. _process_event normalizes legacy/new tool-call formats, appends parsed calls to tool_calls, stores calls by id when present, and routes tool_result events to _process_tool_result. _parse_tool_call now accepts generic event_data and returns {tool_name, arguments, result?}. Added _process_tool_result to attach results to stored tool calls with validation and logging.
  • Parser Tests (tests/unit/core/api/test_streaming_parser.py): Replaced Any with Mock in test signatures and imports, converted legacy embedded tool_call fixtures to inline JSON, and added/updated tests for new-format tool calls (id, name, args), tool_result handling, multiple-call aggregation, and result association to call IDs. Adjusted expectations and performance-metrics assertions.
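
To make the walkthrough concrete, here is a rough sketch of the normalization and result routing it describes; StreamingContext is reduced to a stand-in, and the helper bodies are inferred from the summary above rather than copied from the PR:

import logging
from dataclasses import dataclass, field
from typing import Any

logger = logging.getLogger(__name__)

@dataclass
class StreamingContext:
    # Stand-in for the real context; only the fields discussed above.
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    tool_calls_by_id: dict[str, dict[str, Any]] = field(default_factory=dict)

def parse_tool_call(event_data: dict[str, Any]) -> dict[str, Any]:
    # Prefer the new keys (name/args) by presence; fall back to legacy ones.
    tool_name = event_data["name"] if "name" in event_data else event_data.get("tool_name")
    arguments = event_data["args"] if "args" in event_data else event_data.get("arguments")
    return {"tool_name": tool_name, "arguments": arguments}

def process_tool_call(ctx: StreamingContext, event_data: dict[str, Any]) -> None:
    call = parse_tool_call(event_data)
    ctx.tool_calls.append(call)
    if "id" in event_data:  # remember the call so a later result can find it
        ctx.tool_calls_by_id[event_data["id"]] = call

def process_tool_result(ctx: StreamingContext, event_data: dict[str, Any]) -> None:
    # Route a tool_result back to its originating call via the id store.
    call = ctx.tool_calls_by_id.get(event_data.get("id", ""))
    if call is None:
        logger.warning("tool_result with no matching tool_call id: %s", event_data.get("id"))
        return
    call["result"] = event_data.get("content")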

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Client as Client (SSE producer)
participant Parser as StreamingParser
participant Context as StreamingContext
participant Store as tool_calls_by_id (store)

rect rgba(200,200,255,0.5)
Client->>Parser: send `tool_call` (new or legacy)
Parser->>Parser: _process_event -> _parse_tool_call
Parser->>Context: append tool_call to `tool_calls`
alt id present
    Parser->>Store: store tool_call by id
end
end

rect rgba(200,255,200,0.5)
Client->>Parser: send `tool_result` (with id)
Parser->>Parser: _process_event -> _process_tool_result
Parser->>Store: lookup tool_call by id
alt found
    Parser->>Context: attach result to stored tool_call
else not found
    Parser->>Parser: log missing id / ignore
end
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title directly and specifically describes the main change: adding support for parsing the new LCORE tool_call format with both tool_call and tool_result events.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

@bparees
Contributor Author

bparees commented Jan 30, 2026

@asamal4 w/o this change I was not getting tool_call data in the eval results when using a current LCORE deployment

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/lightspeed_evaluation/core/api/streaming_parser.py`:
- Around lines 206-219: The parser incorrectly treats an empty new-format args
dict as missing because it uses "or" when assigning tool_name and arguments.
Change the logic to prefer the "name"/"args" keys when present by checking key
presence on event_data (e.g., tool_name = event_data["name"] if "name" in
event_data else event_data.get("tool_name"), and similarly for
"args"/"arguments"), and treat arguments as missing only if neither "args" nor
"arguments" is present. Update the code paths around event_data, tool_name, and
arguments in streaming_parser.py so empty dicts for args are accepted as valid.
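
In short, `or` falls through on an empty dict because {} is falsy, while a key-presence check keeps it. A minimal illustration (values are placeholders):

event_data = {"name": "mcp_list_tools", "args": {}}  # new format, empty args

# Buggy: `or` discards the valid empty dict.
arguments = event_data.get("args") or event_data.get("arguments")
assert arguments is None  # empty args wrongly treated as missing

# Fixed: decide by key presence, not truthiness.
if "args" in event_data:
    arguments = event_data["args"]
else:
    arguments = event_data.get("arguments")
assert arguments == {}  # empty args preserved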

Collaborator

@asamal4 asamal4 left a comment

Thank you @bparees.
Not sure when this got changed; probably to make it compatible with OLS.
We had this logic only for the query endpoint to make it compatible with existing OLS.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/core/api/test_streaming_parser.py (1)

3-22: ⚠️ Potential issue | 🟠 Major

Use pytest-mock types instead of unittest.mock.
Line 4 imports Mock from unittest.mock, which violates the tests/**/*.py guideline. Replace with Any for type hints and remove the unittest.mock import entirely.

🔧 Suggested changes
-from typing import Generator
-from unittest.mock import Mock
+from typing import Any, Generator
...
-@pytest.fixture
-def mock_response(mocker: MockerFixture) -> Generator[Mock, None, None]:
+@pytest.fixture
+def mock_response(mocker: MockerFixture) -> Generator[Any, None, None]:

Update all test function parameters from mock_response: Mock to mock_response: Any (lines 27, 45, 61, 71, 81, 91, 107, 122, 138, 157, 319, 333, 349, 367, 381, 396, 411).

As per coding guidelines: tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock; Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
tests/unit/core/api/test_streaming_parser.py (2)

3-22: ⚠️ Potential issue | 🟠 Major

Remove the local mock_response fixture and drop the unittest.mock import to comply with testing guidelines.

The local fixture shadows the shared one from tests/unit/core/api/conftest.py and the unittest.mock.Mock import violates the requirement to use pytest mocking only. Update type annotations from Mock to Any and reuse the shared fixture.

Proposed changes
-from typing import Generator
-from unittest.mock import Mock
+from typing import Any

-import pytest
-from pytest_mock import MockerFixture
+import pytest

 from lightspeed_evaluation.core.api.streaming_parser import (
     parse_streaming_response,
     _parse_sse_line,
     _parse_tool_call,
     _format_tool_sequences,
 )

-
-@pytest.fixture
-def mock_response(mocker: MockerFixture) -> Generator[Mock, None, None]:
-    """Create a mock streaming response."""
-    response = mocker.Mock()
-    yield response
-

 class TestParseStreamingResponse:
-    def test_parse_complete_response(self, mock_response: Mock) -> None:
+    def test_parse_complete_response(self, mock_response: Any) -> None:

274-280: ⚠️ Potential issue | 🟠 Major

Add Any type annotation and import to the invalid-structure test.

The test passes a string to _parse_tool_call() which expects dict[str, Any]. Annotate token as Any to make the call type-safe, and add Any to the typing imports.

🛠️ Proposed fix
-from typing import Generator
+from typing import Any, Generator
-        token = "not a dict"
+        token: Any = "not a dict"
🤖 Fix all issues with AI agents
In `@tests/unit/core/api/test_streaming_parser.py`:
- Around lines 45-50: The long JSON string literals in the test cases (e.g.,
the lines list inside test_parse_response_with_tool_calls) exceed the
line-length limit. Split those literals using implicit string concatenation, or
build the JSON via helper variables (construct the dict, then json.dumps it),
so each source line stays under the linter limit. Apply the same fix to the
other affected test blocks (around lines 122-127, 138-167, 396-401, 411-417);
a short sketch follows this list.
- Around lines 210-211: The test unpacks result into event and data but never
uses data, triggering Pylint W0612. Either unpack into a throwaway name (e.g.,
event, _ = result) or explicitly assert something about data if it was meant to
be checked, keeping the existing assertion assert event == "".
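
As a sketch of the first prompt's suggestion, a long `data: {...}` literal can be split with implicit string concatenation, or built from a dict and serialized (payload values here are placeholders):

import json

# Adjacent string literals are concatenated at compile time, so each
# source line stays under the length limit.
lines = [
    'data: {"event": "tool_call", "data": {"id": "call-1", '
    '"name": "mcp_list_tools", "args": {"server_label": "obs"}}}'
]

# Alternatively, build the payload and let json.dumps produce the line.
payload = {"event": "tool_call", "data": {"id": "call-1", "name": "mcp_list_tools"}}
lines = ["data: " + json.dumps(payload)]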

Comment on lines 210 to 211
event, data = result
assert event == "" # Default empty string
Contributor

⚠️ Potential issue | 🟡 Minor

Avoid unused data binding.

Pylint W0612 — use _ or assert on the value.

🛠️ Proposed fix
-        event, data = result
+        event, _ = result
🧰 Tools
🪛 GitHub Actions: Python linter

[error] 210-210: W0612: Unused variable 'data' (unused-variable)


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@tests/unit/core/api/test_streaming_parser.py`:
- Around lines 3-5: Remove the import "from unittest.mock import Mock" and
replace every type annotation that uses Mock in the test functions with
typing.Any, so the pytest-mock "mocker" fixture is used without referencing
unittest.mock. Update all affected test function signatures (the ones noted in
the review) to use Any, and keep "from typing import Any" as the only typing
import.

Comment on lines 3 to +5
from typing import Any
from unittest.mock import Mock

Contributor

@coderabbitai coderabbitai bot Feb 4, 2026

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n tests/unit/core/api/test_streaming_parser.py | head -30

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 1212


🏁 Script executed:

grep -n "def test_" tests/unit/core/api/test_streaming_parser.py | head -20

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 1501


🏁 Script executed:

find tests -name "conftest.py" -exec cat -n {} \;

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 27967


🏁 Script executed:

sed -n '133,137p' tests/unit/core/api/test_streaming_parser.py
sed -n '156,160p' tests/unit/core/api/test_streaming_parser.py

Repository: lightspeed-core/lightspeed-evaluation

Length of output: 495


Remove unittest.mock import; use pytest-mock fixture typing instead.

The test file imports Mock from unittest.mock at line 4, but tests use the mocker fixture (which creates mocks via pytest-mock). Replace all Mock type hints with Any and remove the unittest.mock import.

🔧 Example adjustment (apply to all test signatures)
-from unittest.mock import Mock
+from typing import Any
...
-    def test_parse_complete_response(self, mock_response: Mock) -> None:
+    def test_parse_complete_response(self, mock_response: Any) -> None:

As per coding guidelines: Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock.

Applies to all affected test methods: lines 19, 37, 54, 64, 74, 84, 100, 115, 133, 156.


Collaborator

will fix this separately.

Contributor

@asamal4, sounds good! Would you like me to create a follow-up issue to track this refactoring task?


🧠 Learnings used
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock

Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Use pytest mocking with mocker fixture instead of unittest.mock

Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-01-12T11:01:11.398Z
Learning: Applies to tests/**/*.py : Mock LLM calls in tests using pytest mocker fixture

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.

@bparees
Contributor Author

bparees commented Feb 4, 2026

@asamal4 I think this is ready now

@asamal4 asamal4 merged commit f848cd7 into lightspeed-core:main Feb 5, 2026
15 checks passed