
add additional fields to output for non-error scenarios #114

Merged

asamal4 merged 2 commits into lightspeed-core:main from asamal4:expand-csv-output on Dec 11, 2025

Conversation

asamal4 (Collaborator) commented on Dec 6, 2025

Description

Added additional fields to the output. These will help us do quick comparisons for non-error scenarios. For error scenarios, the data remains minimal, per the existing logic.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added six evaluation fields for richer tool/expectation metadata (tool calls, contexts, expected response, expected intent, expected keywords, expected tool calls) and included them in CSV exports.
  • Changed Behavior

    • Replaced numeric token-count columns with new string-based metadata fields and adjusted CSV column ordering.
  • Tests

    • Updated unit tests to validate the new fields and CSV output for success and error cases.
  • Documentation

    • Updated CSV schema examples and configuration docs to reflect new fields and ordering.


coderabbitai bot (Contributor) commented on Dec 6, 2025

Walkthrough

Reorders CSV columns, replaces two numeric token fields on EvaluationResult with six optional string fields, serializes additional turn_data fields into JSON when building results, and updates tests and configuration to align with the new CSV schema and model fields.
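
To make the walkthrough concrete, here is a minimal sketch of the serialization idea: structured turn data (lists and dicts) is flattened to JSON strings so it fits into flat CSV columns, and empty values map to None. The helper mirrors the _to_json_str shown in the review diffs below; the turn dict is a simplified, hypothetical example, not the project's actual TurnData model.

import json
from typing import Any, Optional

def _to_json_str(value: Any) -> Optional[str]:
    """Flatten lists/dicts to a JSON string; None and empty values become None (mirrors the PR's helper)."""
    if value is None or value == [] or value == {}:
        return None
    try:
        return json.dumps(value, indent=None, default=str)
    except (TypeError, ValueError):
        return str(value)

# Hypothetical turn data, simplified for illustration only.
turn = {
    "contexts": ["Context"],
    "tool_calls": [{"name": "get_pods", "arguments": {"namespace": "default"}}],
    "expected_keywords": ["operator", "install"],
    "expected_tool_calls": [],  # empty -> None -> empty CSV cell
}

csv_fields = {name: _to_json_str(value) for name, value in turn.items()}
print(csv_fields["contexts"])             # '["Context"]'
print(csv_fields["expected_tool_calls"])  # None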

Changes

  • Core constants & model (src/lightspeed_evaluation/core/constants.py, src/lightspeed_evaluation/core/models/data.py):
    Reordered SUPPORTED_CSV_COLUMNS (moved execution_time, repositioned token/judge fields) and added the new CSV fields; EvaluationResult removed api_input_tokens/api_output_tokens and added optional str fields: tool_calls, contexts, expected_response, expected_intent, expected_keywords, expected_tool_calls.
  • Evaluator logic (src/lightspeed_evaluation/pipeline/evaluation/evaluator.py):
    Added a _to_json_str(value) helper, imported json and Any, switched to local turn_data usage, and JSON-serialized query/response plus the six new turn_data fields into EvaluationResult instances.
  • Tests, config & docs (tests/unit/core/output/test_generator.py, tests/unit/pipeline/evaluation/test_evaluator.py, config/system.yaml, docs/configuration.md):
    Tests updated to assert the new fields and adjusted CSV expectations (the generator test now expects three rows); system.yaml and docs/configuration.md extended to list and demonstrate the expanded CSV/output schema, including the new fields and adjusted ordering.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–30 minutes

  • Pay attention to:
    • _to_json_str() handling of None/empty values and JSON escaping for CSV output (see the sketch after this list).
    • Any remaining callers or serializers that expected api_input_tokens/api_output_tokens.
    • Consistency between SUPPORTED_CSV_COLUMNS ordering and CSV generator/export logic and tests.
    • Tests and exception paths ensuring optional fields are None or empty where appropriate.
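
On the JSON-escaping point above: Python's csv module quotes any field containing commas or double quotes, so JSON strings survive the round trip, and None is written as an empty cell. A small self-contained check, independent of the project's actual generator code:

import csv
import io
import json

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["contexts", "expected_keywords"])
writer.writerow([json.dumps(["Context A", "Context B"]), None])  # None becomes an empty cell

buf.seek(0)
rows = list(csv.reader(buf))
assert json.loads(rows[1][0]) == ["Context A", "Context B"]  # embedded quotes/commas survived
assert rows[1][1] == ""                                       # None round-trips as an empty string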

Possibly related PRs

Suggested reviewers

  • VladimirKadlec
  • tisnik

Pre-merge checks

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'add additional fields to output for non-error scenarios' directly reflects the main change: adding new fields (tool_calls, contexts, expected_response, expected_intent, expected_keywords, expected_tool_calls) to the EvaluationResult model and supporting configuration for non-error evaluation scenarios.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which meets the required threshold of 80.00%.

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2ac11be and 7aa1060.

📒 Files selected for processing (2)
  • config/system.yaml (1 hunks)
  • docs/configuration.md (2 hunks)
✅ Files skipped from review due to trivial changes (1)
  • docs/configuration.md
🧰 Additional context used
📓 Path-based instructions (2)
config/system.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Add new metrics metadata to config/system.yaml in the metrics_metadata section

Files:

  • config/system.yaml
config/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Add sample evaluation data YAML file when adding new features

Files:

  • config/system.yaml
🧠 Learnings (2)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-11-24T16:59:21.420Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T16:59:21.420Z
Learning: Applies to config/system.yaml : Add new metrics metadata to `config/system.yaml` in the metrics_metadata section

Applied to files:

  • config/system.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.11)
  • GitHub Check: mypy
🔇 Additional comments (2)
config/system.yaml (2)

49-151: Clarify whether new CSV fields require metrics metadata entries.

The new CSV columns include tool_calls, contexts, expected_response, expected_intent, expected_keywords, and expected_tool_calls (lines 174–183), but these do not appear as new entries in the metrics_metadata section. Per the coding guidelines, new metrics metadata should be added to this section. Confirm whether these are:

  1. Metrics that require threshold/description metadata entries, or
  2. Reference/context fields that don't require metrics metadata

If they are metrics, please add appropriate metadata entries.


163-183: CSV columns configuration verified: all fields properly aligned and serialized.

The CSV columns at lines 163-183 match the SUPPORTED_CSV_COLUMNS in constants.py exactly, and all 20 fields (including api_input_tokens, api_output_tokens, judge_llm_input_tokens, judge_llm_output_tokens, tool_calls, contexts, expected_response, expected_intent, expected_keywords, and expected_tool_calls) are properly defined in the EvaluationResult model and dynamically serialized by the CSV generator using attribute reflection. No action required.



coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)

169-184: Consider populating additional fields in error results for debugging.

The _create_error_result method currently only populates query and response. For better debuggability, consider also populating contexts, expected_response, and other fields when available, so users can see what input data led to the error.

This is optional since error results are intentionally minimal, but having context can help diagnose issues.

     def _create_error_result(
         self, request: EvaluationRequest, reason: str, execution_time: float
     ) -> EvaluationResult:
         """Create an ERROR result for failed evaluation."""
+        turn_data = request.turn_data
         return EvaluationResult(
             conversation_group_id=request.conv_data.conversation_group_id,
             turn_id=request.turn_id,
             metric_identifier=request.metric_identifier,
             result="ERROR",
             score=None,
             threshold=None,
             reason=reason,
-            query=request.turn_data.query if request.turn_data else "",
-            response=request.turn_data.response or "" if request.turn_data else "",
+            query=turn_data.query if turn_data else "",
+            response=turn_data.response or "" if turn_data else "",
+            contexts=_to_json_str(turn_data.contexts) if turn_data else None,
+            expected_response=turn_data.expected_response if turn_data else None,
             execution_time=execution_time,
         )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba0bd33 and 7f3176a.

📒 Files selected for processing (5)
  • src/lightspeed_evaluation/core/constants.py (1 hunks)
  • src/lightspeed_evaluation/core/models/data.py (1 hunks)
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (4 hunks)
  • tests/unit/core/output/test_generator.py (3 hunks)
  • tests/unit/pipeline/evaluation/test_evaluator.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels

Files:

  • src/lightspeed_evaluation/core/constants.py
  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention test_*.py for files, test_* for functions, and Test* for classes

Files:

  • tests/unit/pipeline/evaluation/test_evaluator.py
  • tests/unit/core/output/test_generator.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T16:59:21.420Z
Learning: All new evaluation features should be added to `src/lightspeed_evaluation/` core framework
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-07-29T05:15:39.782Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 22
File: lsc_agent_eval/src/lsc_agent_eval/core/agent_goal_eval/evaluator.py:87-100
Timestamp: 2025-07-29T05:15:39.782Z
Learning: In the lsc_agent_eval framework, the substring evaluation logic in the `_evaluate_substring` method requires ALL expected keywords to be present in the agent response (logical AND), not just any keyword (logical OR). This is a stricter evaluation condition that was intentionally changed and may be subject to future modifications.

Applied to files:

  • tests/unit/pipeline/evaluation/test_evaluator.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.

Applied to files:

  • tests/unit/pipeline/evaluation/test_evaluator.py
📚 Learning: 2025-11-24T16:59:21.420Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T16:59:21.420Z
Learning: All new evaluation features should be added to `src/lightspeed_evaluation/` core framework

Applied to files:

  • src/lightspeed_evaluation/core/models/data.py
  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-16T13:20:45.006Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:45.006Z
Learning: In the lsc_agent_eval package, evaluation results use distinct values: "FAIL" means the evaluation ran successfully but the result was negative, while "ERROR" means there was an issue executing the evaluation itself (e.g., setup script failed, API connection failed).

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-16T13:20:40.632Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:40.632Z
Learning: In the lsc_agent_eval package, evaluation results use "FAIL" for evaluations that ran but didn't pass the criteria, and "ERROR" for errors in the evaluation process itself (like setup script failures, API errors, etc.).

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-16T13:21:53.225Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:190-214
Timestamp: 2025-07-16T13:21:53.225Z
Learning: In the lsc_agent_eval framework, there's a distinction between FAIL and ERROR results:
- FAIL: evaluation failed (e.g., script verification failed, agent response doesn't match expected criteria) - result.error is None
- ERROR: error running eval (e.g., setup script failed, agent API error) - result.error contains error message

Applied to files:

  • tests/unit/core/output/test_generator.py
🧬 Code graph analysis (3)
tests/unit/pipeline/evaluation/test_evaluator.py (1)
src/lightspeed_evaluation/core/api/client.py (1)
  • query (71-105)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
src/lightspeed_evaluation/core/api/client.py (1)
  • query (71-105)
tests/unit/core/output/test_generator.py (1)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationResult (370-428)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.11)
  • GitHub Check: mypy
🔇 Additional comments (8)
src/lightspeed_evaluation/core/constants.py (1)

35-40: LGTM!

The new CSV columns are correctly added and align with the new optional fields in EvaluationResult. The naming is consistent and the ordering (actual values followed by expected values) is logical.

tests/unit/pipeline/evaluation/test_evaluator.py (2)

156-158: LGTM!

The test correctly validates that the new fields (query, response, contexts) are populated from turn data in successful evaluation paths. The JSON serialization of contexts as '["Context"]' aligns with the _to_json_str helper behavior.


367-370: LGTM!

The test correctly expects contexts and expected_response to be None in the exception handling path, which aligns with _create_error_result not populating these optional fields.

tests/unit/core/output/test_generator.py (2)

328-331: LGTM!

The test correctly uses pre-serialized JSON strings for contexts and expected_keywords, matching the expected format after _to_json_str processing in the evaluator.


375-390: LGTM!

Comprehensive assertions validating CSV content with new fields. The test correctly expects:

  • JSON-serialized strings for contexts and expected_keywords
  • Empty string for contexts in the ERROR row (CSV representation of None)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (2)

26-33: LGTM!

The _to_json_str helper correctly handles serialization with appropriate fallbacks:

  • Returns None for None, empty lists, and empty dicts
  • Uses json.dumps with default=str for robust serialization
  • Falls back to str() on serialization errors

As per coding guidelines, the function has type hints for all parameters and return type.
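
As a quick behavioral check of the points above (the import path is assumed from the file listed in this review; _to_json_str is a private helper, so importing it here is for illustration only):

# Hedged sketch: assumes the helper is importable from the evaluator module named above.
from lightspeed_evaluation.pipeline.evaluation.evaluator import _to_json_str

assert _to_json_str(None) is None
assert _to_json_str([]) is None
assert _to_json_str({}) is None
assert _to_json_str(["Context"]) == '["Context"]'
assert _to_json_str({"name": "get_pods"}) == '{"name": "get_pods"}'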


138-159: LGTM!

The result construction correctly populates the new fields:

  • Extracts turn_data for cleaner repeated access
  • Applies _to_json_str to complex fields (tool_calls, contexts, expected_keywords, expected_tool_calls)
  • Preserves string fields (expected_response, expected_intent) directly
  • Handles missing turn_data gracefully with appropriate defaults
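
A condensed sketch of the construction pattern described in this comment. Field names follow the review; the request shape, the import paths, and the build_result wrapper are assumptions for illustration, not the evaluator's actual code.

# Hedged sketch: import paths follow the files listed in this review.
from lightspeed_evaluation.core.models.data import EvaluationResult
from lightspeed_evaluation.pipeline.evaluation.evaluator import _to_json_str

def build_result(request, status, score, threshold, reason, execution_time):
    """Illustrative mirror of the result construction described above."""
    turn_data = request.turn_data  # may be None
    return EvaluationResult(
        conversation_group_id=request.conv_data.conversation_group_id,
        turn_id=request.turn_id,
        metric_identifier=request.metric_identifier,
        result=status,
        score=score,
        threshold=threshold,
        reason=reason,
        query=turn_data.query if turn_data else "",
        response=turn_data.response or "" if turn_data else "",
        # Complex fields are flattened to JSON strings; plain strings pass through as-is.
        tool_calls=_to_json_str(turn_data.tool_calls) if turn_data else None,
        contexts=_to_json_str(turn_data.contexts) if turn_data else None,
        expected_response=turn_data.expected_response if turn_data else None,
        expected_intent=turn_data.expected_intent if turn_data else None,
        expected_keywords=_to_json_str(turn_data.expected_keywords) if turn_data else None,
        expected_tool_calls=_to_json_str(turn_data.expected_tool_calls) if turn_data else None,
        execution_time=execution_time,
    )
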
src/lightspeed_evaluation/core/models/data.py (1)

400-420: LGTM!

The new optional fields are well-defined with:

  • Appropriate types (Optional[str] for JSON-serialized data)
  • Clear descriptions documenting their purpose
  • Consistent default values (None)

The field additions align with the evaluator's usage and CSV column definitions in constants. As per coding guidelines, the model uses type hints appropriately.
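
For reference, a hedged sketch of how the six optional fields could look on the Pydantic model. Names and None defaults come from this review; the class name and descriptions are illustrative, not copied from the repository.

from typing import Optional
from pydantic import BaseModel, Field

class NewResultFieldsSketch(BaseModel):
    """Illustrative subset showing only the six new optional fields."""

    tool_calls: Optional[str] = Field(default=None, description="Actual tool calls, JSON-serialized")
    contexts: Optional[str] = Field(default=None, description="Retrieved contexts, JSON-serialized")
    expected_response: Optional[str] = Field(default=None, description="Expected response text")
    expected_intent: Optional[str] = Field(default=None, description="Expected intent")
    expected_keywords: Optional[str] = Field(default=None, description="Expected keywords, JSON-serialized")
    expected_tool_calls: Optional[str] = Field(default=None, description="Expected tool calls, JSON-serialized")

# All six default to None, so error results that omit them remain valid.
assert NewResultFieldsSketch().contexts is None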

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)

27-35: Consider edge case: nested objects with circular references.

The _to_json_str helper uses default=str as a fallback for non-serializable types, which is good. However, if value contains circular references or deeply nested objects, json.dumps could raise RecursionError (not caught) or produce unexpectedly large strings.

Consider adding a recursion depth limit or handling RecursionError:

 def _to_json_str(value: Any) -> Optional[str]:
     """Convert any value to JSON string. Returns None for empty values."""
     if value is None or value == [] or value == {}:
         return None
     try:
         return json.dumps(value, indent=None, default=str)
-    except (TypeError, ValueError):
+    except (TypeError, ValueError, RecursionError):
         return str(value)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 032bd43 and 12f424a.

📒 Files selected for processing (5)
  • src/lightspeed_evaluation/core/constants.py (1 hunks)
  • src/lightspeed_evaluation/core/models/data.py (1 hunks)
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (4 hunks)
  • tests/unit/core/output/test_generator.py (3 hunks)
  • tests/unit/pipeline/evaluation/test_evaluator.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention test_*.py for files, test_* for functions, and Test* for classes

Files:

  • tests/unit/core/output/test_generator.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Use type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels

Files:

  • src/lightspeed_evaluation/core/models/data.py
  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
  • src/lightspeed_evaluation/core/constants.py
🧠 Learnings (7)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-07-16T13:20:45.006Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:45.006Z
Learning: In the lsc_agent_eval package, evaluation results use distinct values: "FAIL" means the evaluation ran successfully but the result was negative, while "ERROR" means there was an issue executing the evaluation itself (e.g., setup script failed, API connection failed).

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-16T13:20:40.632Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:40.632Z
Learning: In the lsc_agent_eval package, evaluation results use "FAIL" for evaluations that ran but didn't pass the criteria, and "ERROR" for errors in the evaluation process itself (like setup script failures, API errors, etc.).

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-16T13:21:53.225Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:190-214
Timestamp: 2025-07-16T13:21:53.225Z
Learning: In the lsc_agent_eval framework, there's a distinction between FAIL and ERROR results:
- FAIL: evaluation failed (e.g., script verification failed, agent response doesn't match expected criteria) - result.error is None
- ERROR: error running eval (e.g., setup script failed, agent API error) - result.error contains error message

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-07-28T14:26:03.119Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 22
File: lsc_agent_eval/src/lsc_agent_eval/core/agent_goal_eval/eval_data.py:146-153
Timestamp: 2025-07-28T14:26:03.119Z
Learning: In the lsc_agent_eval framework, evaluations are identified by a composite key of (conversation_group, eval_id). This design allows the same eval_id to exist across different conversation groups (logged as warning) but prevents duplicates within the same conversation group (validation error). This supports logical separation and reusable eval_ids across different conversation contexts.

Applied to files:

  • tests/unit/core/output/test_generator.py
📚 Learning: 2025-11-24T16:59:21.420Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T16:59:21.420Z
Learning: All new evaluation features should be added to `src/lightspeed_evaluation/` core framework

Applied to files:

  • src/lightspeed_evaluation/core/models/data.py
  • tests/unit/pipeline/evaluation/test_evaluator.py
📚 Learning: 2025-09-09T14:58:10.630Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/pipeline/evaluation/amender.py:32-41
Timestamp: 2025-09-09T14:58:10.630Z
Learning: In the lightspeed-evaluation framework, when API is enabled, every turn should make a fresh API call regardless of whether the turn already has response or tool_calls data. This ensures consistency and fresh responses for each evaluation run.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/evaluator.py
🧬 Code graph analysis (2)
tests/unit/core/output/test_generator.py (2)
src/lightspeed_evaluation/core/api/client.py (1)
  • query (72-106)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationResult (378-446)
tests/unit/pipeline/evaluation/test_evaluator.py (1)
src/lightspeed_evaluation/core/api/client.py (1)
  • query (72-106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: mypy
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.11)
🔇 Additional comments (8)
tests/unit/pipeline/evaluation/test_evaluator.py (2)

157-160: LGTM! Test validates new fields correctly.

The test appropriately verifies that turn-level evaluation results include the new query, response, and contexts fields populated from the input turn data. The contexts field is correctly expected as a JSON-serialized string.


368-372: LGTM! Exception path test correctly validates minimal field population.

The test appropriately verifies that when an exception occurs during evaluation, the result still carries the basic query and response from the turn data, while optional fields like contexts and expected_response are set to None. This aligns with the error handling behavior in the evaluator.

src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (2)

165-195: LGTM! Result construction properly populates new fields.

The updated EvaluationResult construction correctly:

  • Uses a local turn_data variable for consistent access
  • Populates new fields (tool_calls, contexts, expected_response, expected_intent, expected_keywords, expected_tool_calls) using _to_json_str for serialization
  • Maintains null-safety with if turn_data guards
  • Retains existing token tracking fields

206-227: Verify field defaults in EvaluationResult model for optional fields.

The _create_error_result method omits new optional fields (tool_calls, contexts, expected_response, expected_intent, expected_keywords, expected_tool_calls), while the success path populates them. Check whether these fields have default=None in the EvaluationResult model definition. If they do, the current implementation is safe; if not, explicitly set them to None for consistency with test expectations (lines 368-372 in test_evaluator.py expect these as None).

tests/unit/core/output/test_generator.py (2)

331-373: LGTM! Test coverage expanded to validate new fields.

The test appropriately validates:

  1. Row 1: New fields query, response, contexts, and expected_keywords are populated and correctly serialized
  2. Row 2: The expected_response field is included
  3. Row 3: ERROR scenario with minimal fields (query populated, contexts empty)

This aligns with the PR objective to add additional fields for non-error scenarios while keeping error output minimal.


390-406: LGTM! Assertions validate CSV structure and new fields.

The assertions correctly verify:

  • Three rows in the CSV output
  • Row 1 contains the new fields with proper values
  • Row 2 includes expected_response
  • Row 3 (ERROR) has minimal data with empty contexts
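
A self-contained illustration of the ERROR-row expectation, independent of the project's generator: when a field is None it lands in the CSV as an empty cell, while populated fields keep their JSON-string form.

import csv
import io
import json

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["result", "query", "contexts"])
writer.writeheader()
writer.writerow({"result": "PASS", "query": "example query", "contexts": json.dumps(["Context"])})
writer.writerow({"result": "ERROR", "query": "example query", "contexts": None})

buf.seek(0)
rows = list(csv.DictReader(buf))
assert json.loads(rows[0]["contexts"]) == ["Context"]  # success row keeps the JSON string
assert rows[1]["contexts"] == ""                       # ERROR row: None becomes an empty cell
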
src/lightspeed_evaluation/core/constants.py (1)

25-46: LGTM! CSV columns reordered and expanded logically.

The updated SUPPORTED_CSV_COLUMNS list:

  • Reorders execution_time to follow threshold (more logical grouping with evaluation metadata)
  • Places token-related fields together (api_input_tokens, api_output_tokens, judge_llm_input_tokens, judge_llm_output_tokens)
  • Adds six new fields at the end for turn data (tool_calls, contexts, expected_response, expected_intent, expected_keywords, expected_tool_calls)

The ordering is logical and groups related fields together.
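
Spelled out, that ordering corresponds to a column list like the one below, reconstructed from the csv_columns example later in this thread; the actual constant lives in src/lightspeed_evaluation/core/constants.py.

# Reconstructed for illustration from the config example in this PR discussion.
SUPPORTED_CSV_COLUMNS = [
    # Identifiers and metric results
    "conversation_group_id", "turn_id", "metric_identifier",
    "result", "score", "threshold", "reason", "execution_time",
    # Turn input/output
    "query", "response",
    # Token usage
    "api_input_tokens", "api_output_tokens",
    "judge_llm_input_tokens", "judge_llm_output_tokens",
    # New turn-data fields added in this PR
    "tool_calls", "contexts",
    "expected_response", "expected_intent",
    "expected_keywords", "expected_tool_calls",
]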

src/lightspeed_evaluation/core/models/data.py (1)

410-438: Note: Inconsistency between AI summary and actual implementation.

The AI summary states that api_input_tokens and api_output_tokens were "replaced" with the new fields, but the actual code shows these fields are retained (lines 410-413) alongside the new optional string fields (lines 420-438). This is the correct approach, as token tracking is independent of the new turn data fields.

The field definitions are well-structured:

  • Existing token fields retained for backward compatibility
  • Six new optional string fields added with clear descriptions
  • All new fields default to None for optional population

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
config/system.yaml (1)

163-183: Clarify data field vs. metric distinction in documentation.

The new CSV columns include output data fields (tokens, contexts, expected values) that are distinct from metrics defined in metrics_metadata. Consider adding a brief comment in the YAML to clarify that csv_columns includes both evaluation metrics and raw data fields from turn execution, to aid future maintainers.

  # CSV columns to include (includes both metrics and raw turn execution data)
  csv_columns:
    # Identifiers
    - "conversation_group_id"
    - "turn_id"
    # Metric evaluation results
    - "metric_identifier"
    - "result"
    - "score"
    - "threshold"
    - "reason"
    # Turn execution metadata
    - "execution_time"
    # Turn input/output
    - "query"
    - "response"
    # API call token metrics
    - "api_input_tokens"
    - "api_output_tokens"
    - "judge_llm_input_tokens"
    - "judge_llm_output_tokens"
    # Tool and context data
    - "tool_calls"
    - "contexts"
    # Expected values for comparison
    - "expected_response"
    - "expected_intent"
    - "expected_keywords"
    - "expected_tool_calls"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 12f424a and 2ac11be.

📒 Files selected for processing (1)
  • config/system.yaml (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
config/system.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Add new metrics metadata to config/system.yaml in the metrics_metadata section

Files:

  • config/system.yaml
config/*.yaml

📄 CodeRabbit inference engine (AGENTS.md)

Add sample evaluation data YAML file when adding new features

Files:

  • config/system.yaml
🧠 Learnings (2)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 26
File: lsc_agent_eval/src/lsc_agent_eval/core/utils/api_client.py:124-151
Timestamp: 2025-08-22T09:16:29.070Z
Learning: In lsc_agent_eval project, the maintainer (asamal4) prefers reactive error handling - adding support for additional error response fields only when they occur in practice, rather than preemptively handling all possible error formats.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-11-24T16:59:21.420Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-11-24T16:59:21.420Z
Learning: Applies to config/system.yaml : Add new metrics metadata to `config/system.yaml` in the metrics_metadata section

Applied to files:

  • config/system.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.11)
  • GitHub Check: mypy
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.13)
🔇 Additional comments (1)
config/system.yaml (1)

163-183: Verify all new CSV columns are supported by the data model and output generator.

The CSV configuration adds 12 new columns including token counts, contexts, and expected values. Before merging, ensure these columns are:

  1. Defined in the data model (EvaluationResult or similar)
  2. Properly serialized in the output generator
  3. Consistently named across the codebase

Additionally, confirm that a sample evaluation data YAML file has been added per project guidelines.

VladimirKadlec (Contributor) left a comment


Please add the description of the new output columns to the table in docs/configuration.md

Other than that LGTM.

asamal4 (Collaborator, Author) commented on Dec 11, 2025

Please add the description of the new output columns to the table in docs/configuration.md

@VladimirKadlec Done, thank you!

asamal4 merged commit 8688ff6 into lightspeed-core:main on Dec 11, 2025. 15 checks passed.