
chore: consolidate test_data models #131

Merged
asamal4 merged 1 commit into lightspeed-core:main from asamal4:consolidate-testdatamodel
Jan 8, 2026

Conversation

asamal4 (Collaborator) commented Jan 7, 2026

Description

Consolidate test cases for data models

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Added new evaluation data models for improved evaluation workflows.
    • Enhanced tool-call expectations to support flexible multi-set formats with fallback handling.
  • Tests

    • Expanded and consolidated test coverage with comprehensive validation scenarios for evaluation and turn data, plus updated validation messages and error handling.


coderabbitai bot (Contributor) commented Jan 7, 2026

Walkthrough

Consolidates and expands unit tests for core data models: merges the removed tests into a single module, adds validation tests for the new public models EvaluationData and EvaluationResult, and updates TurnData.expected_tool_calls validation to a flexible multi-set format with broader edge-case coverage.
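For readers unfamiliar with the format, here is a minimal sketch of the two expected_tool_calls shapes this PR exercises. The nesting depth and the tool-call dictionary keys (tool_name, arguments) are illustrative assumptions, not taken from the diff:

tool_call = {"tool_name": "get_pods", "arguments": {"namespace": "demo"}}  # illustrative keys

# Single-set format: one set of expected tool-call sequences.
single_set = [[tool_call]]

# Multi-set format: a list of alternatives. An empty set is allowed only as a
# fallback alternative (optional tool calls), never as the sole expectation.
multi_set = [[[tool_call]], [[]]]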

Changes

  • Tests (modified & removed): tests/unit/core/models/test_data.py, tests/unit/core/models/test_data_additional.py
    Consolidated tests into test_data.py (removed test_data_additional.py); expanded TurnData tests for minimal/empty fields and flexible multi-set expected_tool_calls shapes (single set, multiple sets, empty alternatives, complex sequences, None, regex-preserving args); updated validation message assertions and renamed/restructured several tests.
  • Public data models (new tests / exposure): .../lightspeed_evaluation/core/models/data.py
    Introduces/exports EvaluationData and EvaluationResult; tests validate EvaluationData (non-empty conversation_group_id, non-empty turns, rejection of empty turns) and EvaluationResult (default values, constrained status, non-negative execution_time, turn_id=None allowed for conversation-level metrics).

Sequence Diagram(s)

(omitted — changes are test and model validations without multi-component runtime control flow)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • VladimirKadlec
  • tisnik
🚥 Pre-merge checks: ✅ 3 passed
  • Description Check: Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check: Passed. The title 'chore: consolidate test_data models' accurately describes the main change: consolidating test files (test_data_additional.py removed, tests merged into test_data.py) and expanding coverage for data models.
  • Docstring Coverage: Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unit/core/models/test_data.py (1)

273-286: Consider adding a positive test case for EvaluationData.

The class only tests rejection scenarios. Adding a test for successful creation with valid data would improve coverage and serve as documentation for expected usage.

💡 Suggested addition
def test_valid_evaluation_data(self):
    """Test EvaluationData with valid fields."""
    turn = TurnData(turn_id="turn1", query="Query")
    data = EvaluationData(conversation_group_id="conv1", turns=[turn])

    assert data.conversation_group_id == "conv1"
    assert len(data.turns) == 1
    assert data.turns[0].turn_id == "turn1"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a0778f5 and d6fefe1.

📒 Files selected for processing (2)
  • tests/unit/core/models/test_data.py
  • tests/unit/core/models/test_data_additional.py
💤 Files with no reviewable changes (1)
  • tests/unit/core/models/test_data_additional.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest mocking (mocker fixture) instead of unittest.mock

Files:

  • tests/unit/core/models/test_data.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*

Files:

  • tests/unit/core/models/test_data.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-10-31T11:54:59.126Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 90
File: src/lightspeed_evaluation/core/models/data.py:198-208
Timestamp: 2025-10-31T11:54:59.126Z
Learning: In the lightspeed_evaluation framework, the expected_tool_calls validator intentionally rejects a single empty set `[[]]` as the only alternative. This is by design: if no tool calls are expected, the tool_eval metric should not be configured for that turn. Empty sets are only valid as fallback alternatives (e.g., `[[[tool_call]], [[]]]`), representing optional tool call scenarios, not as primary or sole expectations.

Applied to files:

  • tests/unit/core/models/test_data.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • tests/unit/core/models/test_data.py
📚 Learning: 2026-01-06T10:26:30.274Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 127
File: src/lightspeed_evaluation/core/api/streaming_parser.py:122-149
Timestamp: 2026-01-06T10:26:30.274Z
Learning: DataValidationError in src/lightspeed_evaluation/core/system/exceptions.py is used exclusively for data file validation scenarios (YAML parsing, Pydantic model validation, file existence checks) in validator.py, not for runtime API response validation errors.

Applied to files:

  • tests/unit/core/models/test_data.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: tests (3.13)
🔇 Additional comments (5)
tests/unit/core/models/test_data.py (5)

1-33: LGTM!

The imports are minimal and appropriate. TestTurnData covers the essential required field validations (minimal fields, empty turn_id, empty query) with correct use of pytest.raises.


36-186: LGTM!

Comprehensive coverage of expected_tool_calls validation scenarios. The tests correctly verify:

  • Format conversion (single set → multiple sets)
  • Empty alternatives as valid fallbacks (but not as sole/first alternative)
  • Rejection of invalid structures and redundant empty alternatives

This aligns with the retrieved learning that empty sets are only valid as fallback alternatives, not as primary expectations.
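A minimal sketch of the rejection case described above, assuming the import path from this PR's data.py and that pydantic's ValidationError surfaces from the field validator (the exact error message is not asserted here):

import pytest
from pydantic import ValidationError

from lightspeed_evaluation.core.models.data import TurnData  # path per the PR's data.py


def test_sole_empty_alternative_rejected():
    """A lone empty set [[]] must be rejected when it is the only alternative."""
    with pytest.raises(ValidationError):
        TurnData(turn_id="turn1", query="Query", expected_tool_calls=[[]])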


189-214: LGTM!

Format detection tests correctly verify empty list rejection and single set format detection logic.


216-271: LGTM!

Thorough validation coverage for expected_keywords including valid cases, None handling, and rejection of invalid inputs (non-list, empty inner lists, empty strings).


289-340: LGTM!

Good coverage for EvaluationResult including default values, status validation, execution time constraints, and conversation-level metrics with None turn_id. The tests appropriately verify the model's constraints.

asamal4 force-pushed the consolidate-testdatamodel branch from d6fefe1 to 8d7add5 on January 7, 2026, 14:53
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/unit/core/models/test_data.py (2)

238-238: Consider adding a None check for consistency.

Other tests in this file (e.g., lines 52, 70-71, 87-88) assert that the value is not None before accessing properties. Adding the same check here would improve test clarity and provide a better error message if the assertion fails.

♻️ Suggested improvement
+        assert turn_data.expected_keywords is not None
         assert len(turn_data.expected_keywords) == 2

308-359: Good coverage of EvaluationResult validation.

Tests correctly verify:

  • Default values for optional fields
  • Constraint on result status values
  • Non-negative execution_time enforcement
  • None is valid for turn_id (conversation-level metrics)

Consider adding a parameterized test for all valid result status values (e.g., PASS, FAIL, ERROR, SKIPPED) to ensure they're all accepted. This would complement the invalid status rejection test.
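A minimal sketch of such a parameterized test, assuming the four status strings above and illustrative EvaluationResult field names (conversation_group_id, turn_id, metric_identifier, result); adjust these to the actual model:

import pytest

from lightspeed_evaluation.core.models.data import EvaluationResult  # path per the PR's data.py


@pytest.mark.parametrize("status", ["PASS", "FAIL", "ERROR", "SKIPPED"])
def test_all_valid_result_statuses_accepted(status):
    """Every allowed status value should construct without a validation error."""
    result = EvaluationResult(
        conversation_group_id="conv1",  # field names here are illustrative assumptions
        turn_id="turn1",
        metric_identifier="ragas:response_relevancy",
        result=status,
    )
    assert result.result == status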

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6fefe1 and 8d7add5.

📒 Files selected for processing (2)
  • tests/unit/core/models/test_data.py
  • tests/unit/core/models/test_data_additional.py
💤 Files with no reviewable changes (1)
  • tests/unit/core/models/test_data_additional.py
🧰 Additional context used
📓 Path-based instructions (2)
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest mocking (mocker fixture) instead of unittest.mock

Files:

  • tests/unit/core/models/test_data.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*

Files:

  • tests/unit/core/models/test_data.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-10-31T11:54:59.126Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 90
File: src/lightspeed_evaluation/core/models/data.py:198-208
Timestamp: 2025-10-31T11:54:59.126Z
Learning: In the lightspeed_evaluation framework, the expected_tool_calls validator intentionally rejects a single empty set `[[]]` as the only alternative. This is by design: if no tool calls are expected, the tool_eval metric should not be configured for that turn. Empty sets are only valid as fallback alternatives (e.g., `[[[tool_call]], [[]]]`), representing optional tool call scenarios, not as primary or sole expectations.

Applied to files:

  • tests/unit/core/models/test_data.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • tests/unit/core/models/test_data.py
📚 Learning: 2026-01-06T10:26:30.274Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 127
File: src/lightspeed_evaluation/core/api/streaming_parser.py:122-149
Timestamp: 2026-01-06T10:26:30.274Z
Learning: DataValidationError in src/lightspeed_evaluation/core/system/exceptions.py is used exclusively for data file validation scenarios (YAML parsing, Pydantic model validation, file existence checks) in validator.py, not for runtime API response validation errors.

Applied to files:

  • tests/unit/core/models/test_data.py
🧬 Code graph analysis (1)
tests/unit/core/models/test_data.py (1)
src/lightspeed_evaluation/core/models/data.py (1)
  • TurnData (36-315)
🔇 Additional comments (5)
tests/unit/core/models/test_data.py (5)

1-11: LGTM!

Imports are clean and correctly reference the three models under test. The consolidation of EvaluationData and EvaluationResult alongside TurnData aligns with the PR objective.


13-33: LGTM!

Good coverage of TurnData required field validation. The tests correctly verify min_length=1 constraints on turn_id and query, and confirm optional fields default to None.


36-186: Comprehensive validation coverage.

Tests thoroughly exercise the expected_tool_calls validation logic including:

  • Format conversion (single → multiple sets)
  • Empty alternatives as fallback (correctly aligned with the design that empty sets are valid only as fallback alternatives, not as primary or sole expectations)
  • Rejection of invalid structures, empty sequences, and redundant empty alternatives

The error message assertions (match=...) correctly verify the specific validation paths are being triggered.


189-213: LGTM!

Format detection tests correctly validate that:

  1. Empty list [] is rejected as "the only alternative"
  2. Single-set format is converted to one alternative containing multiple sequences

This aligns with the framework's intentional design where empty sets are only valid as fallback alternatives. Based on learnings, the validation correctly rejects [[]] as the sole expectation.
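A hedged sketch of the conversion behavior in point 2, assuming the single-set input is normalized into exactly one alternative after validation (the nesting is inferred from the review text, not from the diff):

from lightspeed_evaluation.core.models.data import TurnData  # path per the PR's data.py

call = {"tool_name": "get_pods", "arguments": {}}  # illustrative keys
turn = TurnData(turn_id="turn1", query="Query", expected_tool_calls=[[call]])
# The single-set input should now be wrapped as the sole alternative.
assert turn.expected_tool_calls is not None
assert len(turn.expected_tool_calls) == 1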


273-305: Good foundational coverage for EvaluationData.

Tests correctly validate required field constraints (non-empty conversation_group_id, non-empty turns list). The test structure is clear and follows the established patterns.

VladimirKadlec (Contributor) left a comment

LGTM

asamal4 merged commit bb643b0 into lightspeed-core:main on Jan 8, 2026 (15 checks passed).
coderabbitai bot mentioned this pull request on Jan 14, 2026 (15 tasks).