Remove duplicate data validation in pipeline by asamal4 · Pull Request #141 · lightspeed-core/lightspeed-evaluation

asamal4 · 2026-01-15T01:24:15Z

Description

Remove duplicate data validation in pipeline. Data is already validated during load.

Type of change

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

Assisted-by: (e.g., Claude, CodeRabbit, Ollama, etc., N/A if not used)
Generated by: (e.g., tool name and version; N/A if not used)

Related Tickets & Documents

Related Issue #
Closes #

Checklist before requesting a review

I have performed a self-review of my code.
PR has passed all pre-merge test jobs.
If it is a core feature, I have added thorough tests.

Testing

Please provide detailed steps to perform tests related to this code change.
How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Refactor
- Reorganized data validation architecture: validation methods are now internal-only, and validation has been removed from the evaluation pipeline flow. Validation logic and error handling semantics remain consistent.


coderabbitai · 2026-01-15T01:24:25Z

Walkthrough

The pull request renames the public validation method `validate_evaluation_data` to the private `_validate_evaluation_data` in the `DataValidator` class and removes the explicit data validation step from the evaluation pipeline. Validation is no longer a discrete pipeline step but remains available as an internal operation.
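The shape of this change can be sketched as follows. This is a simplified illustration, not the project's code: the constructor argument, the boolean return value, the in-memory `raw_items` input (the real loader reads a data file), and the `demo:metric` name are all assumptions made for the example.

```python
class DataValidationError(Exception):
    """Hypothetical stand-in for the project's custom exception."""


class DataValidator:
    """Simplified stand-in: load_evaluation_data() is the only public method."""

    def __init__(self, known_metrics):
        self.known_metrics = set(known_metrics)

    def load_evaluation_data(self, raw_items):
        """Public entry point: load the data, then validate before returning it."""
        items = [dict(item) for item in raw_items]  # the real loader parses a file here
        if not self._validate_evaluation_data(items):
            raise DataValidationError("evaluation data failed validation")
        return items

    def _validate_evaluation_data(self, items):
        """Internal check (assumed logic): every requested metric must be known."""
        return all(
            metric in self.known_metrics
            for item in items
            for metric in item.get("turn_metrics") or []
        )


validator = DataValidator(known_metrics=["demo:metric"])
data = validator.load_evaluation_data([{"turn_metrics": ["demo:metric"]}])
```

Because callers only ever see `load_evaluation_data()`, any data they receive has already passed the internal check.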

Changes

| Cohort / File(s) | Summary |
|---|---|
| Validation API Update `src/lightspeed_evaluation/core/system/validator.py` | Method `validate_evaluation_data` renamed to `_validate_evaluation_data` to indicate private scope; docstring expanded and pylint disable marker added; validation logic preserved. |
| Pipeline Integration Changes `src/lightspeed_evaluation/pipeline/evaluation/pipeline.py` | Removed DataValidator import and instance initialization; deleted explicit pre-validation step from `run_evaluation` flow; updated class docstring to remove "Validate data" responsibility; results assignment simplified. |
| Test Updates `tests/unit/core/system/test_validator.py` | All test calls updated from `validate_evaluation_data` to `_validate_evaluation_data`; no logic or error handling changes. |
| Pipeline Test Cleanup `tests/unit/pipeline/evaluation/test_pipeline.py` | Removed DataValidator mocking across multiple test cases; eliminated test branches exercising validation success/failure paths; tests now focus on MetricManager, APIClient, APIDataAmender, and other pipeline components without validation step coverage. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Generic eval tool #28: Introduces new DataValidator implementation with public validate_evaluation_data method that operates on the same validation API surface being refactored in this PR.

Suggested reviewers

tisnik
Anxhela21

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing duplicate data validation from the pipeline as data is already validated during load. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |



📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d05ac65 and fc14bcc.

📒 Files selected for processing (4)

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
tests/unit/core/system/test_validator.py
tests/unit/pipeline/evaluation/test_pipeline.py

💤 Files with no reviewable changes (1)

tests/unit/pipeline/evaluation/test_pipeline.py

🧰 Additional context used

📓 Path-based instructions (4)

**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead

Files:

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
tests/unit/core/system/test_validator.py

src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Include type hints for all public functions and methods in Python
Use Google-style docstrings for all public APIs in Python

Files:

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py

src/lightspeed_evaluation/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/lightspeed_evaluation/**/*.py: Use custom exceptions from core.system.exceptions module for error handling
Use structured logging with appropriate log levels in Python code

Files:

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py

tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock
Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock
Mirror test directory structure to match source code structure
Use test_*.py naming convention for test files and test_* prefix for test functions
Use Test* prefix for test classes in Python test files
Mock LLM calls in tests using pytest mocker fixture

Files:

tests/unit/core/system/test_validator.py

🧠 Learnings (4)

📓 Common learnings

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.

📚 Learning: 2025-09-18T23:59:37.026Z

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
tests/unit/core/system/test_validator.py

📚 Learning: 2026-01-06T10:26:30.274Z

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 127
File: src/lightspeed_evaluation/core/api/streaming_parser.py:122-149
Timestamp: 2026-01-06T10:26:30.274Z
Learning: DataValidationError in src/lightspeed_evaluation/core/system/exceptions.py is used exclusively for data file validation scenarios (YAML parsing, Pydantic model validation, file existence checks) in validator.py, not for runtime API response validation errors.

Applied to files:

src/lightspeed_evaluation/core/system/validator.py
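The scope described in this learning (file existence checks and parse failures surfacing as `DataValidationError`) might look roughly like the sketch below. It is an illustration under stated assumptions: the `load_data_file` helper is hypothetical, the exception class is a local stand-in, and JSON replaces YAML only to keep the example within the standard library.

```python
import json
from pathlib import Path


class DataValidationError(Exception):
    """Local stand-in for the project's exception of the same name."""


def load_data_file(path):
    """Hypothetical helper: report file-level problems as DataValidationError."""
    file_path = Path(path)
    if not file_path.exists():
        raise DataValidationError(f"data file not found: {file_path}")
    try:
        # JSON is an assumption made for this sketch; the learning mentions YAML.
        data = json.loads(file_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise DataValidationError(f"invalid data file {file_path}: {exc}") from exc
    if not isinstance(data, list):
        raise DataValidationError("data file must contain a list of entries")
    return data
```

Runtime API response problems would not go through this path; per the learning, they are reported with other error types.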

📚 Learning: 2026-01-15T00:41:27.575Z

Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 140
File: src/lightspeed_evaluation/runner/evaluation.py:41-41
Timestamp: 2026-01-15T00:41:27.575Z
Learning: In Python code reviews for the lightspeed-evaluation project, allow keeping the pylint disable: too-many-locals in functions that use lazy imports to group many local variables in one scope for readability. If a function uses lazy imports to justify this pattern, document why and avoid over-refactoring into multiple helpers just for lint reasons. If you can refactor without increasing complexity or number of globals, prefer refactoring; otherwise retain the pylint directive with a clear rationale.

Applied to files:

src/lightspeed_evaluation/core/system/validator.py
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py

🧬 Code graph analysis (2)

src/lightspeed_evaluation/core/system/validator.py (1)

src/lightspeed_evaluation/core/models/data.py (1)

EvaluationData (318-386)

tests/unit/core/system/test_validator.py (1)

src/lightspeed_evaluation/core/system/validator.py (1)

_validate_evaluation_data (158-181)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: tests (3.13)
GitHub Check: tests (3.12)
GitHub Check: tests (3.11)

🔇 Additional comments (8)

src/lightspeed_evaluation/core/system/validator.py (3)

87-92: LGTM - Docstring and pylint disable are appropriate.

The pylint: disable=too-few-public-methods is justified since the class now exposes only load_evaluation_data() as its single public entry point. The updated docstring clearly communicates this design intent.

145-147: Good clarification of the validation phase.

The comment now explicitly distinguishes "Semantic validation" (metrics availability and requirements) from the structural YAML/Pydantic validation that happens earlier in the method.
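The two phases can be illustrated with a small sketch. Everything here is hypothetical: the `Turn` dataclass stands in for a Pydantic model, and the `METRIC_REQUIREMENTS` registry, the metric name, and the field names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """Hypothetical stand-in for a Pydantic model."""

    turn_id: str
    query: str
    turn_metrics: list = field(default_factory=list)

    def __post_init__(self):
        # Structural validation: field types and required content.
        if not isinstance(self.query, str) or not self.query.strip():
            raise ValueError(f"turn {self.turn_id}: query must be a non-empty string")


# Hypothetical registry: metric name -> fields that metric needs.
METRIC_REQUIREMENTS = {"demo:answer_match": ["expected_response"]}


def semantic_errors(turn, extra_fields):
    """Semantic validation: metric availability and per-metric requirements."""
    errors = []
    for metric in turn.turn_metrics:
        if metric not in METRIC_REQUIREMENTS:
            errors.append(f"turn {turn.turn_id}: unknown metric {metric!r}")
            continue
        for required in METRIC_REQUIREMENTS[metric]:
            if not extra_fields.get(required):
                errors.append(f"turn {turn.turn_id}: {metric} requires {required!r}")
    return errors


turn = Turn(turn_id="t1", query="What is a pod?", turn_metrics=["demo:answer_match"])
errors = semantic_errors(turn, extra_fields={})
# errors == ["turn t1: demo:answer_match requires 'expected_response'"]
```

A turn can be structurally valid, as `turn` is here, and still fail the semantic phase, which is why the two checks are worth naming separately.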

158-159: No evidence of external callers for this method.

The search results show only one usage of validate_evaluation_data outside the method definition: an internal call within the class itself at line 146. DataValidator is instantiated in production code, but only load_evaluation_data() is called on it—never validate_evaluation_data(). The method was used exclusively within the class, so renaming it to explicitly mark it as private is not a breaking change.

Likely an incorrect or invalid review comment.

tests/unit/core/system/test_validator.py (2)

56-56: Acceptable pattern for testing internal validation logic.

Directly testing the private _validate_evaluation_data method is appropriate here since these unit tests need to verify the validation logic in isolation, separate from the file I/O operations in load_evaluation_data(). This pattern is consistent throughout the test file.
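A minimal sketch of that testing pattern, assuming a much-reduced `DataValidator` (the constructor argument, the boolean return value, and the metric names are hypothetical and not taken from the project):

```python
class DataValidator:
    """Reduced, hypothetical stand-in for the class under test."""

    def __init__(self, known_metrics):
        self.known_metrics = set(known_metrics)

    def load_evaluation_data(self, path):
        """Public entry point; involves file I/O, omitted from this sketch."""
        raise NotImplementedError("file I/O omitted")

    def _validate_evaluation_data(self, items):
        return all(
            metric in self.known_metrics
            for item in items
            for metric in item["turn_metrics"]
        )


class TestDataValidator:
    """Pytest-style tests that exercise the private method without touching files."""

    def test_accepts_known_metric(self):
        validator = DataValidator(["demo:metric"])
        assert validator._validate_evaluation_data([{"turn_metrics": ["demo:metric"]}])

    def test_rejects_unknown_metric(self):
        validator = DataValidator(["demo:metric"])
        assert not validator._validate_evaluation_data([{"turn_metrics": ["other"]}])
```

The tests build their input in memory, so no fixture files or file-system mocks are needed to cover the validation rules.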

79-79: All remaining test method calls updated consistently.

The remaining changes across the test file are mechanical renames following the same pattern. Test logic and assertions are unchanged.

Also applies to: 103-103, 123-123, 141-141, 166-166, 185-185, 207-207, 234-234, 332-332, 349-349, 363-363, 392-392

src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (3)

20-20: Import cleanup reflects the architectural change.

Removing the DataValidator import is correct since validation is no longer performed in the pipeline. The ConfigLoader import remains as it's still needed for configuration access.

32-40: Docstring accurately reflects the updated responsibilities.

The removal of "Validate data" from the responsibilities list correctly documents that validation is no longer this class's concern.

111-141: No action needed—all callers already use DataValidator.load_evaluation_data() for pre-validated data.

The production entry point in src/lightspeed_evaluation/runner/evaluation.py correctly obtains evaluation data via DataValidator.load_evaluation_data() before passing it to pipeline.run_evaluation(). No direct EvaluationData construction bypasses validation outside of the validator itself.
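The resulting call flow can be sketched as below. Only the method names `load_evaluation_data` and `run_evaluation` come from the review; the class bodies, the `query` field, the `run` helper, and the placeholder `"PASS"` result are assumptions made for illustration.

```python
class DataValidator:
    """Hypothetical stand-in that counts how often validation runs."""

    def __init__(self):
        self.validation_runs = 0

    def load_evaluation_data(self, raw_items):
        self.validation_runs += 1  # validation happens here, once, at load time
        if any("query" not in item for item in raw_items):
            raise ValueError("every item needs a 'query'")
        return list(raw_items)


class EvaluationPipeline:
    """Hypothetical stand-in for the pipeline after this PR."""

    def run_evaluation(self, evaluation_data):
        # No validator call here: the data is assumed to come from the loader.
        return [{"query": item["query"], "result": "PASS"} for item in evaluation_data]


def run(raw_items):
    """Hypothetical runner: load (and validate) first, then evaluate."""
    validator = DataValidator()
    evaluation_data = validator.load_evaluation_data(raw_items)
    results = EvaluationPipeline().run_evaluation(evaluation_data)
    return results, validator.validation_runs


results, validation_runs = run([{"query": "What is a pod?"}])
```

The trade-off is that the pipeline now trusts its caller: code that builds evaluation data by hand and skips the loader would also skip validation.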


VladimirKadlec

LGTM

Remove duplicate data validation in pipeline

fc14bcc

VladimirKadlec approved these changes Jan 15, 2026


asamal4 merged commit af3c576 into lightspeed-core:main Jan 16, 2026
15 checks passed

asamal4 merged 1 commit into lightspeed-core:main from asamal4:remove-dup-data-val
