Remove duplicate data validation in pipeline #141

Merged
asamal4 merged 1 commit into lightspeed-core:main from asamal4:remove-dup-data-val
Jan 16, 2026

@asamal4 (Collaborator) commented Jan 15, 2026

Description

Remove the duplicate data validation step from the pipeline. The data is already validated when it is loaded.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Summary by CodeRabbit

  • Refactor
    • Reorganized data validation architecture: validation methods are now internal-only, and validation has been removed from the evaluation pipeline flow. Validation logic and error handling semantics remain consistent.

@coderabbitai (bot, Contributor) commented Jan 15, 2026

Walkthrough

The pull request renames the public validation method validate_evaluation_data to private _validate_evaluation_data in the DataValidator class and removes the explicit data validation orchestration from the evaluation pipeline. Validation is no longer a discrete pipeline step but remains available as an internal operation.
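The shape described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the constructor argument `known_metrics`, the list-of-dicts input, and the local `DataValidationError` class are assumptions made so the example runs on its own. Only the method names `load_evaluation_data` and `_validate_evaluation_data` come from the PR.

```python
class DataValidationError(Exception):
    """Local stand-in for the project's custom exception (assumed here)."""


class DataValidator:
    """Illustrative validator with a single public entry point."""

    def __init__(self, known_metrics):
        # known_metrics is a hypothetical parameter for this sketch.
        self.known_metrics = set(known_metrics)

    def load_evaluation_data(self, raw_items):
        """Public entry point: load the data, then validate it internally."""
        items = [dict(item) for item in raw_items]  # stands in for file parsing
        if not self._validate_evaluation_data(items):
            raise DataValidationError("evaluation data failed validation")
        return items

    def _validate_evaluation_data(self, items):
        """Internal check: every explicitly requested metric must be known."""
        return all(
            metric in self.known_metrics
            for item in items
            for metric in item.get("turn_metrics") or []
        )
```

Because callers only ever reach validation through `load_evaluation_data()`, a second validation pass in the pipeline would repeat work that has already succeeded.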

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Validation API Update (src/lightspeed_evaluation/core/system/validator.py) | Method validate_evaluation_data renamed to _validate_evaluation_data to indicate private scope; docstring expanded and pylint disable marker added; validation logic preserved. |
| Pipeline Integration Changes (src/lightspeed_evaluation/pipeline/evaluation/pipeline.py) | Removed DataValidator import and instance initialization; deleted explicit pre-validation step from run_evaluation flow; updated class docstring to remove "Validate data" responsibility; results assignment simplified. |
| Test Updates (tests/unit/core/system/test_validator.py) | All test calls updated from validate_evaluation_data to _validate_evaluation_data; no logic or error handling changes. |
| Pipeline Test Cleanup (tests/unit/pipeline/evaluation/test_pipeline.py) | Removed DataValidator mocking across multiple test cases; eliminated test branches exercising validation success/failure paths; tests now focus on MetricManager, APIClient, APIDataAmender, and other pipeline components without validation step coverage. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • Generic eval tool #28: Introduces new DataValidator implementation with public validate_evaluation_data method that operates on the same validation API surface being refactored in this PR.

Suggested reviewers

  • tisnik
  • Anxhela21

🚥 Pre-merge checks | ✅ 3 passed

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing duplicate data validation from the pipeline as data is already validated during load. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d05ac65 and fc14bcc.

📒 Files selected for processing (4)
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • tests/unit/core/system/test_validator.py
  • tests/unit/pipeline/evaluation/test_pipeline.py
💤 Files with no reviewable changes (1)
  • tests/unit/pipeline/evaluation/test_pipeline.py
🧰 Additional context used
📓 Path-based instructions (4)
**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Do not disable lint warnings with # noqa, # type: ignore, or # pylint: disable comments - fix the underlying issue instead

Files:

  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • tests/unit/core/system/test_validator.py
src/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.py: Include type hints for all public functions and methods in Python
Use Google-style docstrings for all public APIs in Python

Files:

  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
src/lightspeed_evaluation/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/lightspeed_evaluation/**/*.py: Use custom exceptions from core.system.exceptions module for error handling
Use structured logging with appropriate log levels in Python code

Files:

  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest mocking with mocker fixture instead of unittest.mock
Import mocking utilities from pytest (mocker fixture) rather than from unittest.mock
Mirror test directory structure to match source code structure
Use test_*.py naming convention for test files and test_ prefix for test functions
Use Test* prefix for test classes in Python test files
Mock LLM calls in tests using pytest mocker fixture

Files:

  • tests/unit/core/system/test_validator.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • tests/unit/core/system/test_validator.py
📚 Learning: 2026-01-06T10:26:30.274Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 127
File: src/lightspeed_evaluation/core/api/streaming_parser.py:122-149
Timestamp: 2026-01-06T10:26:30.274Z
Learning: DataValidationError in src/lightspeed_evaluation/core/system/exceptions.py is used exclusively for data file validation scenarios (YAML parsing, Pydantic model validation, file existence checks) in validator.py, not for runtime API response validation errors.

Applied to files:

  • src/lightspeed_evaluation/core/system/validator.py
📚 Learning: 2026-01-15T00:41:27.575Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 140
File: src/lightspeed_evaluation/runner/evaluation.py:41-41
Timestamp: 2026-01-15T00:41:27.575Z
Learning: In Python code reviews for the lightspeed-evaluation project, allow keeping the pylint disable: too-many-locals in functions that use lazy imports to group many local variables in one scope for readability. If a function uses lazy imports to justify this pattern, document why and avoid over-refactoring into multiple helpers just for lint reasons. If you can refactor without increasing complexity or number of globals, prefer refactoring; otherwise retain the pylint directive with a clear rationale.

Applied to files:

  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
🧬 Code graph analysis (2)
src/lightspeed_evaluation/core/system/validator.py (1)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationData (318-386)
tests/unit/core/system/test_validator.py (1)
src/lightspeed_evaluation/core/system/validator.py (1)
  • _validate_evaluation_data (158-181)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.12)
  • GitHub Check: tests (3.11)
🔇 Additional comments (8)
src/lightspeed_evaluation/core/system/validator.py (3)

87-92: LGTM - Docstring and pylint disable are appropriate.

The pylint: disable=too-few-public-methods is justified since the class now exposes only load_evaluation_data() as its single public entry point. The updated docstring clearly communicates this design intent.


145-147: Good clarification of the validation phase.

The comment now explicitly distinguishes "Semantic validation" (metrics availability and requirements) from the structural YAML/Pydantic validation that happens earlier in the method.
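The two phases can be pictured with a small sketch. Everything here is hypothetical: the `Turn` dataclass and the functions `parse_structural` and `check_semantic` are stand-ins written for illustration, and the real project uses YAML parsing and Pydantic models for the structural phase.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Turn:
    """Hypothetical, simplified turn record."""
    query: str
    turn_metrics: Optional[List[str]] = None


def parse_structural(raw: dict) -> Turn:
    """Phase 1: shape and type checks (stand-in for YAML/Pydantic validation)."""
    if not isinstance(raw.get("query"), str) or not raw["query"].strip():
        raise ValueError("query must be a non-empty string")
    metrics = raw.get("turn_metrics")
    if metrics is not None and not isinstance(metrics, list):
        raise ValueError("turn_metrics must be a list or omitted")
    return Turn(query=raw["query"], turn_metrics=metrics)


def check_semantic(turn: Turn, available: Set[str]) -> List[str]:
    """Phase 2: return one error message per metric that is not available."""
    return [
        f"unknown metric: {metric}"
        for metric in (turn.turn_metrics or [])
        if metric not in available
    ]
```

A record can be structurally valid and still fail the semantic phase, which is why the comment in the validator separates the two.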


158-159: No evidence of external callers for this method.

The search results show only one usage of validate_evaluation_data outside the method definition: an internal call within the class itself at line 146. DataValidator is instantiated in production code, but only load_evaluation_data() is called on it—never validate_evaluation_data(). The method was used exclusively within the class, so renaming it to explicitly mark it as private is not a breaking change.

Likely an incorrect or invalid review comment.

tests/unit/core/system/test_validator.py (2)

56-56: Acceptable pattern for testing internal validation logic.

Directly testing the private _validate_evaluation_data method is appropriate here since these unit tests need to verify the validation logic in isolation, separate from the file I/O operations in load_evaluation_data(). This pattern is consistent throughout the test file.
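A test in this style might look like the sketch below. The `DataValidator` here is a tiny stand-in defined only so the example runs by itself; its constructor, its `validation_errors` attribute, and the metric names are assumptions, not the project's actual API.

```python
class DataValidator:
    """Hypothetical stand-in; not the project's real class."""

    def __init__(self, available_metrics):
        self.available_metrics = set(available_metrics)
        self.validation_errors = []

    def _validate_evaluation_data(self, items):
        """Collect one error per unknown metric; True means no errors."""
        self.validation_errors = [
            f"unknown metric: {metric}"
            for item in items
            for metric in item.get("turn_metrics") or []
            if metric not in self.available_metrics
        ]
        return not self.validation_errors


def test_valid_metrics_pass():
    validator = DataValidator(["custom:answer_correctness"])
    data = [{"turn_metrics": ["custom:answer_correctness"]}]
    # Call the private method directly: no file I/O is involved.
    assert validator._validate_evaluation_data(data) is True


def test_unknown_metric_is_reported():
    validator = DataValidator(["custom:answer_correctness"])
    data = [{"turn_metrics": ["custom:typo"]}]
    assert validator._validate_evaluation_data(data) is False
    assert validator.validation_errors == ["unknown metric: custom:typo"]
```

Calling the underscore-prefixed method keeps each test focused on the validation rules, at the cost of coupling the tests to an internal name.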


79-79: All remaining test method calls updated consistently.

The remaining changes across the test file are mechanical renames following the same pattern. Test logic and assertions are unchanged.

Also applies to: 103-103, 123-123, 141-141, 166-166, 185-185, 207-207, 234-234, 332-332, 349-349, 363-363, 392-392

src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (3)

20-20: Import cleanup reflects the architectural change.

Removing the DataValidator import is correct since validation is no longer performed in the pipeline. The ConfigLoader import remains as it's still needed for configuration access.


32-40: Docstring accurately reflects the updated responsibilities.

The removal of "Validate data" from the responsibilities list correctly documents that validation is no longer this class's concern.


111-141: No action needed—all callers already use DataValidator.load_evaluation_data() for pre-validated data.

The production entry point in src/lightspeed_evaluation/runner/evaluation.py correctly obtains evaluation data via DataValidator.load_evaluation_data() before passing it to pipeline.run_evaluation(). No direct EvaluationData construction bypasses validation outside of the validator itself.
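The call order described above can be sketched as follows. This is an assumption-laden illustration: `load_evaluation_data`, `EvaluationPipeline`, and `main` below are simplified stand-ins, and the `group_id` field is hypothetical. Only the order (load and validate first, then `run_evaluation`) is taken from the review comment.

```python
def load_evaluation_data(raw_items):
    """Stand-in loader: rejects bad items so callers receive validated data."""
    for index, item in enumerate(raw_items):
        if not item.get("group_id"):  # group_id is a hypothetical field
            raise ValueError(f"item {index} is missing group_id")
    return list(raw_items)


class EvaluationPipeline:
    """Stand-in pipeline that trusts its input and performs no validation."""

    def run_evaluation(self, evaluation_data):
        # No pre-validation step: the data is assumed to come from the loader.
        return [
            {"group_id": item["group_id"], "status": "evaluated"}
            for item in evaluation_data
        ]


def main(raw_items):
    """Runner order: load (and validate) first, then evaluate."""
    data = load_evaluation_data(raw_items)
    return EvaluationPipeline().run_evaluation(data)
```

The trade-off is that any future caller that builds evaluation data without going through the loader would skip validation entirely, which is the situation the review comment checked for.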


@VladimirKadlec (Contributor) left a comment

LGTM

@asamal4 asamal4 merged commit af3c576 into lightspeed-core:main Jan 16, 2026
15 checks passed