add support for fail_on_invalid_data option #94
Conversation
Walkthrough: Adds config flag `fail_on_invalid_data` that controls whether invalid evaluation data fails the run or is logged and the affected metrics skipped.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Config as Config
    participant Validator as DataValidator
    participant Data as Turn/Eval Data
    participant Processor as ConversationProcessor
    participant Evaluator as MetricsEvaluator
    Config->>Validator: __init__(api_enabled, fail_on_invalid_data)
    Validator->>Validator: validate(data)
    alt fail_on_invalid_data = true
        Validator-->>Config: return False on validation error
        Note over Validator: validation fails (hard)
    else fail_on_invalid_data = false
        Validator->>Data: add_invalid_metric(metric)
        Validator->>Validator: log notice, continue
        Note over Validator: metric flagged but validation continues
    end
    Processor->>Data: is_metric_invalid(metric)?
    alt metric invalid
        Processor->>Processor: log error, emit ERROR result, skip metric
    else metric valid
        Processor->>Evaluator: evaluate(EvaluationRequest)
        Evaluator-->>Processor: result / None
    end
```
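For illustration, a minimal Python sketch of the branch the diagram describes. This is not the project's actual `DataValidator` code; only the `fail_on_invalid_data` flag and the `add_invalid_metric` hook come from this PR, the rest is a simplified stand-in:

```python
import logging

logger = logging.getLogger(__name__)


class DataValidatorSketch:
    """Simplified stand-in for DataValidator, showing only the new branch."""

    def __init__(self, api_enabled: bool, fail_on_invalid_data: bool = True) -> None:
        self.api_enabled = api_enabled
        self.fail_on_invalid_data = fail_on_invalid_data

    def handle_invalid_metric(self, turn_data, metric: str, reason: str) -> bool:
        """Return False to fail validation hard, or flag the metric and continue."""
        if self.fail_on_invalid_data:
            logger.error("Invalid metric %s: %s", metric, reason)
            return False  # hard failure: validation reports the data set as invalid
        # soft path: mark the metric so the processor can skip it later
        turn_data.add_invalid_metric(metric)
        logger.warning("Invalid metric %s: %s (continuing)", metric, reason)
        return True
```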
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Force-pushed from adef500 to 7eb05b7.
Thanks @VladimirKadlec for working on this 👍 I agree with comments from Asutosh.
Force-pushed from 3a498c2 to 059c7c9.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/system.py (1)
265-268: Improve field description for clarity. The description has a minor grammatical issue (missing comma) and could be more explicit about the behavior in both states.
Consider this improved description:
```diff
 fail_on_invalid_data: bool = Field(
     default=True,
-    description="If False don't fail on invalid conversations",
+    description="When True, raise errors on validation failures; when False, log warnings and continue processing",
 )
```
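For reference, the suggested description dropped into a stand-alone Pydantic model (the container class below is hypothetical; the real field lives in `SystemConfig`):

```python
from pydantic import BaseModel, Field


class ValidationSettingsSketch(BaseModel):
    """Hypothetical container; only the field mirrors the suggested change."""

    fail_on_invalid_data: bool = Field(
        default=True,
        description=(
            "When True, raise errors on validation failures; "
            "when False, log warnings and continue processing"
        ),
    )
```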
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- README.md (1 hunks)
- config/system.yaml (2 hunks)
- src/lightspeed_evaluation/core/models/data.py (2 hunks)
- src/lightspeed_evaluation/core/models/system.py (1 hunks)
- src/lightspeed_evaluation/core/system/validator.py (6 hunks)
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1 hunks)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
- src/lightspeed_evaluation/runner/evaluation.py (1 hunks)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- README.md
- config/system.yaml
- src/lightspeed_evaluation/runner/evaluation.py
- src/lightspeed_evaluation/core/system/validator.py
- src/lightspeed_evaluation/core/models/data.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
- src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
- src/lightspeed_evaluation/core/models/system.py
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
- tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-10T15:48:14.671Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:43-49
Timestamp: 2025-09-10T15:48:14.671Z
Learning: In the lightspeed-evaluation framework, system configuration uses Pydantic data models (SystemConfig, OutputConfig, LoggingConfig, etc.) rather than plain dictionaries. Components like OutputHandler receive properly structured Pydantic models, so direct attribute access (e.g., system_config.output.enabled_outputs) is the correct approach.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
- tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
src/lightspeed_evaluation/core/system/validator.py (1)
DataValidator(79-331)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (1)
src/lightspeed_evaluation/core/models/data.py (2)
- is_metric_invalid (95-97)
- is_metric_invalid (355-357)
🔇 Additional comments (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
65-68: LGTM! The DataValidator initialization correctly passes the `fail_on_invalid_data` configuration parameter, aligning with the PR objective to add configurable validation behavior.
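As a rough sketch of that wiring, with simplified stand-ins for SystemConfig and DataValidator (the attribute names below are assumptions, not the real model fields):

```python
from dataclasses import dataclass


@dataclass
class SystemConfigStub:
    """Stand-in for the relevant slice of the Pydantic SystemConfig."""

    api_enabled: bool = True
    fail_on_invalid_data: bool = True


class DataValidatorStub:
    """Stand-in mirroring the constructor signature shown in the sequence diagram."""

    def __init__(self, api_enabled: bool, fail_on_invalid_data: bool = True) -> None:
        self.api_enabled = api_enabled
        self.fail_on_invalid_data = fail_on_invalid_data


def build_validator(config: SystemConfigStub) -> DataValidatorStub:
    # The pipeline forwards the flag from system config at construction time.
    return DataValidatorStub(
        api_enabled=config.api_enabled,
        fail_on_invalid_data=config.fail_on_invalid_data,
    )
```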
tests/unit/pipeline/evaluation/test_processor.py (1)
1-394: Excellent test coverage! The test suite comprehensively covers the new invalid metric handling functionality with well-structured test cases including:
- Valid metrics scenarios
- Invalid metric detection and skipping
- Mixed valid/invalid metrics
- Edge cases (empty metrics, evaluator returning None)
- Verification of correct request construction and metric ordering
The tests properly use fixtures, mocks, and log capture to validate behavior.
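A condensed, self-contained sketch of the "invalid metric detection and skipping" case, assuming pytest and pytest-mock are installed. The stand-in classes and helper below only mimic the behavior the real tests exercise; they are not the actual test code:

```python
import logging


class TurnStub:
    """Mimics TurnData's invalid-metric tracking for this sketch."""

    def __init__(self, metrics):
        self.metrics = metrics
        self._invalid = set()

    def add_invalid_metric(self, metric):
        self._invalid.add(metric)

    def is_metric_invalid(self, metric):
        return metric in self._invalid


def evaluate_turn(turn, evaluator, logger=logging.getLogger(__name__)):
    """Simplified stand-in for the processor's per-turn metric loop."""
    results = []
    for metric in turn.metrics:
        if turn.is_metric_invalid(metric):
            logger.error("Skipping invalid metric: %s", metric)
            results.append(("ERROR", metric))
            continue
        results.append(("PASS", evaluator(metric)))
    return results


def test_invalid_metric_is_skipped(mocker, caplog):
    turn = TurnStub(metrics=["valid_metric", "broken_metric"])
    turn.add_invalid_metric("broken_metric")
    evaluator = mocker.Mock(side_effect=lambda metric: metric)

    with caplog.at_level(logging.ERROR):
        results = evaluate_turn(turn, evaluator)

    assert results[1][0] == "ERROR"       # invalid metric yields an ERROR result
    assert evaluator.call_count == 1      # evaluator only sees the valid metric
    assert "Skipping invalid metric" in caplog.text
```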
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention `test_*.py` for files, `test_*` for functions, and `Test*` for classes
Files:
tests/unit/pipeline/evaluation/test_processor.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (1)
tests/unit/pipeline/evaluation/test_processor.py (8)
src/lightspeed_evaluation/core/models/data.py (8)
- EvaluationData (309-367)
- EvaluationRequest (426-479)
- EvaluationResult (370-409)
- TurnData (35-306)
- add_invalid_metric (91-93)
- add_invalid_metric (351-353)
- is_metric_invalid (95-97)
- is_metric_invalid (355-357)

src/lightspeed_evaluation/core/models/system.py (1)
- SystemConfig (271-301)

src/lightspeed_evaluation/core/system/loader.py (1)
- ConfigLoader (70-125)

src/lightspeed_evaluation/pipeline/evaluation/processor.py (3)
- ConversationProcessor (37-299)
- ProcessorComponents (27-34)
- _evaluate_turn (181-214)

src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
- MetricsEvaluator (25-172)

src/lightspeed_evaluation/pipeline/evaluation/amender.py (1)
- APIDataAmender (13-80)

src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
- EvaluationErrorHandler (10-201)

src/lightspeed_evaluation/core/metrics/manager.py (1)
- MetricManager (17-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: tests (3.11)
- GitHub Check: tests (3.13)
- GitHub Check: tests (3.12)
- GitHub Check: mypy
🔇 Additional comments (11)
tests/unit/pipeline/evaluation/test_processor.py (11)
1-19: LGTM! Imports are well-organized and appropriate for the test module. Correctly uses pytest as required by the coding guidelines.
21-121: LGTM! Fixtures are well-structured and follow pytest best practices. Correctly uses the `mocker` fixture from pytest-mock as required by coding guidelines. Good use of fixture composition for assembling the processor with all its dependencies.
126-150: LGTM! Comprehensive test of the happy path with valid metrics. Correctly verifies result count, types, evaluator invocation count, and metric identifier ordering.
152-184: LGTM! Well-structured test for invalid metric handling. Correctly verifies that invalid metrics produce ERROR results without invoking the evaluator, while valid metrics are still evaluated. Good use of `caplog` to verify error logging.
218-254: LGTM! Excellent test for mixed valid/invalid metrics. Correctly verifies that result order is preserved, invalid metrics produce ERROR results at the correct position, and only valid metrics trigger evaluator calls.
256-273: LGTM! Good edge case test for an empty metrics list. Correctly verifies no results and no evaluator invocation.
275-299: LGTM! Thorough verification of EvaluationRequest construction. Correctly checks that all key fields (conv_data, metric_identifier, turn_id, turn_idx) are properly populated from the input data.
301-324: LGTM! Good test for handling the evaluator returning None. Correctly reconfigures the mock and verifies graceful handling with empty results while confirming the evaluator is still invoked.
326-347: LGTM! Good test for turn index handling in multi-turn conversations. Correctly verifies that turn_idx and turn_id are properly set when evaluating a non-first turn.
349-374: LGTM! Excellent test for metric evaluation order preservation. Correctly verifies that metrics are evaluated in the order provided, which is important for consistent behavior.
376-401: LGTM! Comprehensive test for TurnData's invalid metric tracking. While this tests the data model directly rather than the processor, it's appropriately placed here since this functionality is central to the processor's behavior. Good coverage of initial state, additions, and idempotent behavior.
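A minimal sketch, assuming simple set-backed tracking, of the helper behavior this test pins down (initial empty state, additions, idempotent re-adds). The real `TurnData` is a richer Pydantic model, so treat this purely as an illustration:

```python
class InvalidMetricTrackerSketch:
    """Illustrative stand-in for TurnData's add_invalid_metric / is_metric_invalid."""

    def __init__(self) -> None:
        self._invalid_metrics: set[str] = set()

    def add_invalid_metric(self, metric: str) -> None:
        # A set makes repeated additions of the same metric naturally idempotent.
        self._invalid_metrics.add(metric)

    def is_metric_invalid(self, metric: str) -> bool:
        return metric in self._invalid_metrics


tracker = InvalidMetricTrackerSketch()
assert not tracker.is_metric_invalid("some_metric")  # initial state: nothing flagged
tracker.add_invalid_metric("some_metric")
tracker.add_invalid_metric("some_metric")            # re-adding is a no-op
assert tracker.is_metric_invalid("some_metric")
```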
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/unit/pipeline/evaluation/test_processor.py (1)
41-61: Consider constructing `EvaluationResult` with all required fields in the mock.
`mock_metrics_evaluator` currently only sets a subset of `EvaluationResult` fields. Given the model also exposes `query`, `response`, and `execution_time`, it's safer (and closer to production behavior) to populate them from the request so tests don't silently diverge from real data or break if those fields are required. For example:
```diff
-    def evaluate_metric(request):
-        """Mock evaluate_metric that returns a result based on metric."""
-        return EvaluationResult(
-            conversation_group_id=request.conv_data.conversation_group_id,
-            turn_id=request.turn_id,
-            metric_identifier=request.metric_identifier,
-            result="PASS",
-            score=0.85,
-            reason="Test evaluation",
-            threshold=0.7,
-        )
+    def evaluate_metric(request):
+        """Mock evaluate_metric that returns a result based on metric."""
+        return EvaluationResult(
+            conversation_group_id=request.conv_data.conversation_group_id,
+            turn_id=request.turn_id,
+            metric_identifier=request.metric_identifier,
+            result="PASS",
+            score=0.85,
+            reason="Test evaluation",
+            threshold=0.7,
+            query=getattr(getattr(request, "turn_data", None), "query", ""),
+            response=getattr(getattr(request, "turn_data", None), "response", ""),
+            execution_time=0.0,
+        )
```

Also, good call using the `mocker` fixture and `spec=MetricsEvaluator`/`ConfigLoader`, which aligns with the pytest/pytest-mock testing guidelines. As per coding guidelines, ...
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention `test_*.py` for files, `test_*` for functions, and `Test*` for classes
Files:
tests/unit/pipeline/evaluation/test_processor.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (1)
tests/unit/pipeline/evaluation/test_processor.py (1)
src/lightspeed_evaluation/core/models/data.py (8)
- EvaluationData (309-367)
- EvaluationRequest (426-479)
- EvaluationResult (370-409)
- TurnData (35-306)
- add_invalid_metric (91-93)
- add_invalid_metric (351-353)
- is_metric_invalid (95-97)
- is_metric_invalid (355-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: tests (3.12)
- GitHub Check: mypy
🔇 Additional comments (5)
tests/unit/pipeline/evaluation/test_processor.py (5)
126-151: Valid-metric path is well covered. This test nicely asserts both the number of results and that `evaluate_metric` is invoked twice with correctly ordered `metric_identifier` values, which pins the expected request sequence for the happy path.
152-216: Invalid and mixed-metric tests accurately encode ERROR semantics and logging. The three tests around invalid metrics (single invalid, all invalid, and mixed valid/invalid) clearly assert:
- ERROR `EvaluationResult`s for invalid metrics,
- that valid metrics are still evaluated in-place,
- evaluator call counts (0 / 1 / 2 as appropriate), and
- presence of the expected log messages via `caplog`.

The updated docstring for the "all invalid metrics" case now matches the asserted behavior, resolving the earlier mismatch noted in prior review comments.
Also applies to: 218-255
256-324: Edge cases for empty metric lists and `None` evaluator results are handled correctly. The tests for an empty `turn_metrics` list and for `evaluate_metric` returning `None` both assert that no results are produced while still verifying evaluator invocation where relevant. This gives good coverage of "no-op" and "dropped result" scenarios without over-specifying internal behavior.
326-374: Turn index and metric evaluation order are well specified. The tests around multiple turns and metric ordering ensure:
- `turn_idx` and `turn_id` are correctly propagated into `EvaluationRequest`, and
- metrics are evaluated strictly in the order provided.

These constraints are important for downstream consumers and make future refactors to `_evaluate_turn` safer.
376-401: `TurnData` invalid-metric helpers are thoroughly tested. This test cleanly covers the default state, marking metrics invalid, multiple invalid metrics, and idempotent re-adding of the same metric. It gives solid confidence in the internal `_invalid_metrics` behavior that the processor relies on.
LCORE-647
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation