
add support for fail_on_invalid_data option #94

Merged
tisnik merged 8 commits into lightspeed-core:main from VladimirKadlec:missing-context-fail on Nov 26, 2025

Conversation

VladimirKadlec (Contributor) commented Nov 7, 2025

LCORE-647

Summary by CodeRabbit

  • New Features

    • New configuration flag core.fail_on_invalid_data (default: true).
    • Per-turn expected-keywords validation and per-turn / per-conversation invalid-metric tracking.
  • Bug Fixes

    • Metrics evaluation now yields explicit ERROR results and skips evaluation for invalid metrics, with logging.
  • Tests

    • Added comprehensive unit tests for turn/conversation evaluation, invalid-metric handling, ordering, and request construction.
  • Documentation

    • README updated to document the new config flag.

coderabbitai bot (Contributor) commented Nov 7, 2025

Walkthrough

Adds the config flag core.fail_on_invalid_data; augments the data models with per-turn and per-conversation invalid-metric tracking and an expected_keywords validator; updates the DataValidator to respect the new flag and record invalid metrics; the evaluation pipeline and processor now skip metrics marked invalid; new unit tests cover turn evaluation behavior.

Changes

  • Configuration & System Config (config/system.yaml, src/lightspeed_evaluation/core/models/system.py, README.md):
    Added boolean core.fail_on_invalid_data (default: true) to config and CoreConfig; README updated.
  • Data Model Enhancements (src/lightspeed_evaluation/core/models/data.py):
    Added TurnData.expected_keywords: Optional[list[list[str]]] with a validator; added _invalid_metrics: set[str] plus add_invalid_metric(metric: str) and is_metric_invalid(metric: str) on TurnData and EvaluationData (see the sketch after this list).
  • Validation Logic (src/lightspeed_evaluation/core/system/validator.py):
    DataValidator.__init__ gains fail_on_invalid_data: bool; the validator records unknown/invalid metrics on turns/conversations and only returns failure when the flag is true, otherwise it logs and continues.
  • Pipeline & Runner Integration (src/lightspeed_evaluation/pipeline/evaluation/pipeline.py, src/lightspeed_evaluation/runner/evaluation.py):
    Pass config.core.fail_on_invalid_data into DataValidator during initialization.
  • Evaluation Processor (src/lightspeed_evaluation/pipeline/evaluation/processor.py):
    Early checks in _evaluate_turn and _evaluate_conversation produce ERROR results and skip evaluation for metrics flagged invalid by is_metric_invalid().
  • Tests (tests/unit/pipeline/evaluation/test_processor.py):
    New unit tests for ConversationProcessor._evaluate_turn covering metric ordering, valid/invalid/mixed metrics, EvaluationRequest construction, evaluator None behavior, turn index handling, and invalid-metric state behavior.
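
To make the data-model entry above concrete, here is a minimal, self-contained sketch of the invalid-metric bookkeeping. Only the names expected_keywords, _invalid_metrics, add_invalid_metric, and is_metric_invalid come from this PR; the surrounding model fields and the rules enforced by the expected_keywords validator are not shown here, so they are omitted or marked as assumptions. Per the summary, EvaluationData gains the same pair of helpers for conversation-level metrics.

```python
# Sketch only: the real TurnData model in core/models/data.py carries many more fields.
from typing import Optional

from pydantic import BaseModel, PrivateAttr


class TurnDataSketch(BaseModel):
    """Stand-in illustrating the helpers named in this PR."""

    # New optional field; the validator attached to it in the PR is not reproduced here.
    expected_keywords: Optional[list[list[str]]] = None

    # Per-turn record of metrics that failed validation (a private attribute, not a model field).
    _invalid_metrics: set[str] = PrivateAttr(default_factory=set)

    def add_invalid_metric(self, metric: str) -> None:
        """Flag a metric as invalid for this turn."""
        self._invalid_metrics.add(metric)

    def is_metric_invalid(self, metric: str) -> bool:
        """Return True if the metric was previously flagged as invalid."""
        return metric in self._invalid_metrics
```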

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Config as Config
    participant Validator as DataValidator
    participant Data as Turn/Eval Data
    participant Processor as ConversationProcessor
    participant Evaluator as MetricsEvaluator

    Config->>Validator: __init__(api_enabled, fail_on_invalid_data)
    Validator->>Validator: validate(data)

    alt fail_on_invalid_data = true
        Validator-->>Config: return False on validation error
        Note over Validator: validation fails (hard)
    else fail_on_invalid_data = false
        Validator->>Data: add_invalid_metric(metric)
        Validator->>Validator: log notice, continue
        Note over Validator: metric flagged but validation continues
    end

    Processor->>Data: is_metric_invalid(metric)?
    alt metric invalid
        Processor->>Processor: log error, emit ERROR result, skip metric
    else metric valid
        Processor->>Evaluator: evaluate(EvaluationRequest)
        Evaluator-->>Processor: result / None
    end
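
The alt branch in the diagram reduces to a single decision inside the validator. Below is a sketch of that branching, using hypothetical helper and logger names; only the fail_on_invalid_data flag, the add_invalid_metric call, and the __init__ parameters shown in the diagram come from this PR.

```python
# Sketch of DataValidator's hard-fail vs. record-and-continue behavior; not the real class.
import logging

logger = logging.getLogger(__name__)


class DataValidatorSketch:
    """Stand-in for the DataValidator in core/system/validator.py."""

    def __init__(self, api_enabled: bool, fail_on_invalid_data: bool = True) -> None:
        self.api_enabled = api_enabled
        self.fail_on_invalid_data = fail_on_invalid_data

    def handle_invalid_metric(self, turn, metric: str) -> bool:
        """Return False to fail validation hard, True to record the metric and continue."""
        if self.fail_on_invalid_data:
            logger.error("Invalid metric %r: failing validation", metric)
            return False
        turn.add_invalid_metric(metric)
        logger.warning("Invalid metric %r: flagged on turn, continuing", metric)
        return True
```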

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Review focus:
    • DataValidator branching for fail_on_invalid_data (logging vs. hard-fail) and related test coverage.
    • Correct initialization and mutation semantics of _invalid_metrics on TurnData and EvaluationData (thread-safety / shared-set pitfalls).
    • Processor changes to ensure skipped metrics produce ERROR results and preserve metric ordering (see the sketch after this list).
    • Correctness of the new unit tests and mock/setup hygiene.
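
The processor-side skip called out above boils down to the pattern sketched below. The stand-in types and the evaluate callable are assumptions; only is_metric_invalid, the ERROR result for flagged metrics, the skipped evaluator call, and the preserved ordering are taken from this PR.

```python
# Self-contained sketch of the skip-and-ERROR pattern in _evaluate_turn; stand-in types only.
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger(__name__)


@dataclass
class ResultSketch:
    """Stand-in for EvaluationResult carrying only the fields discussed in this PR."""

    turn_id: str
    metric_identifier: str
    result: str
    reason: str = ""


def evaluate_turn_sketch(
    turn, metrics: list[str], evaluate: Callable[[str], Optional[ResultSketch]]
) -> list[ResultSketch]:
    """Emit ERROR results for invalid metrics and evaluate the rest in order."""
    results: list[ResultSketch] = []
    for metric in metrics:
        if turn.is_metric_invalid(metric):
            logger.error("Metric %s is invalid for turn %s; skipping", metric, turn.turn_id)
            results.append(
                ResultSketch(turn.turn_id, metric, "ERROR", "metric failed data validation")
            )
            continue
        result = evaluate(metric)  # stands in for MetricsEvaluator.evaluate_metric(request)
        if result is not None:  # the evaluator may return None; such metrics yield no result
            results.append(result)
    return results
```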

Possibly related PRs

Suggested reviewers

  • asamal4
  • tisnik
  • lpiwowar

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'add support for fail_on_invalid_data option' directly and accurately describes the main change: adding a new configuration option to control validation failure behavior.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

VladimirKadlec marked this pull request as ready for review November 7, 2025 14:48
lpiwowar (Contributor) commented:

Thanks @VladimirKadlec for working on this 👍

I agree with comments from Asutosh.

VladimirKadlec marked this pull request as draft November 24, 2025 10:25
VladimirKadlec marked this pull request as ready for review November 24, 2025 10:39

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/lightspeed_evaluation/core/models/system.py (1)

265-268: Improve field description for clarity.

The description has a minor grammatical issue (missing comma) and could be more explicit about the behavior in both states.

Consider this improved description:

     fail_on_invalid_data: bool = Field(
         default=True,
-        description="If False don't fail on invalid conversations",
+        description="When True, raise errors on validation failures; when False, log warnings and continue processing",
     )
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7eb05b7 and 059c7c9.

📒 Files selected for processing (9)
  • README.md (1 hunks)
  • config/system.yaml (2 hunks)
  • src/lightspeed_evaluation/core/models/data.py (2 hunks)
  • src/lightspeed_evaluation/core/models/system.py (1 hunks)
  • src/lightspeed_evaluation/core/system/validator.py (6 hunks)
  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1 hunks)
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
  • src/lightspeed_evaluation/runner/evaluation.py (1 hunks)
  • tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • README.md
  • config/system.yaml
  • src/lightspeed_evaluation/runner/evaluation.py
  • src/lightspeed_evaluation/core/system/validator.py
  • src/lightspeed_evaluation/core/models/data.py
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
  • src/lightspeed_evaluation/core/models/system.py
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
  • tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-10T15:48:14.671Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:43-49
Timestamp: 2025-09-10T15:48:14.671Z
Learning: In the lightspeed-evaluation framework, system configuration uses Pydantic data models (SystemConfig, OutputConfig, LoggingConfig, etc.) rather than plain dictionaries. Components like OutputHandler receive properly structured Pydantic models, so direct attribute access (e.g., system_config.output.enabled_outputs) is the correct approach.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/pipeline.py
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
  • tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)
src/lightspeed_evaluation/core/system/validator.py (1)
  • DataValidator (79-331)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (1)
src/lightspeed_evaluation/core/models/data.py (2)
  • is_metric_invalid (95-97)
  • is_metric_invalid (355-357)
🔇 Additional comments (2)
src/lightspeed_evaluation/pipeline/evaluation/pipeline.py (1)

65-68: LGTM!

The DataValidator initialization correctly passes the fail_on_invalid_data configuration parameter, aligning with the PR objective to add configurable validation behavior.

tests/unit/pipeline/evaluation/test_processor.py (1)

1-394: Excellent test coverage!

The test suite comprehensively covers the new invalid metric handling functionality with well-structured test cases including:

  • Valid metrics scenarios
  • Invalid metric detection and skipping
  • Mixed valid/invalid metrics
  • Edge cases (empty metrics, evaluator returning None)
  • Verification of correct request construction and metric ordering

The tests properly use fixtures, mocks, and log capture to validate behavior.
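
For readers unfamiliar with the fixture and caplog style being praised here, the sketch below shows the assertion pattern only at a pattern level: it drives a tiny stand-in instead of the real ConversationProcessor, and every name in it is illustrative rather than taken from the actual test file.

```python
# Illustrative pytest sketch: ERROR placement, ordering, and log capture. Not the project's test code.
import logging

logger = logging.getLogger("sketch")


class FakeTurn:
    """Tiny stand-in exposing the invalid-metric helpers added in this PR."""

    def __init__(self) -> None:
        self._invalid_metrics: set[str] = set()
        self.turn_id = "turn_1"

    def add_invalid_metric(self, metric: str) -> None:
        self._invalid_metrics.add(metric)

    def is_metric_invalid(self, metric: str) -> bool:
        return metric in self._invalid_metrics


def evaluate_turn(turn: FakeTurn, metrics: list[str]) -> list[str]:
    """Stand-in for _evaluate_turn: ERROR for invalid metrics, PASS otherwise."""
    outcomes = []
    for metric in metrics:
        if turn.is_metric_invalid(metric):
            logger.error("metric %s is invalid for turn %s", metric, turn.turn_id)
            outcomes.append("ERROR")
        else:
            outcomes.append("PASS")
    return outcomes


def test_invalid_metric_yields_error_in_place(caplog):
    """Invalid metric produces ERROR at its position, others still pass, and the skip is logged."""
    turn = FakeTurn()
    turn.add_invalid_metric("metric_b")
    with caplog.at_level(logging.ERROR, logger="sketch"):
        outcomes = evaluate_turn(turn, ["metric_a", "metric_b", "metric_c"])
    assert outcomes == ["PASS", "ERROR", "PASS"]
    assert "invalid" in caplog.text
```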

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 582ba25 and 81eb549.

📒 Files selected for processing (2)
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py (2 hunks)
  • tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention test_*.py for files, test_* for functions, and Test* for classes

Files:

  • tests/unit/pipeline/evaluation/test_processor.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.

Applied to files:

  • tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (1)
tests/unit/pipeline/evaluation/test_processor.py (8)
src/lightspeed_evaluation/core/models/data.py (8)
  • EvaluationData (309-367)
  • EvaluationRequest (426-479)
  • EvaluationResult (370-409)
  • TurnData (35-306)
  • add_invalid_metric (91-93)
  • add_invalid_metric (351-353)
  • is_metric_invalid (95-97)
  • is_metric_invalid (355-357)
src/lightspeed_evaluation/core/models/system.py (1)
  • SystemConfig (271-301)
src/lightspeed_evaluation/core/system/loader.py (1)
  • ConfigLoader (70-125)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (3)
  • ConversationProcessor (37-299)
  • ProcessorComponents (27-34)
  • _evaluate_turn (181-214)
src/lightspeed_evaluation/pipeline/evaluation/evaluator.py (1)
  • MetricsEvaluator (25-172)
src/lightspeed_evaluation/pipeline/evaluation/amender.py (1)
  • APIDataAmender (13-80)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
  • EvaluationErrorHandler (10-201)
src/lightspeed_evaluation/core/metrics/manager.py (1)
  • MetricManager (17-166)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.11)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.12)
  • GitHub Check: mypy
🔇 Additional comments (11)
tests/unit/pipeline/evaluation/test_processor.py (11)

1-19: LGTM!

Imports are well-organized and appropriate for the test module. Correctly uses pytest as required by the coding guidelines.


21-121: LGTM!

Fixtures are well-structured and follow pytest best practices. Correctly uses mocker fixture from pytest-mock as required by coding guidelines. Good use of fixture composition for assembling the processor with all its dependencies.


126-150: LGTM!

Comprehensive test of the happy path with valid metrics. Correctly verifies result count, types, evaluator invocation count, and metric identifier ordering.


152-184: LGTM!

Well-structured test for invalid metric handling. Correctly verifies that invalid metrics produce ERROR results without invoking the evaluator, while valid metrics are still evaluated. Good use of caplog to verify error logging.


218-254: LGTM!

Excellent test for mixed valid/invalid metrics. Correctly verifies that result order is preserved, invalid metrics produce ERROR results at the correct position, and only valid metrics trigger evaluator calls.


256-273: LGTM!

Good edge case test for empty metrics list. Correctly verifies no results and no evaluator invocation.


275-299: LGTM!

Thorough verification of EvaluationRequest construction. Correctly checks that all key fields (conv_data, metric_identifier, turn_id, turn_idx) are properly populated from the input data.


301-324: LGTM!

Good test for handling evaluator returning None. Correctly reconfigures the mock and verifies graceful handling with empty results while confirming the evaluator is still invoked.


326-347: LGTM!

Good test for turn index handling in multi-turn conversations. Correctly verifies that turn_idx and turn_id are properly set when evaluating a non-first turn.


349-374: LGTM!

Excellent test for metric evaluation order preservation. Correctly verifies that metrics are evaluated in the order provided, which is important for consistent behavior.


376-401: LGTM!

Comprehensive test for TurnData's invalid metric tracking. While this tests the data model directly rather than the processor, it's appropriately placed here since this functionality is central to the processor's behavior. Good coverage of initial state, additions, and idempotent behavior.

asamal4 (Collaborator) left a comment

Now the changes seem to be aligned. Thank you.
LGTM

There is a minor docstring issue though (already highlighted by coderabbit).

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/unit/pipeline/evaluation/test_processor.py (1)

41-61: Consider constructing EvaluationResult with all required fields in the mock.

mock_metrics_evaluator currently only sets a subset of EvaluationResult fields. Given the model also exposes query, response, and execution_time, it’s safer (and closer to production behavior) to populate them from the request so tests don’t silently diverge from real data or break if those fields are required.

For example:

-    def evaluate_metric(request):
-        """Mock evaluate_metric that returns a result based on metric."""
-        return EvaluationResult(
-            conversation_group_id=request.conv_data.conversation_group_id,
-            turn_id=request.turn_id,
-            metric_identifier=request.metric_identifier,
-            result="PASS",
-            score=0.85,
-            reason="Test evaluation",
-            threshold=0.7,
-        )
+    def evaluate_metric(request):
+        """Mock evaluate_metric that returns a result based on metric."""
+        return EvaluationResult(
+            conversation_group_id=request.conv_data.conversation_group_id,
+            turn_id=request.turn_id,
+            metric_identifier=request.metric_identifier,
+            result="PASS",
+            score=0.85,
+            reason="Test evaluation",
+            threshold=0.7,
+            query=getattr(getattr(request, "turn_data", None), "query", ""),
+            response=getattr(getattr(request, "turn_data", None), "response", ""),
+            execution_time=0.0,
+        )

Also, good call using the mocker fixture and spec=MetricsEvaluator / ConfigLoader, which aligns with the pytest/pytest-mock testing guidelines. As per coding guidelines, ...

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 81eb549 and 52b2dae.

📒 Files selected for processing (1)
  • tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

tests/**/*.py: Use pytest with pytest-mock (mocker fixture), not unittest.mock, for all mocking
Test files should use naming convention test_*.py for files, test_* for functions, and Test* for classes

Files:

  • tests/unit/pipeline/evaluation/test_processor.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.

Applied to files:

  • tests/unit/pipeline/evaluation/test_processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • tests/unit/pipeline/evaluation/test_processor.py
🧬 Code graph analysis (1)
tests/unit/pipeline/evaluation/test_processor.py (1)
src/lightspeed_evaluation/core/models/data.py (8)
  • EvaluationData (309-367)
  • EvaluationRequest (426-479)
  • EvaluationResult (370-409)
  • TurnData (35-306)
  • add_invalid_metric (91-93)
  • add_invalid_metric (351-353)
  • is_metric_invalid (95-97)
  • is_metric_invalid (355-357)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: tests (3.12)
  • GitHub Check: mypy
🔇 Additional comments (5)
tests/unit/pipeline/evaluation/test_processor.py (5)

126-151: Valid-metric path is well covered.

This test nicely asserts both the number of results and that evaluate_metric is invoked twice with correctly ordered metric_identifier values, which pins the expected request sequence for the happy path.


152-216: Invalid and mixed-metric tests accurately encode ERROR semantics and logging.

The three tests around invalid metrics (single invalid, all invalid, and mixed valid/invalid) clearly assert:

  • ERROR EvaluationResults for invalid metrics,
  • that valid metrics are still evaluated in-place,
  • evaluator call counts (0 / 1 / 2 as appropriate), and
  • presence of the expected log messages via caplog.

The updated docstring for the “all invalid metrics” case now matches the asserted behavior, resolving the earlier mismatch noted in prior review comments.

Also applies to: 218-255


256-324: Edge cases for empty metric lists and None evaluator results are handled correctly.

The tests for an empty turn_metrics list and for evaluate_metric returning None both assert that no results are produced while still verifying evaluator invocation where relevant. This gives good coverage of “no-op” and “dropped result” scenarios without over-specifying internal behavior.


326-374: Turn index and metric evaluation order are well specified.

The tests around multiple turns and metric ordering ensure:

  • turn_idx and turn_id are correctly propagated into EvaluationRequest, and
  • metrics are evaluated strictly in the order provided.

These constraints are important for downstream consumers and make future refactors to _evaluate_turn safer.


376-401: TurnData invalid-metric helpers are thoroughly tested.

This test cleanly covers the default state, marking metrics invalid, multiple invalid metrics, and idempotent re-adding of the same metric. It gives solid confidence in the internal _invalid_metrics behavior that the processor relies on.

tisnik (Contributor) left a comment

LGTM

tisnik merged commit c7865cc into lightspeed-core:main on Nov 26, 2025
15 checks passed
VladimirKadlec deleted the missing-context-fail branch November 26, 2025 09:08