
chore: refactor processor & errors.py #119

Merged
asamal4 merged 1 commit into lightspeed-core:main from asamal4:refactor-proc on Dec 16, 2025

Conversation

@asamal4 asamal4 (Collaborator) commented Dec 15, 2025

Description

Refactor to eliminate the too-many-locals issue in processor.py and reduce duplicated code in errors.py.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Unit tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #
  • Closes #

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

Release Notes

  • Refactor
    • Improved error handling and setup/cleanup script management in the evaluation pipeline for more consistent behavior across different failure scenarios.
    • Standardized error result construction across the metric evaluation system for better consistency and maintainability.


@coderabbitai coderabbitai bot (Contributor) commented Dec 15, 2025

Walkthrough

This PR introduces a new create_error_result helper method in EvaluationErrorHandler and refactors error-marking logic to centralize ERROR result construction. Additionally, it restructures ConversationProcessor.process_conversation with a new TurnProcessingContext dataclass and adds helper methods for setup/cleanup script execution, per-turn API amendment, error cascade handling, and turn/conversation-level evaluation orchestration.
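The exact fields of TurnProcessingContext are not shown on this page. A minimal sketch of such a context object, assuming field names based only on the walkthrough description (per-turn and per-conversation metric state, accumulated results, and a mutable conversation ID), might look like this:

    from dataclasses import dataclass, field
    from typing import Optional

    from lightspeed_evaluation.core.models.data import EvaluationData, EvaluationResult


    @dataclass
    class TurnProcessingContext:
        """Per-conversation processing state (field names here are assumptions)."""

        conv_data: EvaluationData
        turn_metrics: list[list[str]] = field(default_factory=list)    # resolved metrics per turn
        conversation_metrics: list[str] = field(default_factory=list)  # resolved conversation-level metrics
        results: list[EvaluationResult] = field(default_factory=list)  # accumulated evaluation results
        conversation_id: Optional[str] = None                          # updated as the API assigns an ID across turns

The dataclass trades a long parameter list for a single object that every helper method can read and mutate, which is how the refactor reduces the number of locals in process_conversation.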

Changes

  • Error handling centralization (src/lightspeed_evaluation/pipeline/evaluation/errors.py):
    Adds a create_error_result helper method to standardize ERROR result construction; refactors mark_all_metrics_as_error, mark_turn_metrics_as_error, and mark_cascade_failure to use the new helper; updates variable names from metric_identifier to metric_id; adds an Optional typing import. (A sketch of this delegation pattern follows this list.)
  • Conversation processor restructuring (src/lightspeed_evaluation/pipeline/evaluation/processor.py):
    Introduces a TurnProcessingContext dataclass to encapsulate per-turn and per-conversation metric state; reworks process_conversation to build the context early, run the setup script with failure handling, delegate to new orchestration methods, and ensure cleanup runs; adds helper methods _build_processing_context, _has_metrics_to_evaluate, _handle_setup_failure, _process_turns_and_conversation, _process_turn_api, _handle_api_error, _evaluate_turn, _evaluate_conversation, _run_setup_script, _run_cleanup_script; integrates error_handler.create_error_result for result construction.
  • Test mock updates (tests/unit/pipeline/evaluation/test_processor.py):
    Updates the mock_error_handler fixture to include a create_error_result side effect that constructs ERROR EvaluationResult entries with conversation ID, turn ID, metric ID, reason, and query fields.
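As referenced above, here is a hedged sketch of the delegation pattern in errors.py. The create_error_result signature matches the diff quoted later in this review; the EvaluationResult keyword names and the mark_turn_metrics_as_error body are illustrative assumptions, not the actual code:

    from typing import Optional

    from lightspeed_evaluation.core.models.data import EvaluationResult


    class EvaluationErrorHandler:
        """Sketch only: just the create_error_result signature comes from this PR."""

        def create_error_result(  # pylint: disable=too-many-arguments
            self,
            conv_id: str,
            metric_id: str,
            reason: str,
            *,
            turn_id: Optional[str] = None,
            query: str = "",
        ) -> EvaluationResult:
            """Create an ERROR EvaluationResult with common defaults."""
            # Keyword names below are assumptions; the summary only says the result
            # carries conversation ID, turn ID, metric ID, reason, and query.
            return EvaluationResult(
                conversation_group_id=conv_id,
                turn_id=turn_id,
                metric_identifier=metric_id,
                result="ERROR",
                reason=reason,
                query=query,
            )

        def mark_turn_metrics_as_error(
            self,
            conv_id: str,
            turn_id: str,
            turn_metrics: list[str],
            reason: str,
            query: str = "",
        ) -> list[EvaluationResult]:
            """Hypothetical delegation: one ERROR result per metric of the turn."""
            return [
                self.create_error_result(conv_id, metric_id, reason, turn_id=turn_id, query=query)
                for metric_id in turn_metrics
            ]

mark_all_metrics_as_error and mark_cascade_failure follow the same pattern, looping over turn-level and conversation-level metrics and delegating each ERROR result to the factory.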

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant ConversationProcessor
    participant ErrorHandler
    participant SetupScript
    participant TurnAPI
    participant TurnEvaluator
    participant ConvEvaluator
    participant CleanupScript

    Client->>ConversationProcessor: process_conversation(conv_data)
    ConversationProcessor->>ConversationProcessor: _build_processing_context()
    ConversationProcessor->>ConversationProcessor: _has_metrics_to_evaluate()
    
    alt No metrics to evaluate
        ConversationProcessor-->>Client: return []
    else Metrics exist
        ConversationProcessor->>SetupScript: _run_setup_script()
        
        alt Setup fails
            SetupScript-->>ConversationProcessor: error
            ConversationProcessor->>ErrorHandler: mark all metrics as ERROR
            ErrorHandler-->>ConversationProcessor: error results
            ConversationProcessor->>CleanupScript: _run_cleanup_script()
            CleanupScript-->>ConversationProcessor: done
            ConversationProcessor-->>Client: return error results
        else Setup succeeds
            SetupScript-->>ConversationProcessor: success
            
            ConversationProcessor->>ConversationProcessor: _process_turns_and_conversation()
            
            loop For each turn
                ConversationProcessor->>TurnAPI: _process_turn_api()
                
                alt API call fails
                    TurnAPI-->>ConversationProcessor: error message
                    ConversationProcessor->>ErrorHandler: _handle_api_error(turn)
                    ErrorHandler-->>ConversationProcessor: mark turn + cascade to remaining turns/conv metrics as ERROR
                    ConversationProcessor->>ConversationProcessor: break early from turns loop
                else API call succeeds
                    TurnAPI-->>ConversationProcessor: success
                    ConversationProcessor->>TurnEvaluator: _evaluate_turn()
                    TurnEvaluator-->>ConversationProcessor: turn evaluation results
                end
            end
            
            ConversationProcessor->>ConvEvaluator: _evaluate_conversation()
            ConvEvaluator-->>ConversationProcessor: conversation evaluation results
            
            ConversationProcessor->>CleanupScript: _run_cleanup_script()
            CleanupScript-->>ConversationProcessor: done
            
            ConversationProcessor-->>Client: return all results
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–60 minutes

  • Error cascade logic (_handle_api_error method and early break in turn processing loop): verify correct propagation of errors to remaining turns and conversation-level metrics
  • Setup/cleanup script integration: ensure setup failures correctly trigger metric marking and cleanup execution; validate that cleanup always runs regardless of intermediate failures
  • Context building (_build_processing_context): confirm accurate resolution of per-turn and per-conversation metrics
  • Refactoring of existing methods: validate that new helper method usage (create_error_result) produces identical results to prior inlined construction across mark_all_metrics_as_error, mark_turn_metrics_as_error, and mark_cascade_failure
  • Test mock coverage: ensure mock create_error_result side effect accurately reflects actual implementation behavior for all call sites

Suggested reviewers

  • tisnik
  • VladimirKadlec

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title 'chore: refactor processor & errors.py' accurately reflects the main changes: refactoring in processor.py and errors.py to eliminate local variables and reduce code duplication through a new helper method.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 94.12%, which is sufficient; the required threshold is 80.00%.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)

18-43: Consider adding Returns section to docstring.

The method has proper type hints and follows Google-style docstring format for Args, but the Returns section could be more explicit about what fields are populated in the returned EvaluationResult.

Consider this enhancement:

     def create_error_result(  # pylint: disable=too-many-arguments
         self,
         conv_id: str,
         metric_id: str,
         reason: str,
         *,
         turn_id: Optional[str] = None,
         query: str = "",
     ) -> EvaluationResult:
         """Create an ERROR EvaluationResult with common defaults.
 
         Args:
             conv_id: Conversation group ID
             metric_id: Metric identifier
             reason: Error reason
             turn_id: Turn ID (None for conversation-level)
             query: Query text
+
+        Returns:
+            EvaluationResult with result="ERROR" and other fields set to defaults
         """
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aacebdf and d8d4dec.

📒 Files selected for processing (3)
  • src/lightspeed_evaluation/pipeline/evaluation/errors.py (6 hunks)
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py (5 hunks)
  • tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/test_*.py

📄 CodeRabbit inference engine (AGENTS.md)

Use pytest mocking (mocker fixture) instead of unittest.mock

Files:

  • tests/unit/pipeline/evaluation/test_processor.py
tests/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*

Files:

  • tests/unit/pipeline/evaluation/test_processor.py
src/lightspeed_evaluation/**/*.py

📄 CodeRabbit inference engine (AGENTS.md)

src/lightspeed_evaluation/**/*.py: Provide type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels in lightspeed_evaluation modules

Files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Applies to src/lightspeed_evaluation/**/*.py : Use custom exceptions from core.system.exceptions for error handling

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Applies to src/lightspeed_evaluation/**/*.py : Use structured logging with appropriate levels in lightspeed_evaluation modules

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-07-16T13:21:53.225Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:190-214
Timestamp: 2025-07-16T13:21:53.225Z
Learning: In the lsc_agent_eval framework, there's a distinction between FAIL and ERROR results:
- FAIL: evaluation failed (e.g., script verification failed, agent response doesn't match expected criteria) - result.error is None
- ERROR: error running eval (e.g., setup script failed, agent API error) - result.error contains error message

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-07-16T13:20:40.632Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:40.632Z
Learning: In the lsc_agent_eval package, evaluation results use "FAIL" for evaluations that ran but didn't pass the criteria, and "ERROR" for errors in the evaluation process itself (like setup script failures, API errors, etc.).

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.

Applied to files:

  • src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧬 Code graph analysis (3)
tests/unit/pipeline/evaluation/test_processor.py (2)
src/lightspeed_evaluation/core/models/data.py (1)
  • EvaluationResult (378-446)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
  • create_error_result (18-43)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
src/lightspeed_evaluation/core/models/data.py (2)
  • EvaluationResult (378-446)
  • EvaluationData (317-375)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (4)
src/lightspeed_evaluation/core/models/data.py (3)
  • EvaluationData (317-375)
  • EvaluationResult (378-446)
  • TurnData (35-314)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (4)
  • mark_all_metrics_as_error (45-92)
  • mark_turn_metrics_as_error (94-131)
  • mark_cascade_failure (133-184)
  • create_error_result (18-43)
src/lightspeed_evaluation/pipeline/evaluation/amender.py (1)
  • amend_single_turn (20-78)
src/lightspeed_evaluation/core/api/client.py (1)
  • query (72-106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: tests (3.13)
  • GitHub Check: tests (3.11)
  • GitHub Check: tests (3.12)
  • GitHub Check: mypy
🔇 Additional comments (14)
tests/unit/pipeline/evaluation/test_processor.py (1)

535-550: LGTM! Test fixture properly mocks the new error result factory.

The create_error_result_side_effect function correctly mirrors the signature and behavior of the new create_error_result method in EvaluationErrorHandler, enabling proper testing of error result construction.
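For readers unfamiliar with the fixture, a hedged sketch of what such a side effect might look like using pytest-mock (the path-based guideline above asks for the mocker fixture); the EvaluationResult keyword names are assumptions, and the real fixture in test_processor.py may differ:

    import pytest

    from lightspeed_evaluation.core.models.data import EvaluationResult


    @pytest.fixture
    def mock_error_handler(mocker):
        """Hypothetical sketch of the fixture, not the actual test code."""
        handler = mocker.MagicMock()

        def create_error_result_side_effect(conv_id, metric_id, reason, *, turn_id=None, query=""):
            # Keyword names are assumptions; the review only lists conversation ID,
            # turn ID, metric ID, reason, and query as the populated fields.
            return EvaluationResult(
                conversation_group_id=conv_id,
                turn_id=turn_id,
                metric_identifier=metric_id,
                result="ERROR",
                reason=reason,
                query=query,
            )

        handler.create_error_result.side_effect = create_error_result_side_effect
        return handler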

src/lightspeed_evaluation/pipeline/evaluation/errors.py (3)

68-89: LGTM! Effective refactoring using the centralized error factory.

The refactoring successfully eliminates code duplication by delegating ERROR result construction to create_error_result, improving maintainability and consistency.


120-129: LGTM! Concise list comprehension using the error factory.

The refactoring uses a clean list comprehension with create_error_result, reducing code duplication while maintaining clarity.


159-181: LGTM! Consistent refactoring pattern applied to cascade failure.

The refactoring consistently applies the create_error_result factory pattern to cascade failure scenarios, maintaining the same benefits of reduced duplication and improved maintainability.

src/lightspeed_evaluation/pipeline/evaluation/processor.py (10)

37-44: LGTM! Dataclass effectively reduces parameter passing.

The TurnProcessingContext dataclass successfully encapsulates per-conversation processing state, addressing the PR objective of reducing local variables. The mutable conversation_id field appropriately tracks API state across turns.


56-88: LGTM! Well-structured refactoring improves readability.

The refactored process_conversation method effectively addresses the PR objective by extracting helper methods and reducing local variables. The try-finally block ensures cleanup runs regardless of outcomes.
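A rough skeleton of the flow the reviewer describes, using the helper names listed in the walkthrough; the bodies, return types, and the boolean setup-script convention are assumptions:

    def process_conversation(self, conv_data: EvaluationData) -> list[EvaluationResult]:
        """Sketch of the orchestration; only the helper names come from this PR."""
        context = self._build_processing_context(conv_data)
        if not self._has_metrics_to_evaluate(context):
            return []  # nothing to evaluate, so setup/cleanup are skipped entirely

        try:
            if not self._run_setup_script(conv_data):       # assumed boolean return
                return self._handle_setup_failure(context)  # marks every metric as ERROR
            self._process_turns_and_conversation(context)   # turn loop + conversation metrics
            return context.results
        finally:
            self._run_cleanup_script(conv_data)              # runs on every exit path past the early return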


89-106: LGTM! Clean extraction of context building logic.

The method effectively encapsulates metric resolution logic, contributing to the PR objective of reducing local variables in the main processing method.


108-112: LGTM! Clear helper method for metrics check.

The method provides a readable abstraction for checking whether any metrics need evaluation.


114-127: LGTM! Proper setup failure handling with cleanup.

The method correctly marks all metrics as ERROR and ensures cleanup runs even after setup failure, maintaining proper resource cleanup guarantees.


129-167: LGTM! Well-orchestrated turn and conversation processing.

The method provides clear orchestration of the processing flow, with appropriate early returns for API errors and proper cascade failure handling.


169-182: LGTM! Clean extraction of API processing logic.

The method appropriately encapsulates API call handling and maintains conversation state through the context object.


184-221: LGTM! Proper API error handling with cascade.

The method correctly implements cascade failure logic, marking the failed turn and all subsequent turns and conversation metrics as ERROR with appropriate error messages.
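A conceptual sketch of that cascade, inlined with create_error_result for clarity; the actual implementation delegates to the error handler's mark_* helpers, and the attribute and parameter names here are assumptions:

    def _handle_api_error(
        self, context: TurnProcessingContext, failed_idx: int, error_msg: str
    ) -> None:
        """Illustrative only: mark the failed turn, later turns, and conversation metrics as ERROR."""
        reason = f"API error: {error_msg}"
        conv_id = context.conv_data.conversation_group_id  # assumed attribute name
        for idx, turn in enumerate(context.conv_data.turns):
            if idx < failed_idx:
                continue  # earlier turns already produced real results
            for metric_id in context.turn_metrics[idx]:
                context.results.append(
                    self.error_handler.create_error_result(
                        conv_id, metric_id, reason, turn_id=turn.turn_id, query=turn.query
                    )
                )
        # Conversation-level metrics can no longer be evaluated either.
        for metric_id in context.conversation_metrics:
            context.results.append(
                self.error_handler.create_error_result(conv_id, metric_id, reason)
            )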


237-244: LGTM! Consistent use of centralized error factory.

The refactoring correctly uses error_handler.create_error_result for invalid turn metrics, aligning with the centralized error result construction pattern.


268-271: LGTM! Consistent use of centralized error factory.

The refactoring correctly uses error_handler.create_error_result for invalid conversation metrics, maintaining consistency with the error handling pattern.


@VladimirKadlec VladimirKadlec left a comment


LGTM 👍

@asamal4 asamal4 merged commit 5fc3f0c into lightspeed-core:main Dec 16, 2025
15 checks passed
