chore: refactor processor & errors.py #119
Conversation
Walkthrough
This PR introduces a new TurnProcessingContext dataclass in processor.py and a shared create_error_result factory in errors.py, refactoring conversation processing and centralizing error-result construction.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant ConversationProcessor
    participant ErrorHandler
    participant SetupScript
    participant TurnAPI
    participant TurnEvaluator
    participant ConvEvaluator
    participant CleanupScript

    Client->>ConversationProcessor: process_conversation(conv_data)
    ConversationProcessor->>ConversationProcessor: _build_processing_context()
    ConversationProcessor->>ConversationProcessor: _has_metrics_to_evaluate()
    alt No metrics to evaluate
        ConversationProcessor-->>Client: return []
    else Metrics exist
        ConversationProcessor->>SetupScript: _run_setup_script()
        alt Setup fails
            SetupScript-->>ConversationProcessor: error
            ConversationProcessor->>ErrorHandler: mark all metrics as ERROR
            ErrorHandler-->>ConversationProcessor: error results
            ConversationProcessor->>CleanupScript: _run_cleanup_script()
            CleanupScript-->>ConversationProcessor: done
            ConversationProcessor-->>Client: return error results
        else Setup succeeds
            SetupScript-->>ConversationProcessor: success
            ConversationProcessor->>ConversationProcessor: _process_turns_and_conversation()
            loop For each turn
                ConversationProcessor->>TurnAPI: _process_turn_api()
                alt API call fails
                    TurnAPI-->>ConversationProcessor: error message
                    ConversationProcessor->>ErrorHandler: _handle_api_error(turn)
                    ErrorHandler-->>ConversationProcessor: mark turn + cascade to remaining turns/conv metrics as ERROR
                    ConversationProcessor->>ConversationProcessor: break early from turns loop
                else API call succeeds
                    TurnAPI-->>ConversationProcessor: success
                    ConversationProcessor->>TurnEvaluator: _evaluate_turn()
                    TurnEvaluator-->>ConversationProcessor: turn evaluation results
                end
            end
            ConversationProcessor->>ConvEvaluator: _evaluate_conversation()
            ConvEvaluator-->>ConversationProcessor: conversation evaluation results
            ConversationProcessor->>CleanupScript: _run_cleanup_script()
            CleanupScript-->>ConversationProcessor: done
            ConversationProcessor-->>Client: return all results
        end
    end
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–60 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
18-43: Consider adding Returns section to docstring.
The method has proper type hints and follows Google-style docstring format for Args, but the Returns section could be more explicit about what fields are populated in the returned EvaluationResult. Consider this enhancement:
```diff
 def create_error_result(  # pylint: disable=too-many-arguments
     self,
     conv_id: str,
     metric_id: str,
     reason: str,
     *,
     turn_id: Optional[str] = None,
     query: str = "",
 ) -> EvaluationResult:
     """Create an ERROR EvaluationResult with common defaults.

     Args:
         conv_id: Conversation group ID
         metric_id: Metric identifier
         reason: Error reason
         turn_id: Turn ID (None for conversation-level)
         query: Query text
+
+    Returns:
+        EvaluationResult with result="ERROR" and other fields set to defaults
     """
```
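For readers without the full diff, a minimal standalone sketch of the factory's overall shape follows; StubEvaluationResult and its field names are stand-ins for the real EvaluationResult model and are not taken from this PR.

```python
# Illustrative sketch only: StubEvaluationResult approximates the real
# EvaluationResult in core/models/data.py, whose exact fields may differ.
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubEvaluationResult:
    """Stand-in for core.models.data.EvaluationResult (real fields may differ)."""

    conversation_group_id: str
    metric_identifier: str
    result: str
    reason: str
    turn_id: Optional[str] = None
    query: str = ""
    score: Optional[float] = None


def create_error_result(
    conv_id: str,
    metric_id: str,
    reason: str,
    *,
    turn_id: Optional[str] = None,
    query: str = "",
) -> StubEvaluationResult:
    """Build an ERROR result with shared defaults (illustrative only)."""
    return StubEvaluationResult(
        conversation_group_id=conv_id,
        metric_identifier=metric_id,
        result="ERROR",  # evaluation-process error, as opposed to a FAIL verdict
        reason=reason,
        turn_id=turn_id,
        query=query,
    )
```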
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- src/lightspeed_evaluation/pipeline/evaluation/errors.py (6 hunks)
- src/lightspeed_evaluation/pipeline/evaluation/processor.py (5 hunks)
- tests/unit/pipeline/evaluation/test_processor.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/test_*.py
📄 CodeRabbit inference engine (AGENTS.md)
Use pytest mocking (mocker fixture) instead of unittest.mock
Files:
tests/unit/pipeline/evaluation/test_processor.py
tests/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
Mirror test structure to source code organization, with test files named test_*.py and test functions/classes named test_* and Test*
Files:
tests/unit/pipeline/evaluation/test_processor.py
src/lightspeed_evaluation/**/*.py
📄 CodeRabbit inference engine (AGENTS.md)
src/lightspeed_evaluation/**/*.py: Provide type hints for all public functions and methods
Use Google-style docstrings for all public APIs
Use custom exceptions from core.system.exceptions for error handling
Use structured logging with appropriate levels in lightspeed_evaluation modules
Files:
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧠 Learnings (8)
📓 Common learnings
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 47
File: src/lightspeed_evaluation/core/output/generator.py:140-145
Timestamp: 2025-09-11T12:47:06.747Z
Learning: User asamal4 prefers that non-critical comments are sent when actual code changes are pushed, not on unrelated commits.
📚 Learning: 2025-09-19T00:37:23.798Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:33-36
Timestamp: 2025-09-19T00:37:23.798Z
Learning: In the lightspeed-evaluation codebase, metric resolution (including applying defaults when turn_metrics is None) happens upstream in ConversationProcessor.process_conversation() using MetricManager.resolve_metrics(), not in the EvaluationErrorHandler. The error handler only marks explicitly defined metrics as ERROR.
Applied to files:
- src/lightspeed_evaluation/pipeline/evaluation/errors.py
- src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Applies to src/lightspeed_evaluation/**/*.py : Use custom exceptions from core.system.exceptions for error handling
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-12-11T10:05:06.422Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-evaluation PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-11T10:05:06.422Z
Learning: Applies to src/lightspeed_evaluation/**/*.py : Use structured logging with appropriate levels in lightspeed_evaluation modules
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-07-16T13:21:53.225Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:190-214
Timestamp: 2025-07-16T13:21:53.225Z
Learning: In the lsc_agent_eval framework, there's a distinction between FAIL and ERROR results:
- FAIL: evaluation failed (e.g., script verification failed, agent response doesn't match expected criteria) - result.error is None
- ERROR: error running eval (e.g., setup script failed, agent API error) - result.error contains error message
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-07-16T13:20:40.632Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 19
File: lsc_agent_eval/tests/core/agent_goal_eval/test_evaluator.py:0-0
Timestamp: 2025-07-16T13:20:40.632Z
Learning: In the lsc_agent_eval package, evaluation results use "FAIL" for evaluations that ran but didn't pass the criteria, and "ERROR" for errors in the evaluation process itself (like setup script failures, API errors, etc.).
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/errors.py
📚 Learning: 2025-09-19T12:32:06.403Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/pipeline/evaluation/errors.py:18-31
Timestamp: 2025-09-19T12:32:06.403Z
Learning: When analyzing method calls, always examine the complete call site including all parameters before suggesting fixes. In the lightspeed-evaluation codebase, mark_all_metrics_as_error in processor.py correctly passes both resolved_turn_metrics and resolved_conversation_metrics parameters.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/processor.py
📚 Learning: 2025-09-18T23:59:37.026Z
Learnt from: asamal4
Repo: lightspeed-core/lightspeed-evaluation PR: 55
File: src/lightspeed_evaluation/core/system/validator.py:146-155
Timestamp: 2025-09-18T23:59:37.026Z
Learning: In the lightspeed-evaluation project, the DataValidator in `src/lightspeed_evaluation/core/system/validator.py` is intentionally designed to validate only explicitly provided user evaluation data, not resolved metrics that include system defaults. When turn_metrics is None, the system falls back to system config defaults, and this validation separation is by design.
Applied to files:
src/lightspeed_evaluation/pipeline/evaluation/processor.py
🧬 Code graph analysis (3)
tests/unit/pipeline/evaluation/test_processor.py (2)
- src/lightspeed_evaluation/core/models/data.py (1)
  - EvaluationResult (378-446)
- src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
  - create_error_result (18-43)
src/lightspeed_evaluation/pipeline/evaluation/errors.py (1)
- src/lightspeed_evaluation/core/models/data.py (2)
  - EvaluationResult (378-446)
  - EvaluationData (317-375)
src/lightspeed_evaluation/pipeline/evaluation/processor.py (4)
- src/lightspeed_evaluation/core/models/data.py (3)
  - EvaluationData (317-375)
  - EvaluationResult (378-446)
  - TurnData (35-314)
- src/lightspeed_evaluation/pipeline/evaluation/errors.py (4)
  - mark_all_metrics_as_error (45-92)
  - mark_turn_metrics_as_error (94-131)
  - mark_cascade_failure (133-184)
  - create_error_result (18-43)
- src/lightspeed_evaluation/pipeline/evaluation/amender.py (1)
  - amend_single_turn (20-78)
- src/lightspeed_evaluation/core/api/client.py (1)
  - query (72-106)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: tests (3.13)
- GitHub Check: tests (3.11)
- GitHub Check: tests (3.12)
- GitHub Check: mypy
🔇 Additional comments (14)
tests/unit/pipeline/evaluation/test_processor.py (1)
535-550: LGTM! Test fixture properly mocks the new error result factory.
The create_error_result_side_effect function correctly mirrors the signature and behavior of the new create_error_result method in EvaluationErrorHandler, enabling proper testing of error result construction.
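For illustration only (this is not the actual fixture from test_processor.py), a side_effect helper paired with the mocker fixture recommended in AGENTS.md might look roughly like this; every name except create_error_result is hypothetical.

```python
# Hypothetical sketch: pytest-mock (mocker fixture) wiring for a side_effect
# helper that mirrors the factory's signature.
from types import SimpleNamespace
from typing import Optional


def create_error_result_side_effect(
    conv_id: str,
    metric_id: str,
    reason: str,
    *,
    turn_id: Optional[str] = None,
    query: str = "",
) -> SimpleNamespace:
    """Mirror the factory's signature, returning a lightweight stand-in result."""
    return SimpleNamespace(
        conversation_group_id=conv_id,
        metric_identifier=metric_id,
        result="ERROR",
        reason=reason,
        turn_id=turn_id,
        query=query,
    )


def test_error_results_use_factory(mocker):
    """Wire the side_effect onto a mocked handler via the mocker fixture."""
    handler = mocker.Mock()
    handler.create_error_result.side_effect = create_error_result_side_effect

    result = handler.create_error_result("conv-1", "metric-a", "setup failed")

    assert result.result == "ERROR"
    handler.create_error_result.assert_called_once()
```

The side_effect keeps the mock's return values structurally close to real ERROR results, so assertions on result construction stay meaningful.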
src/lightspeed_evaluation/pipeline/evaluation/errors.py (3)
68-89: LGTM! Effective refactoring using the centralized error factory.
The refactoring successfully eliminates code duplication by delegating ERROR result construction to create_error_result, improving maintainability and consistency.
120-129: LGTM! Concise list comprehension using the error factory.
The refactoring uses a clean list comprehension with create_error_result, reducing code duplication while maintaining clarity.
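A rough sketch of the shape such a comprehension takes; the wrapper function, parameter names, and handler interface below are illustrative rather than copied from errors.py.

```python
# Illustrative only: one ERROR result per metric, built through the shared factory.
from typing import List


def turn_error_results(
    handler,                  # an object exposing create_error_result(...)
    conv_id: str,
    turn_id: str,
    query: str,
    turn_metrics: List[str],
    reason: str,
) -> list:
    """Build one ERROR result per turn metric via the shared factory."""
    return [
        handler.create_error_result(
            conv_id, metric_id, reason, turn_id=turn_id, query=query
        )
        for metric_id in turn_metrics
    ]
```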
159-181: LGTM! Consistent refactoring pattern applied to cascade failure.
The refactoring consistently applies the create_error_result factory pattern to cascade failure scenarios, maintaining the same benefits of reduced duplication and improved maintainability.
src/lightspeed_evaluation/pipeline/evaluation/processor.py (10)
37-44: LGTM! Dataclass effectively reduces parameter passing.
The TurnProcessingContext dataclass successfully encapsulates per-conversation processing state, addressing the PR objective of reducing local variables. The mutable conversation_id field appropriately tracks API state across turns.
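A minimal sketch of what such a context dataclass can look like; apart from the mutable conversation_id noted above, the field names are assumptions.

```python
# Illustrative sketch of a per-conversation context object.
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class TurnProcessingContext:
    """Per-conversation processing state shared by the helper methods."""

    conv_data: Any                                    # the EvaluationData for this group
    turn_metrics: list[str] = field(default_factory=list)
    conversation_metrics: list[str] = field(default_factory=list)
    conversation_id: Optional[str] = None             # mutable: updated from API responses
```

Passing one context object instead of several loose locals is what lets the too-many-locals issue be addressed without losing readability.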
56-88: LGTM! Well-structured refactoring improves readability.
The refactored process_conversation method effectively addresses the PR objective by extracting helper methods and reducing local variables. The try-finally block ensures cleanup runs regardless of outcomes.
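A condensed sketch of that flow, matching the sequence diagram above; the helper names come from this review, while bodies, signatures, and the setup script's return convention are assumed.

```python
# Illustrative skeleton of the refactored orchestration (not the actual code).
from typing import Any, List


def process_conversation(self, conv_data: Any) -> List[Any]:
    """Condensed sketch of the refactored method (illustrative only)."""
    context = self._build_processing_context(conv_data)

    if not self._has_metrics_to_evaluate(context):
        return []  # nothing to evaluate for this conversation group

    setup_error = self._run_setup_script(conv_data)  # assumed: error message or None
    if setup_error:
        return self._handle_setup_failure(context, setup_error)

    try:
        return self._process_turns_and_conversation(context)
    finally:
        # Cleanup always runs, whatever the evaluation outcome.
        self._run_cleanup_script(conv_data)
```

In this sketch the setup-failure branch handles its own cleanup (see the 114-127 comment below), while the finally block covers every path through turn processing.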
89-106: LGTM! Clean extraction of context building logic.
The method effectively encapsulates metric resolution logic, contributing to the PR objective of reducing local variables in the main processing method.
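A hedged sketch of what this extraction can look like, reusing the TurnProcessingContext sketch above; MetricManager.resolve_metrics is referenced in the learnings, but its exact signature and return shape here are assumptions.

```python
# Illustrative sketch: resolve metrics once, up front, and bundle the results.
def _build_processing_context(self, conv_data) -> "TurnProcessingContext":
    """Resolve metrics and bundle them with the conversation data."""
    # resolve_metrics call shape and return value are assumed here.
    turn_metrics, conversation_metrics = self.metric_manager.resolve_metrics(conv_data)
    return TurnProcessingContext(
        conv_data=conv_data,
        turn_metrics=turn_metrics,
        conversation_metrics=conversation_metrics,
    )
```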
108-112: LGTM! Clear helper method for metrics check.
The method provides a readable abstraction for checking whether any metrics need evaluation.
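The helper likely reduces to a simple boolean check along these lines (attribute names assumed).

```python
# Illustrative sketch only.
def _has_metrics_to_evaluate(self, context) -> bool:
    """True when the resolved context has any turn- or conversation-level metrics."""
    return bool(context.turn_metrics or context.conversation_metrics)
```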
114-127: LGTM! Proper setup failure handling with cleanup.
The method correctly marks all metrics as ERROR and ensures cleanup runs even after setup failure, maintaining proper resource cleanup guarantees.
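A sketch of the setup-failure path; mark_all_metrics_as_error and its resolved-metrics parameters are referenced in this review, but the exact parameter order and reason format are assumptions.

```python
# Illustrative sketch of the setup-failure handler.
def _handle_setup_failure(self, context, error_message: str) -> list:
    """Mark every resolved metric as ERROR, then still run cleanup."""
    results = self.error_handler.mark_all_metrics_as_error(
        context.conv_data,
        context.turn_metrics,            # resolved turn metrics
        context.conversation_metrics,    # resolved conversation metrics
        reason=f"Setup script failed: {error_message}",  # assumed kwarg and format
    )
    self._run_cleanup_script(context.conv_data)  # cleanup even on the failure path
    return results
```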
129-167: LGTM! Well-orchestrated turn and conversation processing.
The method provides clear orchestration of the processing flow, with appropriate early returns for API errors and proper cascade failure handling.
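A sketch of that orchestration; the early return mirrors the cascade behaviour described here, while attribute names and helper signatures are assumptions.

```python
# Illustrative sketch of the turn loop with an early return on API failure.
def _process_turns_and_conversation(self, context) -> list:
    """Evaluate each turn, short-circuiting on API errors, then the conversation."""
    results = []
    for turn_index, turn in enumerate(context.conv_data.turns):  # assumed attribute
        api_error = self._process_turn_api(context, turn)
        if api_error:
            # Cascade: the failed turn, all remaining turns, and the
            # conversation-level metrics are marked ERROR; stop early.
            results.extend(self._handle_api_error(context, turn_index, api_error))
            return results
        results.extend(self._evaluate_turn(context, turn))
    results.extend(self._evaluate_conversation(context))
    return results
```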
169-182: LGTM! Clean extraction of API processing logic.
The method appropriately encapsulates API call handling and maintains conversation state through the context object.
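A heavily hedged sketch of the per-turn API step; client.query and amend_single_turn are referenced in the code graph above, but their signatures, the response fields, and the error-handling style shown here are assumptions.

```python
# Illustrative sketch of the per-turn API call and state update.
from typing import Optional

from lightspeed_evaluation.pipeline.evaluation.amender import amend_single_turn


def _process_turn_api(self, context, turn) -> Optional[str]:
    """Call the API for one turn; return an error message on failure, else None."""
    try:
        response = self.api_client.query(             # query() exists per the code graph;
            turn.query,                                # its signature here is assumed
            conversation_id=context.conversation_id,   # None on the first turn
        )
    except Exception as exc:  # the real code would use core.system.exceptions types
        return str(exc)
    context.conversation_id = response.conversation_id  # assumed response field
    amend_single_turn(turn, response)                    # call shape assumed
    return None
```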
184-221: LGTM! Proper API error handling with cascade.
The method correctly implements cascade failure logic, marking the failed turn and all subsequent turns and conversation metrics as ERROR with appropriate error messages.
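A sketch of the delegation to the error handler; mark_cascade_failure exists in errors.py per the code graph, while its parameter shape here is assumed.

```python
# Illustrative sketch: hand the cascade off to the error handler.
def _handle_api_error(self, context, failed_turn_index: int, error_message: str) -> list:
    """Delegate cascade marking to the error handler (parameter shape assumed)."""
    return self.error_handler.mark_cascade_failure(
        context.conv_data,
        failed_turn_index,
        context.turn_metrics,
        context.conversation_metrics,
        reason=f"API error: {error_message}",
    )
```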
237-244: LGTM! Consistent use of centralized error factory.
The refactoring correctly uses error_handler.create_error_result for invalid turn metrics, aligning with the centralized error result construction pattern.
268-271: LGTM! Consistent use of centralized error factory.
The refactoring correctly uses error_handler.create_error_result for invalid conversation metrics, maintaining consistency with the error handling pattern.
Description
Refactor to eliminate the too-many-locals issue in processor.py and reduce duplicated code in errors.py.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
Release Notes