Unify evals interface and fix context state propagation#368
Conversation
Walkthrough
This change is a comprehensive refactor and enhancement of the evaluation subsystem. It systematically renames all scenario and suite evaluation entities from the "ScenarioEvaluator"/"EvaluationSuite" prefixes to "EvaluatorScenario"/"EvaluatorSuite" for consistency. Type annotations, method signatures, and property names are updated throughout.
Possibly related PRs
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (15)
- CLAUDE.md (1 hunks)
- Makefile (1 hunks)
- guides/AdvancedState.md (1 hunks)
- guides/BasicEvaluation.md (5 hunks)
- pyproject.toml (1 hunks)
- src/draive/commons/metadata.py (1 hunks)
- src/draive/evaluation/__init__.py (1 hunks)
- src/draive/evaluation/evaluator.py (22 hunks)
- src/draive/evaluation/scenario.py (13 hunks)
- src/draive/evaluation/score.py (2 hunks)
- src/draive/evaluation/suite.py (24 hunks)
- src/draive/evaluation/value.py (4 hunks)
- src/draive/guardrails/quality/state.py (2 hunks)
- src/draive/helpers/instruction_refinement.py (24 hunks)
- src/draive/stages/stage.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.py
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- CLAUDE.md
**/__init__.py
Instructions used from:
Sources:
📄 CodeRabbit Inference Engine
- CLAUDE.md
🧠 Learnings (8)
Makefile (1)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Sync dependencies with uv lock file
pyproject.toml (4)
Learnt from: KaQuMiQ
PR: miquido/draive#338
File: src/draive/lmm/__init__.py:1-2
Timestamp: 2025-06-16T10:28:07.434Z
Learning: The draive project requires Python 3.12+ as specified in pyproject.toml with "requires-python = ">=3.12"" and uses Python 3.12+ specific features like PEP 695 type aliases and generic syntax extensively throughout the codebase.
Learnt from: KaQuMiQ
PR: miquido/draive#327
File: src/draive/helpers/instruction_preparation.py:28-34
Timestamp: 2025-05-28T17:41:57.460Z
Learning: The draive project uses and requires Python 3.12+, so PEP-695 generic syntax with square brackets (e.g., `def func[T: Type]()`) is valid and should be used instead of the older TypeVar approach.
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use absolute imports from draive package
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: draive builds on top of haiway and exports its symbols
guides/AdvancedState.md (1)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use Field for customizing DataModel fields with options like default_factory and aliased
src/draive/guardrails/quality/state.py (2)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use generic state classes with type parameters for reusable data structures
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use class method interfaces to access functions within context in State classes
CLAUDE.md (4)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to tests/**/*.py : Tests are in tests/ directory and use pytest with async support
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to **/*.py : Use base and abstract types like Sequence or Iterable instead of concrete types
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to tests/**/*.py : Use pytest.mark.asyncio for async test functions
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to **/*.py : Use custom exceptions for specific errors
src/draive/evaluation/__init__.py (1)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use absolute imports from draive package
src/draive/helpers/instruction_refinement.py (5)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use class method interfaces to access functions within context in State classes
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use generic state classes with type parameters for reusable data structures
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.{py} : Immutable updates through copy, same for State, Config and DataModel
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use absolute imports from draive package
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.{py} : ALWAYS use Sequence[T] instead of list[T], Mapping[K,V] instead of dict[K,V], and Set[T] instead of set[T] for collections in State, Config and DataModel classes
src/draive/evaluation/suite.py (1)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.{py} : Immutable updates through copy, same for State, Config and DataModel
🧬 Code Graph Analysis (3)
src/draive/guardrails/quality/state.py (3)
- src/draive/evaluation/scenario.py (4): evaluation (466-476), EvaluatorScenarioResult (119-233), PreparedEvaluatorScenario (237-249), passed (148-159)
- src/draive/evaluation/evaluator.py (7): evaluation (626-636), EvaluatorResult (75-261), PreparedEvaluator (287-299), evaluator (732-735), evaluator (739-748), evaluator (751-816), passed (151-160)
- src/draive/guardrails/quality/types.py (1): GuardrailsQualityException (12-23)

src/draive/evaluation/scenario.py (2)
- src/draive/commons/metadata.py (4): Meta (23-437), description (246-252), name (225-231), merged_with (332-345)
- src/draive/evaluation/evaluator.py (15): evaluation (626-636), evaluator (732-735), evaluator (739-748), evaluator (751-816), EvaluatorResult (75-261), PreparedEvaluator (287-299), passed (151-160), report (162-193), performance (196-208), evaluate (343-361), evaluate (387-405), evaluate (589-596), with_state (513-540), contra_map (600-644), _evaluate (673-709)

src/draive/evaluation/evaluator.py (4)
- src/draive/commons/metadata.py (5): Meta (23-437), of (50-62), description (246-252), merged_with (332-345), name (225-231)
- src/draive/evaluation/score.py (2): EvaluationScore (13-213), of (30-55)
- src/draive/evaluation/value.py (2): evaluation_score_value (32-94), evaluation_score_verifier (97-116)
- src/draive/parameters/model.py (7): DataModel (386-756), Field (46-54), Field (58-66), Field (70-78), Field (82-90), Field (93-126), default (760-774)
🪛 LanguageTool
guides/BasicEvaluation.md
[grammar] ~201-~201: Use correct spacing
Context: ...lt.performance:.2f}%") ``` ## Advanced Usage You can customize evaluators with execut...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~203-~203: There might be a mistake here.
Context: ... evaluators with execution contexts and metadata: python # Create evaluator with custom metadata custom_evaluator = keyword_evaluator.with_meta({ "version": "1.0", "author": "evaluation_team", }) # Combine evaluators using logical operations best_evaluator = Evaluator.highest( conciseness_evaluator.prepared(reference=reference_text), readability_evaluator.prepared(), ) # Map evaluator to work with different data structures from draive.parameters import DataModel class DocumentContent(DataModel): title: str body: str document_evaluator = readability_evaluator.contra_map( lambda doc: doc.body # Extract body text for evaluation ) The evaluation...
(QB_NEW_EN_OTHER)
🔇 Additional comments (60)
guides/AdvancedState.md (1)
183-186: Immutable default choice looks correct. Switching the `default_factory` from `list` to `tuple` prevents accidental in-place mutation of a shared default object and aligns with the new "immutable-by-default" guideline described in the PR.

src/draive/commons/metadata.py (1)
334-334: Well-designed parameter type extension. Adding `| None` to the parameter type allows for more flexible metadata handling while maintaining backward compatibility. The implementation correctly handles `None` by returning `self` early, avoiding unnecessary object creation.

Makefile (1)
13-13: Version alignment with build system. The UV_VERSION update to 0.8.0 correctly aligns with the pyproject.toml changes, where the build backend was updated to use uv_build >=0.8.0,<0.9.0. This ensures consistency between development tooling and build system requirements.
pyproject.toml (2)
8-8: Version bump aligns with feature changes. The version increment to 0.79.0 appropriately reflects the comprehensive evaluation interface refactoring and build system changes in this release.

2-3: Action Required: Verify uv_build CLI and build artifacts. The test script failed because the `uv` command wasn't found in the sandbox, so we can't confirm the new backend produces artifacts as expected. Please manually verify that:

- Installing the project exposes the `uv` CLI (e.g., `pip install .` or equivalent).
- You can invoke the build backend, either via the `uv` entry point or using `python -m uv_build build`.
- The `dist/` directory is populated with the built artifacts.

CLAUDE.md (1)
48-84: Excellent documentation style guidelines. The new NumPy docstring convention guidelines are comprehensive and well-structured. The example demonstrates proper use of Python 3.12+ type syntax (`|` unions) and includes all essential sections (Parameters, Returns, Raises). This will significantly improve code documentation consistency across the project.

src/draive/guardrails/quality/state.py (3)
5-10: Import updates align with evaluation interface refactoring. The consolidated import from `draive.evaluation` reflects the systematic renaming from the "ScenarioEvaluator" to the "EvaluatorScenario" pattern. The updated class names (`PreparedEvaluatorScenario`, `EvaluatorScenarioResult`) are consistent with the unified evaluation interface.

29-29: Parameter type correctly updated. The parameter type change from `PreparedScenarioEvaluator` to `PreparedEvaluatorScenario` maintains the union with `PreparedEvaluator` while following the new naming convention.

40-52: Improved type discrimination and exception handling. The replacement of pattern matching with explicit `isinstance` checks provides clearer type discrimination. The exception handling correctly uses the new result properties (`result.evaluator` and `result.scenario`) and properly propagates metadata.

src/draive/evaluation/value.py (4)
20-20: LGTM! Boolean support for evaluation scores. The addition of `bool` to the `EvaluationScoreValue` type union is a logical enhancement that makes the API more intuitive for binary pass/fail evaluations.

32-60: Excellent comprehensive documentation. The NumPy-style docstring provides clear parameter descriptions, return values, and exception handling information, significantly improving the API's usability.

62-71: Correct boolean handling implementation. The boolean pattern matching correctly maps `False` to `0` and `True` to `1`, following standard boolean-to-numeric conversion conventions.
97-116: Well-implemented validation function. The `evaluation_score_verifier` provides clear error messages and proper range validation. The separation of concerns between value conversion and validation is good design.

guides/BasicEvaluation.md (4)
23-26: LGTM! Consistent API updates. The documentation correctly reflects the new `EvaluationScore.of()` class method pattern, replacing the direct constructor calls. Also applies to: 29-32

104-104: API naming consistency maintained. The updates from `evaluation_scenario` to `evaluator_scenario` and corresponding result types are consistent with the broader refactoring effort. Also applies to: 107-107

132-132: Correct property name updates. The change from `relative_score` to `performance` with percentage formatting (`.2f` plus a `%` sign) correctly reflects the new API semantics. Also applies to: 198-198

206-210: Updated context management pattern. The documentation correctly shows the transition from `with_execution_context` to `with_meta` and `with_state` usage, reflecting the new State-based context management.

src/draive/stages/stage.py (5)
20-23: LGTM! Import updates consistent with refactoring. The import changes correctly reflect the evaluation API renaming from `ScenarioEvaluatorResult` to `EvaluatorScenarioResult` and `PreparedScenarioEvaluator` to `PreparedEvaluatorScenario`.

847-848: Correct type annotation updates. All type annotations have been systematically updated to use the new evaluation API names, maintaining type safety while following the new naming conventions. Also applies to: 861-862, 908-908, 921-921

889-890: Property name update correctly implemented. The change from `relative_score` to `performance` is correctly implemented and maintains the same functionality with clearer semantics (percentage vs fraction). Also applies to: 948-949

890-890: Method parameter name updated. The `report` method call parameter is correctly updated from `include_details` to `detailed`, following the new API signature. Also applies to: 949-949

892-896: Error message terminology updated. The error messages and metadata keys have been appropriately updated to use "performance" instead of "relative score" and "evaluation_performance" instead of "evaluation_score". Also applies to: 951-956
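The renamed surface discussed in this section (a percentage-valued `performance` and a `report(detailed=...)` parameter) can be sketched with a hypothetical stand-in class; the field names and formula are assumptions, not draive's actual code:

```python
# Hypothetical result type mirroring the renamed API: `performance` is a
# percentage (0-100) rather than a 0..1 fraction, and `report` takes the
# renamed keyword `detailed` instead of `include_details`.
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeEvaluatorResult:
    score: float       # assumed 0..1 score
    threshold: float   # assumed 0..1 passing threshold

    @property
    def performance(self) -> float:
        return self.score * 100.0  # percentage, not a fraction

    @property
    def passed(self) -> bool:
        return self.score >= self.threshold

    def report(self, *, detailed: bool = False) -> str:
        summary = f"performance={self.performance:.2f}%"
        if detailed:
            summary += f" score={self.score} threshold={self.threshold}"
        return summary


result = FakeEvaluatorResult(score=0.8, threshold=0.5)
```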
src/draive/evaluation/score.py (5)
3-7: Good import consolidation and code reuse. The import of `evaluation_score_verifier` from the value module promotes code reuse and centralizes validation logic.

13-27: Excellent comprehensive class documentation. The NumPy-style docstring provides a clear class description, attributes documentation, and usage context, significantly improving API usability.

59-59: Centralized validation implementation. Using the imported `evaluation_score_verifier` ensures consistent validation across the evaluation system.

80-87: Improved comparison method implementation. The explicit type checking with `isinstance` and returning `False` for unsupported types is more explicit and robust than the `NotImplemented` pattern, though both are valid approaches. Also applies to: 103-110, 126-133, 149-156, 172-179, 195-202

66-79: Comprehensive method documentation. All comparison and hash methods now have detailed NumPy-style docstrings that clearly describe parameters, return values, and behavior. Also applies to: 89-102, 112-125, 135-148, 158-171, 181-194, 204-213
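The comparison style described above (explicit `isinstance` checks returning `False` rather than `NotImplemented`) can be sketched with a minimal stand-in class:

```python
# Sketch of isinstance-based comparison methods. Returning False for
# unsupported operands is explicit and simple; the trade-off versus
# `return NotImplemented` is that Python never tries the other operand's
# reflected comparison.
class Score:
    def __init__(self, value: float) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Score):
            return False  # explicit, instead of `return NotImplemented`
        return self.value == other.value

    def __lt__(self, other: object) -> bool:
        if not isinstance(other, Score):
            return False
        return self.value < other.value

    def __hash__(self) -> int:
        return hash(self.value)
```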
src/draive/evaluation/__init__.py (4)
11-15: Systematic scenario API renaming. The import updates correctly reflect the comprehensive renaming from the `ScenarioEvaluator*` to the `EvaluatorScenario*` pattern, maintaining consistency across the evaluation API.

19-27: Consistent suite API renaming. The suite-related imports are systematically updated from the `EvaluationSuite*` to the `EvaluatorSuite*` pattern, aligning with the unified naming convention.

35-51: Complete public API updates. The `__all__` tuple correctly exports all the renamed entities while maintaining the existing core evaluator exports, ensuring the public API reflects the new naming conventions.

31-31: Backward compatibility maintained. The retention of `EvaluationScenarioResult` in the exports suggests intentional backward compatibility, which is good practice during API transitions.

src/draive/helpers/instruction_refinement.py (7)
8-9: LGTM! Import updates align with the evaluation API renaming. The imports correctly reflect the systematic renaming from `EvaluationSuite*` to `EvaluatorSuite*` across the evaluation subsystem.

26-26: Type annotation correctly updated. The parameter type has been properly updated to use the new `EvaluatorSuite` type.

128-129: Class attributes correctly updated to new evaluation types. The `focused_evaluation` and `complete_evaluation` attributes now use the renamed `EvaluatorSuiteResult` type.

142-150: Properties correctly renamed to use `performance` instead of `relative_score`. The property names and their implementations have been updated to match the new API, where `performance` returns a percentage value.

236-236: Logging format correctly updated for percentage display. The format has been appropriately changed from 4 decimal places to 2, which makes sense since `performance` now represents a percentage (0-100) rather than a normalized score (0-1). Also applies to: 379-379, 436-436

460-461: Report method calls correctly updated with the `detailed` parameter. The calls now explicitly specify `detailed=True` to get full XML-formatted reports, which aligns with the enhanced reporting capabilities in the new API. Also applies to: 536-537

407-408: Performance calculations correctly updated throughout. All references to performance metrics have been properly updated to use the new `performance` property instead of `relative_score`. Also applies to: 657-658, 683-690, 724-724, 729-729, 757-757
src/draive/evaluation/scenario.py (8)
1-1: Import updates align with State-based context management. The imports correctly add `Collection` and `State` to support the new execution context management approach. Also applies to: 4-4

19-117: Well-designed `EvaluationScenarioResult` class with comprehensive documentation. The new class provides a clean interface for aggregating multiple evaluator results, with proper async evaluation support and result merging capabilities. The NumPy-style docstrings are thorough and follow the project's documentation standards.

119-234: Class properly renamed with enhanced reporting and performance calculation. The `EvaluatorScenarioResult` class has been well refactored with:
- Proper renaming following the new convention
- Performance as a percentage (0-100)
- Enhanced reporting with a `detailed` flag
- Comprehensive NumPy-style docstrings

Note: The `passed` property correctly returns `False` for empty evaluations, maintaining defensive programming practices.
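The defensive empty-result behavior noted above matters because `all()` over an empty iterable is vacuously `True`. A stand-in aggregate (names and averaging rule are assumptions, not draive's code) shows the pattern:

```python
# Hypothetical EvaluatorScenarioResult-like aggregate: `passed` guards
# against the empty case, and `performance` averages per-evaluator
# percentages (0-100).
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeResult:
    passed: bool
    performance: float  # percentage, 0..100


@dataclass(frozen=True)
class FakeScenarioResult:
    results: Sequence[FakeResult]

    @property
    def passed(self) -> bool:
        # all() alone would return True for an empty sequence; the explicit
        # bool(self.results) check makes empty evaluations fail instead
        return bool(self.results) and all(r.passed for r in self.results)

    @property
    def performance(self) -> float:
        if not self.results:
            return 0.0
        return sum(r.performance for r in self.results) / len(self.results)
```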
237-271: Protocols correctly renamed with clear documentation. The protocols have been properly updated to follow the new naming convention and include helpful docstrings explaining their purpose.

273-413: Class successfully migrated to State-based context management. The `EvaluatorScenario` class has been properly refactored with:
- State collection replacing execution context
- Enhanced `with_state` method accepting multiple states
- Comprehensive docstrings for all methods
- Consistent naming throughout

440-483: Improved `contra_map` implementation with clearer type checking. The method now properly distinguishes between `AttributePath` and `Callable` types using `isinstance`, making the code more explicit and maintainable.
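The `isinstance`-based dispatch described above can be sketched with a stand-in `AttributePath`; draive's actual types and signatures differ, so everything here is illustrative:

```python
# Sketch of contra_map dispatch: a path-like mapper is recognized with an
# explicit isinstance check before falling back to any plain callable.
from collections.abc import Callable
from typing import Any


class AttributePath:
    """Stand-in: extracts a named attribute from the evaluated value."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __call__(self, value: Any) -> Any:
        return getattr(value, self.name)


def contra_map(
    evaluate: Callable[[Any], float],
    mapping: Any,
) -> Callable[[Any], float]:
    if isinstance(mapping, AttributePath):  # explicit check, no match/case
        mapper: Callable[[Any], Any] = mapping
    elif callable(mapping):
        mapper = mapping
    else:
        raise TypeError(f"Unsupported mapping: {mapping!r}")
    return lambda value: evaluate(mapper(value))


class Doc:
    def __init__(self, body: str) -> None:
        self.body = body


length_eval = lambda text: float(len(text))  # noqa: E731 - toy evaluator
body_eval = contra_map(length_eval, AttributePath("body"))
```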
485-555: Excellent error handling and result normalization. The implementation now:
- Properly scopes execution with all states
- Gracefully handles exceptions by returning empty results with error metadata
- Cleanly normalizes both `EvaluationScenarioResult` and sequences of `EvaluatorResult` types
577-657: Factory function properly updated with excellent documentation. The `evaluator_scenario` function has been correctly refactored with:
- State-based configuration replacing execution context
- Clear overload definitions
- Comprehensive docstrings with practical examples

src/draive/evaluation/evaluator.py (7)
src/draive/evaluation/evaluator.py (7)
2-2: Imports properly organized and support new functionality.The imports follow the correct ordering (standard library, third party, local) and add necessary components for state-based context management and score verification.
Also applies to: 5-5, 9-13
25-73: Well-designedEvaluationResultclass for encapsulating scores with metadata.The class provides a clean abstraction for evaluation results with flexible construction via the
ofclass method and proper field definitions.
75-262: Comprehensive enhancements toEvaluatorResultclass.The class has been significantly improved with:
- Flexible score input accepting
EvaluationResult,EvaluationScore, or raw values- Performance as a percentage (0-100) with proper edge case handling
- Enhanced reporting with brief/detailed options
- Robust comparison methods with proper validation
322-407: Useful static methods for evaluator selection based on performance.The
lowestandhighestmethods provide convenient ways to run multiple evaluators concurrently and select based on performance. The placeholder results are cleverly designed to ensure the first real result will always replace them.
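The placeholder-seeded selection idea described above can be sketched with asyncio; the names (`highest`, `FakeResult`, the two toy evaluators) are illustrative, not draive's API:

```python
# Sketch of highest-performance selection: run prepared evaluators
# concurrently, then fold over the results starting from a placeholder
# whose performance is -inf, so the first real result always replaces it.
import asyncio
from collections.abc import Callable, Coroutine
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeResult:
    name: str
    performance: float  # percentage


async def highest(
    *prepared: Callable[[], Coroutine[Any, Any, FakeResult]],
) -> FakeResult:
    best = FakeResult(name="placeholder", performance=float("-inf"))
    for result in await asyncio.gather(*(p() for p in prepared)):
        if result.performance > best.performance:
            best = result
    return best


async def conciseness() -> FakeResult:
    return FakeResult("conciseness", 72.0)


async def readability() -> FakeResult:
    return FakeResult("readability", 88.0)


best = asyncio.run(highest(conciseness, readability))
```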
411-540: Evaluator class successfully migrated to State-based context management. The class has been properly refactored with:
- State collection replacing execution context
- Enhanced `with_state` method accepting multiple states
- `with_threshold` using proper score value conversion
- Comprehensive docstrings for all methods
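The immutable `with_state` pattern mentioned above (each call returns a copy with the state collection extended) can be sketched with frozen dataclasses; the class and field names are stand-ins, not draive's actual types:

```python
# Sketch of immutable state configuration: with_state never mutates the
# receiver, it returns a copy whose state tuple is extended.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class FakeState:
    key: str


@dataclass(frozen=True)
class FakeEvaluator:
    name: str
    state: tuple[FakeState, ...] = ()

    def with_state(self, *states: FakeState) -> "FakeEvaluator":
        # variadic: several states can be attached in one call
        return replace(self, state=self.state + states)


base = FakeEvaluator(name="keyword")
configured = base.with_state(FakeState("model"), FakeState("session"))
```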
600-709: Excellent improvements to evaluation and error handling. The implementation now:
- Uses clearer type checking in `contra_map`
- Properly scopes evaluation with all states
- Records comprehensive metrics, including the performance percentage
- Gracefully handles exceptions with detailed error metadata

731-816: Factory function properly updated with state support. The `evaluator` function has been correctly refactored with:
- State-based configuration replacing execution context
- Proper threshold value conversion
- Excellent documentation with usage examples
src/draive/evaluation/suite.py (7)
3-3: Import updates support new state management and renamed types. The imports correctly add `Collection` and `State` while updating to use the renamed `EvaluatorScenarioResult`. Also applies to: 8-8, 13-13

27-47: Class properly renamed with flexible identifier type. The `EvaluatorSuiteCase` class has been updated with:
- Proper renaming following the new convention
- Flexible string identifier (defaults to a UUID string)
- Comprehensive NumPy-style docstrings
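The UUID-string default identifier mentioned above can be sketched with a dataclass standing in for draive's DataModel; the field names are assumptions:

```python
# Sketch of a suite case whose identifier defaults to a fresh UUID string,
# while still accepting an explicit, human-readable identifier.
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass(frozen=True)
class FakeSuiteCase:
    parameters: str
    identifier: str = field(default_factory=lambda: str(uuid4()))


auto = FakeSuiteCase(parameters="input A")          # gets a UUID string
named = FakeSuiteCase(parameters="input B", identifier="case-01")
```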
49-221: Suite result classes properly enhanced with improved reporting. Both `EvaluatorSuiteCaseResult` and `EvaluatorSuiteResult` have been successfully refactored with:
- Performance as a percentage (0-100)
- Enhanced reporting with brief/detailed options
- Improved XML formatting using attributes
- Comprehensive documentation

223-302: `EvaluatorCaseResult` class properly refactored with clearer type handling. The `of` method now:
- Accepts a more intuitive parameter pattern (single result + variadic)
- Uses explicit type checks for clarity
- Properly aggregates both scenario and individual evaluator results
333-493: `EvaluatorSuite` class successfully migrated to State-based context. The class has been properly refactored with:
- State collection replacing execution context
- Proper scoping in the `__call__` method
- Enhanced `with_state` accepting multiple states
- Consistent renaming throughout

689-780: Storage classes properly renamed with updated type annotations. Both `_EvaluatorSuiteMemoryStorage` and `_EvaluatorSuiteFileStorage` have been correctly updated to use the new naming convention and type annotations.

608-686: Factory function properly updated with state support. The `evaluator_suite` function has been correctly refactored with:
- State-based configuration replacing execution context
- Proper handling of all storage options
- Consistent type annotations throughout
Actionable comments posted: 2
♻️ Duplicate comments (3)
guides/AdvancedState.md (1)
185-185: Clarify that the default factory now returns a tuple. The change from `list` to `tuple` for the default factory is correct for immutable state management. However, the documentation should clarify that this field now returns a tuple when accessed, not a list, and that an empty tuple prints as a blank line in the pretty-print output, to avoid confusion for readers.

guides/BasicEvaluation.md (1)
198-198: Fix spacing issue flagged by static analysis. There's a spacing issue that needs to be corrected.
src/draive/helpers/instruction_refinement.py (1)
178-203: Consider using `Sequence` instead of `list` for consistency. Local variables should use `Sequence` type annotations to maintain consistency with the codebase patterns.

```diff
-    failing_cases: list[EvaluatorSuiteCase[CaseParameters]] = [
+    failing_cases: Sequence[EvaluatorSuiteCase[CaseParameters]] = [
         case_result.case for case_result in evaluation_result.cases if not case_result.passed
     ]
     # Get passing cases and sample
-    passing_cases: list[EvaluatorSuiteCase[CaseParameters]] = [
+    passing_cases: Sequence[EvaluatorSuiteCase[CaseParameters]] = [
         case_result.case for case_result in evaluation_result.cases if case_result.passed
     ]
     # Get other, previously excluded cases
-    additional_cases: list[EvaluatorSuiteCase[CaseParameters]] = [
+    additional_cases: Sequence[EvaluatorSuiteCase[CaseParameters]] = [
         case for case in evaluation_cases if case not in evaluation_result.cases
     ]
     # Intelligent sampling: sample some passing cases
-    sampling_cases_pool: list[EvaluatorSuiteCase[CaseParameters]] = (
+    sampling_cases_pool: Sequence[EvaluatorSuiteCase[CaseParameters]] = (
         passing_cases + additional_cases
     )
     sample_size: int = (
         max(1, int(len(sampling_cases_pool) * sample_ratio)) if sampling_cases_pool else 0
     )
-    sampling_cases: list[EvaluatorSuiteCase[CaseParameters]] = (
+    sampling_cases: Sequence[EvaluatorSuiteCase[CaseParameters]] = (
```
🧬 Code Graph Analysis (4)
src/draive/guardrails/quality/state.py (4)
- src/draive/evaluation/scenario.py (4): evaluation (466-476), EvaluatorScenarioResult (119-233), PreparedEvaluatorScenario (237-249), passed (148-159)
- src/draive/evaluation/evaluator.py (7): evaluation (626-636), EvaluatorResult (75-261), PreparedEvaluator (287-299), evaluator (732-735), evaluator (739-748), evaluator (751-816), passed (151-160)
- src/draive/multimodal/content.py (1): MultimodalContent (23-235)
- src/draive/guardrails/quality/types.py (1): GuardrailsQualityException (12-23)

src/draive/evaluation/__init__.py (3)
- src/draive/evaluation/scenario.py (8): EvaluatorScenario (273-574), EvaluatorScenarioDefinition (253-270), EvaluatorScenarioResult (119-233), PreparedEvaluatorScenario (237-249), evaluator_scenario (578-581), evaluator_scenario (585-593), evaluator_scenario (596-657), evaluation (466-476)
- src/draive/evaluation/score.py (1): EvaluationScore (13-213)
- src/draive/evaluation/suite.py (8): EvaluatorCaseResult (223-301), EvaluatorSuite (333-605), EvaluatorSuiteCase (27-46), EvaluatorSuiteCaseResult (49-163), EvaluatorSuiteDefinition (305-313), EvaluatorSuiteResult (166-220), EvaluatorSuiteStorage (322-330), evaluator_suite (608-686)

src/draive/helpers/instruction_refinement.py (2)
- src/draive/evaluation/evaluator.py (4): evaluation (626-636), performance (196-208), passed (151-160), report (162-193)
- src/draive/evaluation/suite.py (10): EvaluatorSuite (333-605), EvaluatorSuiteResult (166-220), EvaluatorSuiteCase (27-46), performance (146-163), performance (212-220), cases (535-540), passed (78-89), passed (171-172), report (91-143), report (174-209)

src/draive/evaluation/scenario.py (3)
- src/draive/commons/metadata.py (4): Meta (23-437), description (246-252), name (225-231), merged_with (332-345)
- src/draive/evaluation/evaluator.py (15): evaluation (626-636), evaluator (732-735), evaluator (739-748), evaluator (751-816), EvaluatorResult (75-261), PreparedEvaluator (287-299), passed (151-160), report (162-193), performance (196-208), evaluate (343-361), evaluate (387-405), evaluate (589-596), with_state (513-540), contra_map (600-644), _evaluate (673-709)
- src/draive/evaluation/suite.py (8): passed (78-89), passed (171-172), report (91-143), report (174-209), performance (146-163), performance (212-220), with_state (481-493), _evaluate (382-452)
🪛 LanguageTool
guides/BasicEvaluation.md
[grammar] ~201-~201: Use correct spacing
Context: ...lt.performance:.2f}%") ``` ## Advanced Usage You can customize evaluators with execut...
(QB_NEW_EN_OTHER_ERROR_IDS_5)
[grammar] ~203-~203: There might be a mistake here.
Context: ... evaluators with execution contexts and metadata: (collapsed `python` code sample from the guide showing `with_meta`, `Evaluator.highest`, and `contra_map`) ... The evaluation...
(QB_NEW_EN_OTHER)
🔇 Additional comments (78)
pyproject.toml (2)
8-8: Ensure version bump is mirrored in runtime package
`version = "0.79.0"` bumps the library, but `draive/__init__.py` (or wherever `__version__` is declared) must be updated in lock-step, otherwise tools like `importlib.metadata.version("draive")` will still report the old value.

2-3: Verify `uv_build` backend & clean up Hatch metadata

Please double-check and address the following before merging:

Confirm that the PyPI package exposes the `uv_build` build backend in the pinned range (`>=0.8.0,<0.9.0`). For example:

```shell
pip install "uv_build>=0.8.0,<0.9.0" --dry-run
# or
pip download "uv>=0.8.0,<0.9.0" --no-deps
```

to ensure the wheel includes the `uv_build` entry point and avoid "backend not found" errors.

Remove the now-orphaned Hatch metadata section in `pyproject.toml` (lines 94–96):

```diff
-[tool.hatch.metadata]
-allow-direct-references = true  # Hatchling backend removed → drop obsolete section
```

CLAUDE.md (1)
48-84: Excellent documentation style guidelines

The new documentation style section provides clear, comprehensive guidance for NumPy docstring conventions. The example demonstrates proper formatting for parameters, returns, and exceptions, which will ensure consistency across the codebase. This aligns well with the enhanced docstring coverage mentioned in the evaluation modules.
src/draive/commons/metadata.py (1)
334-334: Good improvement to metadata handling

The updated type annotation to include `None` is correct and makes the API more flexible. The early return optimization when `values` is falsy is efficient and avoids unnecessary copying. This change supports the optional metadata patterns used throughout the evaluation components.

src/draive/evaluation/value.py (3)
20-21: Good addition of boolean support

Adding boolean support to `EvaluationScoreValue` is logical and makes the API more intuitive. The type annotation correctly reflects the new capability.

32-95: Excellent improvements to `evaluation_score_value` function

The enhancements are well-implemented:

- Boolean support (True→1.0, False→0.0) is intuitive and useful
- Pattern matching order is logical (float first with assertion for range validation)
- Comprehensive NumPy-style docstring follows the new documentation guidelines
- Error handling is appropriate with clear error messages
97-117: Well-implemented validation function

The new `evaluation_score_verifier` function provides clean, reusable validation logic. The docstring follows NumPy conventions and the implementation is straightforward and correct. This supports the Field validation patterns used in the evaluation system.

src/draive/guardrails/quality/state.py (3)
5-10: Good import consolidation

The consolidated import from `draive.evaluation` improves readability and reflects the unified evaluation interface. The type annotations are correctly updated to use the new API names (`PreparedEvaluatorScenario`, `EvaluatorScenarioResult`).

29-29: Type annotation correctly updated

The parameter type annotation properly reflects the new evaluation API, supporting both `PreparedEvaluatorScenario` and `PreparedEvaluator` types.

36-52: Cleaner error handling logic

The replacement of pattern matching with an `isinstance` check is simpler and more readable while maintaining the same functionality. The error handling correctly extracts the appropriate reason (`result.evaluator` for `EvaluatorResult`, `result.scenario` for scenario results) and preserves metadata propagation.

guides/BasicEvaluation.md (9)
23-26: LGTM: Correct usage of new `EvaluationScore.of()` method

The documentation correctly demonstrates the new class method `EvaluationScore.of()` instead of direct constructor calls, which aligns with the updated API.

29-32: LGTM: Consistent usage of `EvaluationScore.of()` method

The second example correctly uses the new `EvaluationScore.of()` method, maintaining consistency with the updated API.

104-104: LGTM: Correct import update for `evaluator_scenario`

The import statement correctly uses the new `evaluator_scenario` decorator name, replacing the old `evaluation_scenario`.

107-107: LGTM: Correct decorator usage

The decorator correctly uses `@evaluator_scenario` instead of the old `@evaluation_scenario` naming.

132-132: LGTM: Correct property name update

The documentation correctly shows the new `performance` property instead of the old `relative_score`, and properly formats it as a percentage.

144-144: LGTM: Correct import update for suite classes

The import statement correctly uses the new `EvaluatorCaseResult` class name, replacing the old naming convention.

152-152: LGTM: Correct suite decorator usage

The decorator correctly uses `@evaluator_suite` with the updated parameter names and signature.

155-156: LGTM: Correct parameter naming

The function signature correctly uses `case_parameters` instead of the old `parameters` name, improving clarity.

207-210: LGTM: Correct usage of `with_meta` method

The documentation correctly shows the new `with_meta` method for adding metadata to evaluators, replacing the old `with_execution_context` pattern.

src/draive/stages/stage.py (11)
20-23: LGTM: Correct type annotation updates

The import statements correctly use the new `EvaluatorScenarioResult` and `PreparedEvaluatorScenario` type names, aligning with the evaluation API refactoring.

847-848: LGTM: Correct type annotation update

The parameter type annotation correctly uses `PreparedEvaluatorScenario` instead of the old `PreparedScenarioEvaluator` naming.

861-862: LGTM: Consistent docstring update

The docstring correctly reflects the new `PreparedEvaluatorScenario` type name in the parameter documentation.

882-884: LGTM: Correct result type annotation

The variable annotation correctly uses `EvaluatorScenarioResult | EvaluatorResult` with the new naming convention.

889-890: LGTM: Correct property and method updates

The code correctly uses the new `performance` property instead of `relative_score` and updates the `report` method call to use the `detailed` parameter instead of `include_details`.

892-898: LGTM: Correct exception message and metadata updates

The exception message correctly references "performance" instead of "relative score", and the metadata key is properly updated to `"evaluation_performance"`.

908-908: LGTM: Correct type annotation update

The parameter type annotation correctly uses `PreparedEvaluatorScenario` instead of the old naming convention.

921-921: LGTM: Consistent docstring update

The docstring correctly reflects the new `PreparedEvaluatorScenario` type name in the parameter documentation.

941-943: LGTM: Correct result type annotation

The variable annotation correctly uses the new `EvaluatorScenarioResult | EvaluatorResult` type names.

948-950: LGTM: Correct property and method updates

The code correctly uses the new `performance` property and updates the `report` method call parameter.

951-957: LGTM: Correct exception message and metadata updates

The exception message and metadata key are correctly updated to reference "performance" instead of "relative score".
src/draive/evaluation/__init__.py (4)
11-15: LGTM: Correct import updates for evaluator scenario

The imports correctly use the new naming convention with `EvaluatorScenario`, `EvaluatorScenarioDefinition`, `PreparedEvaluatorScenario`, and `evaluator_scenario` replacing the old names.

19-27: LGTM: Correct import updates for evaluator suite

The imports correctly use the new naming convention with all suite-related classes properly renamed to use the `Evaluator` prefix.

35-53: LGTM: Correct export updates

The `__all__` exports correctly use the new naming convention for all evaluator-related classes and functions.

10-16: No import inconsistency: both `EvaluationScenarioResult` and `EvaluatorScenarioResult` are valid and intentionally exported

The `scenario.py` module defines two distinct classes:

- `EvaluationScenarioResult` (for results of evaluating multiple evaluators on a value)
- `EvaluatorScenarioResult` (for results of running a named scenario)

Both are correctly imported in `src/draive/evaluation/__init__.py` and listed in `__all__`. No changes required here.

Likely an incorrect or invalid review comment.
src/draive/evaluation/score.py (11)
3-7: LGTM: Correct imports for validation

The imports correctly use the new evaluation value types and verifier from the value module.

13-27: LGTM: Comprehensive class documentation

The class docstring provides clear and comprehensive documentation following NumPy style, explaining the purpose, attributes, and behavior of the class.

29-55: LGTM: Well-documented class method

The `of` class method is well-documented with comprehensive docstrings and provides a clean factory interface for creating EvaluationScore instances.

57-64: LGTM: Proper field definitions with validation

The field definitions correctly use the `evaluation_score_verifier` for validation and include descriptive field documentation.

66-87: LGTM: Correct equality implementation

The `__eq__` method correctly handles both float and EvaluationScore comparisons with proper type checking and returns False for unsupported types.

89-110: LGTM: Correct inequality implementation

The `__ne__` method correctly implements inequality comparison with proper type checking and documentation.

112-133: LGTM: Correct less-than implementation

The `__lt__` method correctly implements less-than comparison with proper type checking and documentation.

135-156: LGTM: Correct less-than-or-equal implementation

The `__le__` method correctly implements less-than-or-equal comparison with proper type checking and documentation.

158-179: LGTM: Correct greater-than implementation

The `__gt__` method correctly implements greater-than comparison with proper type checking and documentation.

181-202: LGTM: Correct greater-than-or-equal implementation

The `__ge__` method correctly implements greater-than-or-equal comparison with proper type checking and documentation.
204-213: LGTM: Correct hash implementation

The `__hash__` method correctly implements hashing based on the value and comment tuple, with proper documentation.

src/draive/helpers/instruction_refinement.py (8)
8-9: LGTM!

Import updates correctly reflect the renaming from `EvaluationSuite` to `EvaluatorSuite` and related types.

26-26: Type annotation correctly updated.

128-150: Property and type updates are consistent with the new API.

The changes correctly update:

- Type annotations from `SuiteEvaluatorResult` to `EvaluatorSuiteResult`
- Property names from `*_score` to `*_performance`
- Property access from `.relative_score` to `.performance`

221-238: Initialization stage correctly updated with new types.

The changes appropriately update type annotations and logging to use the new `EvaluatorSuite` type and `performance` property.

276-351: Tree exploration correctly implements new API.

Updates properly use the `EvaluatorSuite` type and access `performance` properties consistently.

372-438: Node exploration logic properly updated.

The function correctly uses new types (`EvaluatorSuite`, `EvaluatorSuiteCase`) and the `performance` property throughout.

450-538: Report generation updated with new API.

The changes correctly:

- Update type annotations to `EvaluatorSuiteResult`
- Add the `detailed=True` parameter to `report()` method calls

631-759: Tree finalization correctly implements performance metrics.

The function properly uses:

- The `EvaluatorSuite` type
- The `performance` property instead of `relative_score`
- Consistent decimal formatting (2-4 places) for performance values
src/draive/evaluation/scenario.py (8)
1-16: Import and export updates align with new architecture.

The changes correctly:

- Import `State` instead of `ScopeContext` for context management
- Update exports to use the `Evaluator*` prefix consistently

19-117: Well-designed aggregation class for evaluation results.

The `EvaluationScenarioResult` class provides a clean interface for:

- Running multiple evaluators concurrently
- Merging results from multiple scenarios
- Proper metadata handling

119-234: Comprehensive scenario result implementation.

The `EvaluatorScenarioResult` class provides:

- Clear pass/fail logic with empty evaluation handling
- Flexible reporting with `detailed` and `include_passed` options
- Performance calculation as average percentage (0-100)
- Well-formatted XML output for detailed reports
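The performance calculation described here — an average of 0.0-1.0 scores expressed as a 0-100 percentage — reduces to a few lines. A minimal sketch; the empty-set convention of returning 0.0 is an assumption for illustration:

```python
def performance(scores: list[float]) -> float:
    """Average of 0.0-1.0 scores expressed as a 0-100 percentage."""
    if not scores:
        return 0.0  # assumed convention for an empty evaluation set
    return sum(scores) / len(scores) * 100.0
```

Guarding the empty case explicitly avoids a `ZeroDivisionError` when a scenario produced no evaluations, which mirrors the "empty evaluation handling" called out above.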
289-329: Constructor properly updated for state management.

The initialization correctly:

- Accepts `Collection[State]` instead of execution context
- Stores state in the `_state` attribute
- Maintains the immutability pattern

440-484: Enhanced `contra_map` implementation.

The method now properly handles:

- `AttributePath` for attribute-based transformations
- Type casting with proper assertions
- Clear parameter documentation
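The core `contra_map` idea — adapting an evaluator to a different input shape by pre-applying a transform — can be sketched without any framework. All names below are illustrative (plain callables instead of draive's `AttributePath`):

```python
from typing import Callable


class Document:
    def __init__(self, title: str, body: str) -> None:
        self.title = title
        self.body = body


def contra_map(
    evaluate: Callable[[str], float],
    transform: Callable[[Document], str],
) -> Callable[[Document], float]:
    # Wrap a text evaluator so it accepts richer inputs via a transform
    def adapted(document: Document) -> float:
        return evaluate(transform(document))

    return adapted


def length_score(text: str) -> float:
    # Toy metric: longer text scores higher, capped at 1.0
    return min(len(text) / 10, 1.0)


body_score = contra_map(length_score, lambda document: document.body)
```

The transform runs on the way *in* (contravariantly), so the evaluator itself never needs to know about `Document`; an attribute-path variant would simply generate the `lambda document: document.body` part.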
485-511: Consistent scoped execution in call method.

The implementation ensures:

- All evaluations run within a proper scope including state
- Metrics recorded with the `performance` key
- Proper attribute tracking for passed status

512-556: Robust error handling in `_evaluate` method.

The method properly:

- Catches and logs exceptions
- Returns an empty result with error metadata on failure
- Normalizes both `EvaluationScenarioResult` and sequence results
- Preserves metadata correctly

577-658: Well-documented decorator with state management.

The `evaluator_scenario` function provides:

- Clear parameter documentation
- Usage examples for both decorator and direct call patterns
- Proper state collection handling
- Type-safe overloads
src/draive/evaluation/evaluator.py (9)
25-73: Well-structured EvaluationResult wrapper class.

The new class provides:

- Factory method for creating results from scores or values
- Metadata support for additional context
- Clean integration with EvaluationScore

75-133: Enhanced EvaluatorResult with flexible score handling.

The factory method now properly handles:

- `EvaluationResult` with metadata merging
- Direct `EvaluationScore` objects
- Raw score values with automatic wrapping

162-209: Improved reporting and performance calculation.

The enhancements provide:

- A `detailed` parameter for flexible report formatting
- Performance as a percentage (0-100) instead of a fraction
- Proper handling of the zero-threshold edge case
- Clear XML formatting for detailed reports

210-259: Comparison operators properly validate compatibility.

The operators now ensure results are comparable by checking:

- Same evaluator name
- Same threshold value

This prevents invalid comparisons between different evaluation contexts.
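The compatibility check can be sketched with a toy `Result` type that assumes name and threshold must match before any ordering comparison — an illustration of the guard, not draive's actual operators:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    evaluator: str
    threshold: float
    score: float

    def _check_comparable(self, other: "Result") -> None:
        # Results from different evaluators or thresholds are not comparable
        if self.evaluator != other.evaluator or self.threshold != other.threshold:
            raise ValueError("Results are not comparable")

    def __lt__(self, other: "Result") -> bool:
        self._check_comparable(other)
        return self.score < other.score
```

Raising on mismatched contexts turns a silent apples-to-oranges comparison into an immediate, debuggable failure.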
321-407: Static methods updated to use performance metric.

The `lowest` and `highest` methods now:

- Accept variadic evaluators with a cleaner API
- Compare based on the `performance` percentage
- Run evaluations concurrently
- Return the evaluator result with the best performance
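A minimal sketch of a `highest`-style combinator under these assumptions — evaluators run concurrently and the best score wins — simplified to plain floats rather than result objects (illustrative only):

```python
import asyncio
from collections.abc import Awaitable
from typing import Callable


async def highest(
    value: str,
    *evaluators: Callable[[str], Awaitable[float]],
) -> float:
    # Run every evaluator concurrently and keep the best performance
    scores = await asyncio.gather(*(evaluate(value) for evaluate in evaluators))
    return max(scores)


async def length_score(text: str) -> float:
    return min(len(text) / 10, 1.0)


async def vowel_ratio(text: str) -> float:
    return sum(character in "aeiou" for character in text) / len(text)
```

`asyncio.gather` preserves order and awaits all evaluators together, so total latency is roughly that of the slowest evaluator rather than the sum.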
600-644: Enhanced `contra_map` with AttributePath support.

The method improvements include:

- Support for `AttributePath` transformations
- Proper type assertions and casting
- Clear documentation
- Consistency with the similar implementation in EvaluatorScenario

646-672: Metrics recording updated to use performance.

The call method now:

- Records a `performance` metric instead of the raw score
- Includes comprehensive attributes in metrics
- Maintains scoped execution with state

673-709: Simplified evaluation with comprehensive error handling.

The method now:

- Returns an `EvaluationResult` with error metadata on exceptions
- Uses the consistent error comment "Error"
- Properly wraps all result types

731-817: Decorator updated with state management and documentation.

The `evaluator` function now:

- Accepts a `state` collection instead of execution context
- Includes comprehensive docstrings with examples
- Validates the threshold using `evaluation_score_value`
- Maintains backward compatibility
src/draive/evaluation/suite.py (8)
3-24: Import and export updates align with new naming.

The changes correctly update all exports to use the `Evaluator` prefix consistently.

27-164: Well-documented test case data models.

The classes provide:

- Clear documentation for attributes
- A `performance` property calculating the average percentage
- Flexible reporting with `detailed` and `include_passed` options
- Proper handling of empty results

166-221: Suite result with comprehensive reporting.

The class provides:

- Average performance calculation across all cases
- Flexible reporting with XML and summary formats
- Proper empty suite handling
- Consistency with case result reporting

223-303: Flexible case result aggregation.

The `EvaluatorCaseResult` class properly:

- Handles both `EvaluatorScenarioResult` and `EvaluatorResult` types
- Uses explicit type checking for clarity
- Maintains metadata merging capabilities
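Metadata merging of this kind typically follows a copy-on-write shape, the same pattern the guide's `with_meta` example uses. A minimal standalone sketch with invented names (`SimpleEvaluator` is not a draive class):

```python
from collections.abc import Mapping
from dataclasses import dataclass, field, replace
from typing import Any


@dataclass(frozen=True)
class SimpleEvaluator:
    name: str
    meta: Mapping[str, Any] = field(default_factory=dict)

    def with_meta(self, values: Mapping[str, Any]) -> "SimpleEvaluator":
        # Return a new instance with merged metadata; the original never mutates
        return replace(self, meta={**self.meta, **values})


base = SimpleEvaluator(name="keyword")
tagged = base.with_meta({"version": "1.0", "author": "evaluation_team"})
```

Because the instance is frozen and `with_meta` returns a copy, a prepared evaluator can be shared across suites without one caller's metadata leaking into another's.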
347-362: Constructor properly updated for state management.

The initialization correctly stores the state collection and maintains consistency with other evaluator classes.

363-381: Evaluation runs within proper scope.

The `__call__` method ensures all evaluations run within a scoped context including the state collection.

481-493: State management method follows immutable pattern.

The `with_state` method correctly:

- Creates a new instance with extended state
- Maintains immutability as per codebase conventions
- Follows the pattern from other evaluator classes
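The immutable `with_state` pattern can be sketched in a few lines (illustrative names, state reduced to strings; not draive's classes):

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Suite:
    name: str
    state: tuple[str, ...] = ()

    def with_state(self, *state: str) -> "Suite":
        # Copy-on-write: return a new suite carrying the combined state
        return replace(self, state=self.state + state)


base = Suite(name="quality")
extended = base.with_state("db_connection", "cache")
```

Extending rather than replacing the tuple means chained `with_state` calls accumulate state, while the original suite stays usable with its original configuration.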
608-780: Storage and factory properly updated.

The changes correctly:

- Update all type annotations to use the `Evaluator` prefix
- Maintain the state parameter in the factory function
- Update storage implementations with new types
```python
__all__ = (
    "EvaluationCaseResult",
    "EvaluationScenarioResult",
```
Inconsistent export naming detected
The `__all__` exports include `EvaluationScenarioResult` (line 31) which doesn't match the import `EvaluatorScenarioResult` (line 13). This inconsistency could cause import errors.
🤖 Prompt for AI Agents
In src/draive/evaluation/__init__.py at line 31, the export name
"EvaluationScenarioResult" in the __all__ list does not match the imported name
"EvaluatorScenarioResult" at line 13. To fix this, update the export name in the
__all__ list to exactly match the imported name "EvaluatorScenarioResult" to
ensure consistent and error-free imports.
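A generic guard against this class of bug — asserting that every name listed in `__all__` actually resolves on the module — could look like the sketch below. It is demonstrated with the stdlib `json` module; pointing it at `draive.evaluation` would be a natural addition to the project's test suite:

```python
import importlib


def undefined_exports(module_name: str) -> list[str]:
    """Return names listed in __all__ that the module does not actually define."""
    module = importlib.import_module(module_name)
    return [
        name
        for name in getattr(module, "__all__", ())
        if not hasattr(module, name)
    ]
```

An empty result means every exported name is defined; a mismatch like `EvaluationScenarioResult` vs. `EvaluatorScenarioResult` would surface here before any downstream `from module import *` fails.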
No description provided.