Conversation
4beec02 to
b75184c
Compare
ee8e6a8 to
85703a2
Compare
85703a2 to
6b9d43b
Compare
6b9d43b to
f7420c2
Compare
WalkthroughThe changes introduce a new asynchronous instruction refinement workflow in the Possibly related PRs
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (4)
src/draive/__init__.py (1)
120-122: Minor ordering nit – keep large__all__alphabetisedTo keep this huge symbol list maintainable, please insert the new names in alphabetical order (currently
prepare_instruction/refine_instructionsit between unrelated blocks).Also applies to: 125-126, 438-441
src/draive/helpers/instruction_preparation.py (1)
75-79: Micro-optimisation: build strings withjoinFor many arguments, repeated
+=string concatenation is quadratic. Collect parts in a list then''.join(parts).src/draive/helpers/instruction_refinement.py (2)
147-150: Equal evaluation scores are treated as regressions
current_evaluation.relative_score <= previous_evaluation.relative_scorerolls back even when the score is unchanged, which can stall the loop if the suite is deterministic.- if current_evaluation.relative_score <= previous_evaluation.relative_score: + if current_evaluation.relative_score < previous_evaluation.relative_score:Optionally add an ε-tolerance for floating-point noise.
268-271: Off-by-one risk when enforcingrounds_limitThe round counter is checked before it’s incremented, so a limit of 3 actually permits 4 iterations (0-3).
Increment first or change the predicate to>to run exactlyrounds_limitrefinements.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting
📒 Files selected for processing (5)
src/draive/__init__.py(3 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(1 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/__init__.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/draive/instructions/__init__.py (1)
src/draive/instructions/types.py (1)
InstructionDeclaration(42-46)
🔇 Additional comments (3)
src/draive/instructions/__init__.py (1)
5-6: Public export looks good
InstructionDeclarationis correctly imported and re-exported – nothing else to flag.Also applies to: 14-15
src/draive/helpers/__init__.py (1)
1-2: Helper functions exposed correctly
prepare_instruction/refine_instructionare now part of the public API and added to__all__; imports resolve.Also applies to: 15-16
src/draive/helpers/instruction_refinement.py (1)
25-33: Ensure runtime environment supports PEP 695 function-level type parametersThe generic syntax
def refine_instruction[CaseParameters: DataModel, Result: DataModel | str](...)depends on PEP 695, available only from Python 3.12 onward. Importing this module on 3.10/3.11 will raise aSyntaxError.• Verify that the project’s advertised minimum Python version is ≥ 3.12, and enforce it in packaging metadata / CI.
• Otherwise, fall back toTypeVar/ParamSpecintyping_extensionsfor broader compatibility.
f7420c2 to
091fdeb
Compare
There was a problem hiding this comment.
Actionable comments posted: 6
🔭 Outside diff range comments (1)
src/draive/__init__.py (1)
101-109: 🧹 Nitpick (assertive)Mind potential duplication & ordering drift in the giant
__all__The three new symbols are registered correctly, but as the list grows (~180 items) accidental duplicates or maintenance errors become likely.
Consider generating__all__programmatically (e.g.from importlib import import_module; __all__ = tuple(sorted(globals()))) or grouping by sub-package to reduce manual editing.Also applies to: 267-268, 382-385
♻️ Duplicate comments (3)
src/draive/helpers/instruction_preparation.py (1)
16-21: Generic-function syntax still breaks ≤ Python 3.12The square-bracket generic parameters (
prepare_instruction[CaseParameters: …]) are only valid from 3.12+. Projects running 3.11 or lower will fail to import this module.If 3.12 is not a hard requirement, drop the brackets:
-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str]( +async def prepare_instruction(src/draive/helpers/instruction_refinement.py (2)
218-218: Typo in XML tag breaks downstream processing.The tag is misspelled as
<EALUATION_RESULT>(missing "V") instead of<EVALUATION_RESULT>, which creates inconsistency and may confuse XML parsing utilities and the LLM.- "\n<EALUATION_RESULT>", + "\n<EVALUATION_RESULT>",- "</EALUATION_RESULT>", + "</EVALUATION_RESULT>",Also applies to: 223-223
227-231: Guidelines placeholder is not interpolated in REVIEW_INSTRUCTION.The
{guidelines}placeholder inREVIEW_INSTRUCTIONtemplate is never replaced, causing the LLM to receive the raw template with literal braces or "None".- instruction=REVIEW_INSTRUCTION, + instruction=REVIEW_INSTRUCTION.format(guidelines=guidelines or ""),
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (5)
src/draive/__init__.py(3 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(1 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/__init__.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
src/draive/helpers/__init__.py (2)
src/draive/helpers/instruction_preparation.py (1)
prepare_instruction(16-52)src/draive/helpers/instruction_refinement.py (1)
refine_instruction(26-51)
src/draive/instructions/__init__.py (1)
src/draive/instructions/types.py (1)
InstructionDeclaration(42-46)
src/draive/__init__.py (3)
src/draive/helpers/instruction_preparation.py (1)
prepare_instruction(16-52)src/draive/helpers/instruction_refinement.py (1)
refine_instruction(26-51)src/draive/instructions/types.py (2)
Instruction(50-211)InstructionDeclaration(42-46)
🔇 Additional comments (5)
src/draive/helpers/__init__.py (1)
1-2: Public-API extension looks correctThe new helper functions are imported and appended to
__all__, making them discoverable viaimport draive.helpers as h. No functional or stylistic issues spotted here.Also applies to: 17-18
src/draive/instructions/__init__.py (1)
5-6:InstructionDeclarationcorrectly re-exportedAdding the type to both the import list and
__all__cleanly exposes it to external callers. Looks good.Also applies to: 14-15
src/draive/helpers/instruction_refinement.py (3)
260-264: Inconsistent guidelines interpolation between templates.The
REFINE_INSTRUCTIONtemplate correctly interpolates the{guidelines}placeholder, butREVIEW_INSTRUCTIONdoes not (as noted in the previous comment). This inconsistency could lead to confusion.The implementation here correctly formats the guidelines placeholder, which is the expected pattern.
25-33: Well-designed function signature using modern Python generics.The function signature effectively uses Python's newer generic syntax and provides clear type constraints for the evaluation suite parameters.
37-51: Excellent use of staged processing architecture.The sequence of initialization, evaluation, and refinement loop stages provides a clean, modular approach to instruction refinement with proper logging and state management.
091fdeb to
dcdf7cd
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (5)
src/draive/helpers/instruction_refinement.py (5)
34-34: Use proper validation instead of assert for production code.The
assertstatement is disabled when Python runs with optimization flags (-O), making this validation ineffective in production environments.
217-224: Typo inEVALUATION_RESULTtag breaks downstream prompt.The tag is spelled
<EALUATION_RESULT>(missing "V"), whereasREVIEW_INSTRUCTIONexpectsEVALUATION_RESULT. This inconsistency may confuse the LLM and any XML parsing utilities.
227-231:{guidelines}placeholder inREVIEW_INSTRUCTIONis never interpolated.
Stage.completionreceives the raw template, leaving{guidelines}untouched. Without proper interpolation, the LLM sees literal braces or potentially confusing content.
292-292: Consider making the evaluation score threshold configurable.The hardcoded threshold of
0.99for stopping refinement could benefit from being configurable, as different use cases may require different quality thresholds.
314-314: Address the TODO comment for instruction refinement.The TODO comment indicates that the
REFINE_INSTRUCTIONtemplate needs refinement.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
src/draive/__init__.py(2 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/draive/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(26-51)
src/draive/helpers/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(26-51)
🔇 Additional comments (3)
src/draive/__init__.py (1)
109-109: LGTM! Clean export pattern followed.The addition of
refine_instructionto both the import list and__all__tuple correctly exposes the new functionality at the package level. The alphabetical placement is consistent with the existing codebase patterns.Also applies to: 386-386
src/draive/helpers/__init__.py (1)
5-5: LGTM! Proper module integration.The import and export of
refine_instructionfollows the established pattern and correctly integrates the new functionality into the helpers subpackage.Also applies to: 22-22
src/draive/helpers/instruction_refinement.py (1)
25-52: Well-structured asynchronous workflow implementation.The
refine_instructionfunction provides a clean, staged approach to instruction refinement with proper logging, state management, and error handling through the Stage.sequence pattern. The type annotations and parameter structure are well-designed.
dcdf7cd to
0053d3d
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/draive/evaluation/suite.py (1)
381-405: 🛠️ Refactor suggestion
EvaluationSuiteDatatype parameter should mirror the suite’s parameter classYou pass
EvaluationSuiteData[CaseParameters], which keeps the originalTypeVarrather than the concreteself._parameters.
At runtime this can lead toTypeError: 'typing.TypeVar' object is not subscriptablewhen the data class metaprogramming kicks in.- suite_storage = _EvaluationSuiteMemoryStorage[CaseParameters]( - data_type=EvaluationSuiteData[CaseParameters], + suite_storage = _EvaluationSuiteMemoryStorage[CaseParameters]( + data_type=EvaluationSuiteData[self._parameters],
♻️ Duplicate comments (5)
src/draive/helpers/instruction_refinement.py (5)
32-32:⚠️ Potential issueReplace
assertwith explicit runtime validation (previous feedback still applies)
assert rounds_limit > 0is not enforced when Python runs with the-Ooptimisation flag.
Please raise a proper exception instead.- assert rounds_limit > 0 # nosec: B101 + if rounds_limit <= 0: + raise ValueError("rounds_limit must be greater than 0")
225-229:⚠️ Potential issue
{guidelines}placeholder is never interpolated (still unresolved)
REVIEW_INSTRUCTIONcontains a{guidelines}token, but you are passing the raw template to the Stage, leaving the placeholder untouched.- instruction=REVIEW_INSTRUCTION, + instruction=REVIEW_INSTRUCTION.format(guidelines=guidelines or ""),
258-260:⚠️ Potential issue
guidelinesmay become literal"None"in promptWhen
guidelinesisNone,.format(guidelines=guidelines)inserts the string"None".
Use a default empty string instead.- instruction=REFINE_INSTRUCTION.format( - guidelines=guidelines, + instruction=REFINE_INSTRUCTION.format( + guidelines=guidelines or "",
289-293: 🛠️ Refactor suggestionHard-coded quality threshold reduces configurability
0.99is still hard-coded. Make it an argument (e.g.,quality_threshold: float = 0.99) so callers can adapt the refinement strictness.
215-222:⚠️ Potential issueTypo breaks downstream XML parsing –
<EALUATION_RESULT>→<EVALUATION_RESULT>The miss-spelled tag will confuse
MultimodalTagElement.parse_firstand any prompt logic that expectsEVALUATION_RESULT.- "\n<EALUATION_RESULT>", + "\n<EVALUATION_RESULT>", … - "</EALUATION_RESULT>", + "</EVALUATION_RESULT>",
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
guides/BasicEvaluation.md(1 hunks)src/draive/__init__.py(2 hunks)src/draive/evaluation/suite.py(6 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(6 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/file.py(1 hunks)src/draive/instructions/state.py(2 hunks)src/draive/instructions/template.py(1 hunks)src/draive/instructions/types.py(4 hunks)src/draive/instructions/volatile.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
src/draive/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-49)
src/draive/instructions/template.py (3)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)src/draive/commons/metadata.py (2)
name(100-106)description(121-127)
src/draive/helpers/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-49)
src/draive/helpers/instruction_preparation.py (2)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/template.py (3)
instruction(81-84)instruction(88-93)instruction(96-117)
src/draive/instructions/file.py (1)
src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)
src/draive/helpers/instruction_refinement.py (7)
src/draive/stages/stage.py (5)
Stage(137-1933)sequence(946-996)transform_result(603-640)loop(838-943)trim_context(685-753)src/draive/evaluation/suite.py (8)
EvaluationSuite(232-480)SuiteEvaluatorResult(99-142)of(147-176)relative_score(87-96)relative_score(133-142)cases(416-421)report(52-84)report(106-130)src/draive/instructions/types.py (5)
Instruction(46-216)of(88-97)of(101-110)of(113-150)format(167-183)src/draive/instructions/state.py (6)
Instructions(39-152)fetch(74-81)fetch(85-93)fetch(97-105)fetch(108-148)of(41-51)src/draive/lmm/types.py (2)
LMMCompletion(95-113)LMMInput(74-92)src/draive/multimodal/tags.py (1)
MultimodalTagElement(15-373)src/draive/utils/processing.py (1)
Processing(96-202)
🪛 LanguageTool
guides/BasicEvaluation.md
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🔇 Additional comments (15)
src/draive/__init__.py (1)
109-109: LGTM! Consistent package-level export.The
refine_instructionfunction is properly imported and exported following the established alphabetical ordering convention in both the import statement and__all__tuple.Also applies to: 386-386
src/draive/helpers/__init__.py (1)
5-5:Details
✅ Verification successful
Verify the instruction_refinement module exists.
The import and export look correct and follow established patterns. However, please ensure that the
src/draive/helpers/instruction_refinement.pymodule exists and contains therefine_instructionfunction.Also applies to: 22-22
🏁 Script executed:
#!/bin/bash # Description: Verify that the instruction_refinement module exists and exports refine_instruction # Expected: File should exist and contain the refine_instruction function # Check if the module file exists if [[ -f "src/draive/helpers/instruction_refinement.py" ]]; then echo "✓ Module file exists" # Check if refine_instruction function is defined if grep -q "^def refine_instruction\|^async def refine_instruction" "src/draive/helpers/instruction_refinement.py"; then echo "✓ refine_instruction function found" else echo "✗ refine_instruction function not found" fi else echo "✗ Module file not found" fiLength of output: 299
Confirmed: instruction_refinement Module Exists
The file
src/draive/helpers/instruction_refinement.pyis present and defines therefine_instructionfunction as expected. No further action needed.src/draive/instructions/template.py (1)
44-44: Align with updated InstructionDeclarationArgument data model.The change from
specification=parameter.specificationtodescription=parameter.descriptioncorrectly aligns with the updatedInstructionDeclarationArgumentclass structure that now uses adescriptionfield instead ofspecification. TheParameter.descriptionproperty returns the appropriate type (str | None).Note: This is a potential breaking change if external code depends on the
specificationfield ofInstructionDeclarationArgument.src/draive/instructions/state.py (2)
40-51: Well-designed factory method for in-memory instructions.The
ofclass method provides a clean way to createInstructionsinstances from in-memory instruction objects. The implementation correctly:
- Uses appropriate parameter unpacking for variadic instructions
- Creates
InstructionsVolatileStoragewith the instruction tuple- Returns
Instructionsinstance with proper method assignments- Maintains consistency with the existing
fileclass method pattern
16-16:Details
✅ Verification successful
Verify InstructionsVolatileStorage module exists.
The import looks correct and follows the established pattern.
🏁 Script executed:
#!/bin/bash # Description: Verify that the volatile module exists and exports InstructionsVolatileStorage # Expected: File should exist and contain the InstructionsVolatileStorage class # Check if the module file exists if [[ -f "src/draive/instructions/volatile.py" ]]; then echo "✓ Volatile module file exists" # Check if InstructionsVolatileStorage class is defined if grep -q "^class InstructionsVolatileStorage" "src/draive/instructions/volatile.py"; then echo "✓ InstructionsVolatileStorage class found" else echo "✗ InstructionsVolatileStorage class not found" fi else echo "✗ Volatile module file not found" fiLength of output: 291
✅ InstructionsVolatileStorage module exists — changes approved
- Verified
src/draive/instructions/volatile.pyis present.- Confirmed the
InstructionsVolatileStorageclass is defined.No further action required.
src/draive/instructions/volatile.py (1)
13-24: LGTM! Clean in-memory storage implementation.The class design is well-structured with pre-computed declarations for efficient listing operations and proper error handling for missing instructions.
src/draive/instructions/file.py (1)
52-52:Details
✅ Verification successful
Verify compatibility with simplified argument construction.
The change from using specification to only using argument name aligns with the broader refactoring in the codebase. However, ensure that removing the specification information doesn't break existing functionality that might rely on richer argument metadata.
Let me check if this change is consistent across the codebase:
🏁 Script executed:
#!/bin/bash # Description: Check for any remaining usage of specification field in InstructionDeclarationArgument # Expected: No remaining references to specification field # Search for specification field usage rg -A 5 -B 5 "specification.*InstructionDeclarationArgument|InstructionDeclarationArgument.*specification" # Search for any InstructionDeclarationArgument constructor calls with specification rg -A 2 "InstructionDeclarationArgument.*specification"Length of output: 166
Let's inspect the
InstructionDeclarationArgumentclass itself to confirm its constructor signature:
🏁 Script executed:
#!/bin/bash # List all files referencing InstructionDeclarationArgument rg -l "InstructionDeclarationArgument" # Show the class definition and its __init__ signature rg -A 10 -B 0 "class InstructionDeclarationArgument"Length of output: 836
InstructionDeclarationArgument Simplification Verified
No remaining references to the old
specificationfield were found in the codebase. Insrc/draive/instructions/types.py, theInstructionDeclarationArgumentconstructor now defines:name: str description: str | None = None required: bool = Truewhich means callers using only
namecontinue to receive the defaultdescription=Noneandrequired=True. This simplification is fully backward-compatible and requires no further changes.src/draive/helpers/instruction_preparation.py (4)
9-12: Good practice: Using tuple for all.Converting from list to tuple is a good practice since
__all__should be immutable.
48-48: Improved XML structure by moving variables outside USER_TASK.The change to move the formatted variables outside the
<USER_TASK>tag improves the XML structure and clarity.
91-91: Improved clarity by removing ambiguous phrase.Removing "or not achievable" from the instruction clarifies when to use "Task not achievable" response.
105-107: Enhanced variable usage instructions.The clarification about variable resolution and part title naming improves the instruction preparation guidance.
src/draive/instructions/types.py (3)
33-33: Simplified argument model with description field.Replacing the complex
specificationfield with a simpledescriptionstring simplifies the argument model. This is a good simplification that aligns with the broader refactoring.
158-165: Dynamic declaration generation via property.The
declarationproperty provides a clean way to generate instruction declarations dynamically. The implementation correctly extracts variables from the content string and maintains consistency with the instruction's metadata.
219-227: Robust variable extraction using Python's Formatter.The
extract_variablesfunction uses Python's built-inFormatterto parse format strings, which is the correct approach for extracting variable names from format strings. The implementation properly filters out empty field names.However, consider adding a docstring to document the function's purpose and behavior:
def extract_variables( string: str, /, ) -> Sequence[InstructionDeclarationArgument]: + """Extract variable names from a format string and return as InstructionDeclarationArgument objects. + + Args: + string: Format string containing variable placeholders like {variable_name} + + Returns: + Sequence of InstructionDeclarationArgument objects for each variable found + """ return [ InstructionDeclarationArgument(name=field_name) for _, field_name, _, _ in Formatter().parse(string) if field_name ]src/draive/helpers/instruction_refinement.py (1)
24-31: Verify experimental generic-function syntaxThe
def refine_instruction[CaseParameters: ...]notation relies on PEP 695, available only on Python 3.12+.
Confirm your runtime/tooling supports this; otherwise revert toTypeVar-based generics.
0053d3d to
f241080
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (5)
src/draive/instructions/volatile.py (1)
32-43: Consider documenting unused arguments parameter.The
argumentsparameter is accepted but not used in the method implementation. This might be intentional for protocol compliance, but consider adding a docstring or comment to clarify this behavior.src/draive/evaluation/suite.py (1)
400-404: Variable shadowing inside list-comprehension still hurts readabilityThe inner loop variable
casere-uses the pluralised outer variable namecases, reviving the exact readability issue raised in a previous review.
Please rename the comprehension variable to something unique (params,single_case, …) so the intent is instantly clear.- cases=[EvaluationSuiteCase(parameters=case) for case in cases], + cases=[EvaluationSuiteCase(parameters=params) for params in cases],Also applies to: 517-520
src/draive/helpers/instruction_refinement.py (2)
33-35: Replaceassertwith explicit argument validation
assertstatements are skipped when Python runs with the-Oflag, so the critical checks onrounds_limitandquality_thresholdmay silently disappear in production.- assert rounds_limit > 0 # nosec: B101 - assert 1 >= quality_threshold >= 0 # nosec: B101 + if rounds_limit <= 0: + raise ValueError("`rounds_limit` must be greater than 0") + if not (0 <= quality_threshold <= 1): + raise ValueError("`quality_threshold` must be between 0 and 1")
223-229: Typo in<EVALUATION_RESULT>tag breaks downstream prompt parsingThe tag opening string is missing the “V” (
<EALUATION_RESULT>).
REVIEW_INSTRUCTIONexpects the correctly-spelled tag, so the LLM receives a malformed prompt and any XML parsing utilities will fail.- "\n<EALUATION_RESULT>", + "\n<EVALUATION_RESULT>", … - "</EALUATION_RESULT>", + "</EVALUATION_RESULT>",guides/BasicEvaluation.md (1)
1-4: 🧹 Nitpick (assertive)Add missing definite article for smoother reading
-Draive framework provides comprehensive evaluation capabilities +The Draive framework provides comprehensive evaluation capabilities🧰 Tools
🪛 LanguageTool
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (11)
guides/BasicEvaluation.md(1 hunks)src/draive/__init__.py(2 hunks)src/draive/evaluation/suite.py(6 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(6 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/file.py(1 hunks)src/draive/instructions/state.py(2 hunks)src/draive/instructions/template.py(1 hunks)src/draive/instructions/types.py(4 hunks)src/draive/instructions/volatile.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
src/draive/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/helpers/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/instructions/template.py (3)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)src/draive/commons/metadata.py (2)
name(100-106)description(121-127)
src/draive/instructions/types.py (2)
src/draive/commons/metadata.py (3)
Meta(20-275)description(121-127)name(100-106)src/draive/parameters/model.py (1)
DataModel(386-738)
src/draive/instructions/state.py (2)
src/draive/instructions/volatile.py (3)
InstructionsVolatileStorage(14-43)instruction(32-43)listing(26-30)src/draive/instructions/types.py (4)
of(88-97)of(101-110)of(113-150)Instruction(46-216)
src/draive/instructions/file.py (1)
src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)
🪛 LanguageTool
guides/BasicEvaluation.md
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🔇 Additional comments (13)
src/draive/__init__.py (1)
109-109: LGTM! Proper exposure of new functionality.The addition of
refine_instructionto both the import statement and__all__tuple correctly makes the new instruction refinement functionality available at the package level.Also applies to: 386-386
src/draive/helpers/__init__.py (1)
5-5: LGTM! Consistent package structure.The import and export of
refine_instructioncorrectly follows the package structure pattern and makes the new functionality available through the helpers module.Also applies to: 22-22
src/draive/instructions/state.py (1)
16-16: LGTM! Proper import addition.The import of
InstructionsVolatileStoragesupports the new functionality correctly.src/draive/instructions/volatile.py (1)
13-44: LGTM! Clean volatile storage implementation.The
InstructionsVolatileStorageclass provides a clean in-memory storage mechanism for instructions. The implementation correctly stores instructions by name and maintains declarations for listing.src/draive/instructions/file.py (1)
52-52: LGTM! Simplification aligns with argument declaration refactoring.The change from using explicit specification to just the argument name is consistent with the broader simplification of
InstructionDeclarationArgumentin the types module.src/draive/helpers/instruction_preparation.py (5)
9-12: LGTM! Tuple is preferred for immutable exports.Using a tuple for
__all__is appropriate since the exported symbols shouldn't change at runtime.
48-48: Approve XML structure improvement.Moving the variables outside the
USER_TASKtag creates cleaner separation between the task description and variables.
75-86: LGTM! Improved variable formatting with better readability.The change from XML with JSON specifications to bullet-point format with descriptions is more readable and aligns with the simplified argument model. The fallback for missing descriptions handles the case where
argument.descriptionmight beNone.
96-96: LGTM! Clearer task achievability guidance.Removing "or not achievable" makes the instruction clearer about when to use "Task not achievable."
114-150: LGTM! Better structural organization with PARTS.The rename from
SECTIONStoPARTSand the reordering of instructions provides clearer guidance for instruction preparation.src/draive/instructions/types.py (3)
31-34: LGTM! Simplified argument model improves usability.Replacing the complex
specificationfield with a simple optionaldescriptionstring makes the argument model much more straightforward and user-friendly.
158-165: LGTM! Dynamic declaration extraction is well-implemented.The
declarationproperty provides a clean way to extract instruction declarations dynamically. The use ofextract_variablesensures consistency with the instruction content.
219-227: LGTM! Robust variable extraction using standard library.The
extract_variablesfunction correctly usesstring.Formatterto parse format field names from instruction content. The list comprehension properly filters out empty field names and creates appropriate argument declarations.
f241080 to
e4a92bb
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🔭 Outside diff range comments (1)
src/draive/helpers/instruction_preparation.py (1)
48-52:⚠️ Potential issue
guidelineswith curly braces will crash.formatUser-supplied
guidelinesthat contain{or}raiseKeyErrorduring
PREPARE_INSTRUCTION.format(...). Escape braces or switch to
str.replace.- instruction=PREPARE_INSTRUCTION.format( - guidelines=... - ), + instruction=PREPARE_INSTRUCTION.replace( + "{guidelines}", + (f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n" if guidelines else "").replace("{", "{{").replace("}", "}}"), + ),
♻️ Duplicate comments (9)
src/draive/instructions/template.py (1)
41-47: TODO comment about string verification remains unaddressed.The comment questioning whether all parameters should be verified as strings was flagged in a previous review but hasn't been resolved. This should be addressed to clarify the validation requirements.
src/draive/instructions/state.py (1)
40-51: Good factory method, but consider validation for duplicate names.The
ofclass method provides a convenient way to createInstructionsfrom volatile storage. However, consider validating for duplicate instruction names to prevent silent overwrites.src/draive/instructions/volatile.py (1)
32-43: Consider documenting unused arguments parameter.The
argumentsparameter is accepted but not used in the method implementation. This might be intentional for protocol compliance, but consider adding a docstring or comment to clarify this behavior.src/draive/helpers/instruction_preparation.py (2)
25-25: Python ≤ 3.11 cannot parse generic-function syntax
This was flagged previously and is still present.-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str]( +async def prepare_instruction(
75-83: Avoid “None” in variable listWhen
argument.descriptionisNonethe bullet rendersNone, which is
misleading.- f"- {argument.name}: {argument.description}" + f"- {argument.name}: {argument.description or ''}".rstrip(': ')src/draive/helpers/instruction_refinement.py (3)
33-35: Use explicit runtime checks instead ofassert
assertstatements are skipped withpython -O, disabling validation in
production.- assert rounds_limit > 0 # nosec: B101 - assert 1 >= quality_threshold >= 0 # nosec: B101 + if rounds_limit <= 0: + raise ValueError("rounds_limit must be greater than 0") + if not 0 <= quality_threshold <= 1: + raise ValueError("quality_threshold must be between 0 and 1")
223-229: Typo in XML tag – missing “V” in<EVALUATION_RESULT>The analyser produces
<EALUATION_RESULT>which does not match the tag
documented inREVIEW_INSTRUCTIONand will not be parsed.- "\n<EALUATION_RESULT>", + "\n<EVALUATION_RESULT>", ... - "</EALUATION_RESULT>", + "</EVALUATION_RESULT>",
265-267: Brace-injection risk forguidelinesplaceholderSame issue as in
instruction_preparation.py: braces insideguidelines
will either format-explode or leak into the LLM prompt.Apply the brace-escaping approach suggested earlier.
guides/BasicEvaluation.md (1)
3-3: Grammar correction neededThe sentence is missing the definite article "the" before "Draive framework".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (14)
guides/BasicEvaluation.md(1 hunks)pyproject.toml(1 hunks)src/draive/__init__.py(2 hunks)src/draive/evaluation/evaluator.py(1 hunks)src/draive/evaluation/scenario.py(2 hunks)src/draive/evaluation/suite.py(11 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(8 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/file.py(1 hunks)src/draive/instructions/state.py(2 hunks)src/draive/instructions/template.py(1 hunks)src/draive/instructions/types.py(4 hunks)src/draive/instructions/volatile.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
src/draive/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/instructions/template.py (3)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)src/draive/commons/metadata.py (2)
name(100-106)description(121-127)
src/draive/helpers/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/instructions/file.py (1)
src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)
src/draive/evaluation/scenario.py (2)
src/draive/evaluation/evaluator.py (3)
report(70-91)passed(67-68)evaluation(379-389)src/draive/evaluation/suite.py (6)
report(52-88)report(109-133)passed(48-50)passed(106-107)relative_score(91-99)relative_score(136-140)
src/draive/helpers/instruction_preparation.py (4)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/file.py (1)
instruction(63-82)src/draive/instructions/template.py (3)
instruction(81-84)instruction(88-93)instruction(96-117)src/draive/instructions/volatile.py (1)
instruction(32-43)
🪛 LanguageTool
guides/BasicEvaluation.md
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🔇 Additional comments (16)
pyproject.toml (1)
8-8: Version bump looks appropriate for the new functionality.The minor version increment from "0.66.5" to "0.67.0" correctly reflects the addition of new public API functionality, including the
refine_instructionfeature.src/draive/__init__.py (2)
109-109: LGTM! Import follows established pattern.The import of
refine_instructionfromdraive.helpersis correctly placed and formatted.
386-386: LGTM! Public export properly added.Adding
refine_instructionto the__all__tuple correctly exposes the new functionality in the public API.src/draive/helpers/__init__.py (2)
5-5: LGTM! Import correctly references the new module.The import statement properly brings in
refine_instructionfrom the newly createdinstruction_refinementmodule.
22-22: LGTM! Public export properly configured.Adding
refine_instructionto the__all__tuple correctly exposes it as part of the helpers package public API.src/draive/instructions/template.py (1)
44-44: LGTM! Parameter attribute change aligns with API refactoring.The change from
specification=parameter.specificationtodescription=parameter.descriptioncorrectly aligns with the updatedInstructionDeclarationArgumenttype structure. The types are compatible (str | None).src/draive/instructions/volatile.py (1)
13-24: Well-designed volatile storage implementation.The
InstructionsVolatileStorageclass provides a clean in-memory storage solution for instructions. The dictionary-based storage by name enables efficient lookups, and the pre-computed declarations list supports fast listing operations.src/draive/instructions/file.py (1)
52-52: Good simplification of argument declaration.The removal of the
specificationparameter aligns with the simplifiedInstructionDeclarationArgumentstructure that now uses adescriptionfield instead of the complexspecificationdictionary. This change improves consistency across the instruction system.src/draive/evaluation/evaluator.py (1)
70-91: Excellent refactoring of the report method.The changes significantly improve the output format:
- Centralized status logic with the
statusvariable enhances maintainability- XML-like structured format for detailed reports provides better readability
- Improved comment handling with quotes and "N/A" fallback
- Simplified non-detailed format reduces redundancy
The XML structure aligns well with the instruction refinement workflow mentioned in the AI summary.
guides/BasicEvaluation.md (2)
9-33: Well-designed evaluator exampleThe
keyword_evaluatorimplementation demonstrates good practices with proper error handling for empty keyword lists and clear scoring logic. The use of the@evaluatordecorator with name and threshold parameters is correctly shown.
143-199: Excellent evaluation suite exampleThe evaluation suite example effectively demonstrates the complete workflow from defining test cases to executing and reporting results. The integration with the new sequence-based storage capability (shown in line 186) aligns well with the framework enhancements.
src/draive/evaluation/suite.py (5)
72-79: Improved structured reporting formatThe XML-like structured output provides better organization and machine-readable format. The TODO comment about converting DataModel to XML representation indicates future enhancement potential.
95-99: Enhanced relative score calculationThe change from counting passed results to averaging relative scores provides a more nuanced scoring mechanism that better reflects the quality of evaluations rather than just pass/fail status.
381-402: Valuable sequence storage supportThe addition of
Sequence[CaseParameters]support enables direct initialization with case parameter lists, making the API more convenient for in-memory testing scenarios. The pattern matching correctly handles the sequence case and creates appropriate storage.
484-518: Consistent sequence support in factory functionThe
evaluation_suitefactory function correctly mirrors the sequence support added to the class method, maintaining API consistency across different initialization approaches.
540-542: Clean constructor enhancementThe updated constructor properly handles the optional cases parameter with a sensible default of empty tuple, maintaining backward compatibility while enabling the new functionality.
e4a92bb to
c1b5720
Compare
c1b5720 to
63c4634
Compare
There was a problem hiding this comment.
Actionable comments posted: 4
🔭 Outside diff range comments (1)
src/draive/helpers/instruction_preparation.py (1)
25-31:⚠️ Potential issuePEP-695 generic header breaks on Python ≤ 3.11
def prepare_instruction[CaseParameters: DataModel, …]only parses on 3.12+.
Unless the project already requires 3.12, drop the inline generics or move them toTypeVars:-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str]( +CaseParametersT = TypeVar("CaseParametersT", bound=DataModel) +ResultT = TypeVar("ResultT", bound=DataModel | str) + +async def prepare_instruction(Failing to do so prevents the module from importing on most current interpreters.
♻️ Duplicate comments (6)
src/draive/instructions/template.py (1)
41-47: Address the unresolved TODO comment about string verification.The comment questioning string verification remains unaddressed from previous reviews. Consider adding validation logic to ensure parameter values are strings or updating type annotations if all arguments should indeed be strings.
src/draive/instructions/state.py (1)
40-51: Address the duplicate name validation concern.The implementation looks good for creating Instructions from volatile storage. However, the previous review comment about validating duplicate instruction names remains unaddressed and is still applicable.
src/draive/instructions/volatile.py (1)
32-43: Document the unused arguments parameter.The previous review comment about documenting the unused
argumentsparameter for protocol compliance remains applicable and unaddressed.src/draive/helpers/instruction_preparation.py (1)
48-51: 🛠️ Refactor suggestion
str.formatbrace-injection – crashes on{/}inside guidelines
PREPARE_INSTRUCTION.format(guidelines=...)interprets any curly braces contained insidePREPARE_INSTRUCTION.
If theguidelinestext itself carries braces the formatting fails withKeyErroror worse.Patch:
- instruction=PREPARE_INSTRUCTION.format( - guidelines=f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n" if guidelines else "", - ), + instruction=PREPARE_INSTRUCTION.replace( + "{guidelines}", + ( + f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n".replace("{", "{{").replace("}", "}}") + if guidelines + else "" + ), + ),…and change the literal
{guidelines}placeholder in the template to stay compatible.src/draive/helpers/instruction_refinement.py (1)
24-32:⚠️ Potential issueGeneric function header & asserts replicate earlier portability issues
- The square-bracket generic header is still present – see earlier comment in
instruction_preparation.py.- Using
assertfor user input validation (rounds_limit,quality_threshold) is unsafe because it disappears underpython -O.-async def refine_instruction[CaseParameters: DataModel, Result: DataModel | str]( +async def refine_instruction( ... - assert rounds_limit > 0 # nosec: B101 - assert 1 >= quality_threshold >= 0 # nosec: B101 + if rounds_limit <= 0: + raise ValueError("rounds_limit must be > 0") + if not 0 <= quality_threshold <= 1: + raise ValueError("quality_threshold must be within [0, 1]")Please apply the same fix or raise the Python requirement.
guides/BasicEvaluation.md (1)
1-3: Minor grammar tweakAdd the definite article for smoother reading.
-# Basic Evaluation Usage - -Draive framework provides comprehensive evaluation capabilities +# Basic Evaluation Usage + +The Draive framework provides comprehensive evaluation capabilities🧰 Tools
🪛 LanguageTool
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...(AI_EN_LECTOR_MISSING_DETERMINER_THE)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (14)
guides/BasicEvaluation.md(1 hunks)pyproject.toml(1 hunks)src/draive/__init__.py(2 hunks)src/draive/evaluation/evaluator.py(1 hunks)src/draive/evaluation/scenario.py(2 hunks)src/draive/evaluation/suite.py(11 hunks)src/draive/helpers/__init__.py(2 hunks)src/draive/helpers/instruction_preparation.py(8 hunks)src/draive/helpers/instruction_refinement.py(1 hunks)src/draive/instructions/file.py(1 hunks)src/draive/instructions/state.py(2 hunks)src/draive/instructions/template.py(1 hunks)src/draive/instructions/types.py(4 hunks)src/draive/instructions/volatile.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (6)
src/draive/helpers/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/__init__.py (1)
src/draive/helpers/instruction_refinement.py (1)
refine_instruction(24-52)
src/draive/instructions/file.py (1)
src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)
src/draive/instructions/template.py (3)
src/draive/parameters/function.py (1)
arguments(202-203)src/draive/instructions/types.py (1)
InstructionDeclarationArgument(31-34)src/draive/commons/metadata.py (2)
name(100-106)description(121-127)
src/draive/evaluation/evaluator.py (2)
src/draive/evaluation/scenario.py (2)
passed(33-35)relative_score(68-74)src/draive/evaluation/suite.py (4)
passed(48-50)passed(106-107)relative_score(91-99)relative_score(136-144)
src/draive/evaluation/scenario.py (2)
src/draive/evaluation/evaluator.py (4)
report(70-92)passed(67-68)relative_score(95-99)evaluation(387-397)src/draive/evaluation/suite.py (6)
report(52-88)report(109-133)passed(48-50)passed(106-107)relative_score(91-99)relative_score(136-144)
🪛 LanguageTool
guides/BasicEvaluation.md
[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🔇 Additional comments (15)
pyproject.toml (1)
8-8: Version bump looks appropriate.The minor version increment from "0.66.5" to "0.67.0" correctly follows semantic versioning for the addition of new instruction refinement capabilities without breaking changes.
src/draive/__init__.py (1)
109-109: Export additions follow established patterns correctly.The import and export of
refine_instructionmaintain consistency with the existing codebase structure and preserve alphabetical ordering in the__all__tuple.Also applies to: 386-386
src/draive/helpers/__init__.py (1)
5-5: Helper module export structure is correct.The import from
instruction_refinementmodule and addition to__all__properly exposes the new function through the helpers package, maintaining consistent module organization.Also applies to: 22-22
src/draive/instructions/template.py (1)
44-44: Parameter structure change aligns with type definition.The change from
specification=parameter.specificationtodescription=parameter.descriptioncorrectly matches the updatedInstructionDeclarationArgumentstructure shown in the relevant code snippets.src/draive/instructions/volatile.py (1)
13-25: LGTM! Clean implementation of volatile instruction storage.The class design is clean with efficient dictionary-based lookup and proper initialization of both storage and declaration listings. The use of dictionary comprehension and list comprehension for initialization is appropriate.
src/draive/instructions/file.py (1)
52-52: LGTM! Simplification aligns with class refactoring.The removal of the
specificationparameter correctly aligns with theInstructionDeclarationArgumentclass simplification, where the complexspecificationfield was replaced with a simplerdescriptionfield.src/draive/evaluation/scenario.py (3)
43-47: LGTM! Variable renaming improves clarity.Renaming
reporttoevaluations_reporteliminates potential confusion with the method name and makes the variable's purpose clearer.
51-56: LGTM! XML format improves consistency.The XML-like structured format aligns well with the reporting patterns used in other evaluation classes (as seen in
src/draive/evaluation/evaluator.pyandsrc/draive/evaluation/suite.py). The:.2fformatting also addresses the previous concern about floating-point precision in percentage display.
72-74: LGTM! Simplified calculation is more readable.The list comprehension approach is more concise and readable than the explicit loop, while maintaining the same logic for calculating the ratio of passed evaluations.
src/draive/instructions/types.py (1)
158-166: Nice addition – self-describing declaration propertyExposing
instruction.declarationgreatly simplifies discovery of variables and metadata. The implementation is clean and keeps argument order intact – well done!src/draive/helpers/instruction_preparation.py (1)
71-87: Avoid “None” appearing in the variables listThe new bullet-list format is clearer, but when
argument.description is Nonewe now render
- var_name: None. Consider omitting the colon to keep the output tidy:- f"- {argument.name}: {argument.description}" + f"- {argument.name}: {argument.description}" if argument.description - else f"- {argument.name}" + else f"- {argument.name}"Current code already does this – great!
src/draive/evaluation/suite.py (4)
385-385: LGTM! Enhanced storage flexibilityThe addition of
Sequence[CaseParameters]support inwith_storagemethod provides a convenient way to initialize evaluation suites with in-memory test cases. The implementation correctly converts the sequence intoEvaluationSuiteCaseobjects and uses appropriate pattern matching.Also applies to: 402-406
58-58: LGTM! Improved reporting formatThe XML-like structure for detailed reports provides better structured output and includes valuable information like relative scores and evaluated values. The format is consistent and well-organized.
Also applies to: 72-79, 82-82
95-99: LGTM! More accurate relative score calculationThe updated relative score calculation using averages instead of simple pass/fail counts provides a more nuanced and accurate representation of evaluation quality across multiple criteria.
Also applies to: 140-144
488-492: LGTM! Consistent sequence support in factory functionThe
evaluation_suitefactory function now consistently supports sequence inputs, matching the functionality added to thewith_storagemethod. The implementation maintains type safety and proper initialization.Also applies to: 518-522
| def extract_variables( | ||
| string: str, | ||
| /, | ||
| ) -> Sequence[InstructionDeclarationArgument]: | ||
| seen: set[str] = set() | ||
| unique_args: list[InstructionDeclarationArgument] = [] | ||
| for _, field_name, _, _ in Formatter().parse(string): | ||
| if field_name and field_name not in seen: | ||
| seen.add(field_name) | ||
| unique_args.append(InstructionDeclarationArgument(name=field_name)) | ||
|
|
||
| return unique_args | ||
|
|
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Minor: retain ordering without a set look-up penalty
extract_variables correctly de-duplicates but the set look-ups in the hot loop could be avoided by using an OrderedDict.fromkeys(...). Given typical prompt sizes the impact is negligible, so feel free to ignore.
🤖 Prompt for AI Agents
In src/draive/instructions/types.py around lines 219 to 231, the
extract_variables function uses a set to track seen field names for
de-duplication, which causes a set look-up in the loop. To avoid this minor
performance penalty and retain ordering, replace the set and list approach with
an OrderedDict.fromkeys(...) on the parsed field names to get unique keys in
order, then convert those keys into InstructionDeclarationArgument instances for
the return list.
No description provided.