Prepare instruction optimization method by KaQuMiQ · Pull Request #216 · miquido/draive

KaQuMiQ · 2025-01-16T15:20:09Z

No description provided.

coderabbitai · 2025-05-19T15:25:04Z

Walkthrough

The changes introduce a new asynchronous instruction refinement workflow in the src/draive/helpers/instruction_refinement.py module, centered around the refine_instruction coroutine. This function orchestrates iterative improvement of an instruction through evaluation, analysis, and refinement stages, utilizing XML-based prompts and a custom state class. The refine_instruction symbol is made publicly accessible by adding it to the __all__ export lists in both src/draive/helpers/__init__.py and src/draive/__init__.py, allowing it to be imported directly from the draive and draive.helpers packages. Additionally, enhancements were made to evaluation suite storage handling to accept sequences of case parameters, and several modifications were applied across instruction-related modules to simplify argument declarations, add new volatile instruction storage, and provide new class methods for instruction state creation. A new comprehensive guide on evaluation usage was added to the documentation. No other logic or control flow changes are introduced.

Possibly related PRs

Prepare instruction writing helper #327: Adds the prepare_instruction function and related entities to the draive and draive.helpers packages, modifying exports similarly to this PR but for a different instruction processing workflow.

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Explain this complex logic.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai explain this code block.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and explain its main purpose.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 5

🧹 Nitpick comments (4)

src/draive/__init__.py (1)

120-122: Minor ordering nit – keep large __all__ alphabetised

To keep this huge symbol list maintainable, please insert the new names in alphabetical order (currently prepare_instruction/refine_instruction sit between unrelated blocks).

Also applies to: 125-126, 438-441

src/draive/helpers/instruction_preparation.py (1)

75-79: Micro-optimisation: build strings with join

For many arguments, repeated += string concatenation is quadratic. Collect parts in a list then ''.join(parts).
src/draive/helpers/instruction_refinement.py (2)
147-150: Equal evaluation scores are treated as regressions

current_evaluation.relative_score <= previous_evaluation.relative_score rolls back even when the score is unchanged, which can stall the loop if the suite is deterministic.
-        if current_evaluation.relative_score <= previous_evaluation.relative_score:
+        if current_evaluation.relative_score < previous_evaluation.relative_score:
Optionally add an ε-tolerance for floating-point noise.

268-271: Off-by-one risk when enforcing rounds_limit

The round counter is checked before it’s incremented, so a limit of 3 actually permits 4 iterations (0-3).
Increment first or change the predicate to > to run exactly rounds_limit refinements.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Knowledge Base: Disabled due to Reviews > Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 1d1324b and f7420c2.

📒 Files selected for processing (5)

src/draive/__init__.py (3 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (1 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/__init__.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (1)

src/draive/instructions/__init__.py (1)

src/draive/instructions/types.py (1)

InstructionDeclaration (42-46)

🔇 Additional comments (3)

src/draive/instructions/__init__.py (1)

5-6: Public export looks good

InstructionDeclaration is correctly imported and re-exported – nothing else to flag.

Also applies to: 14-15

src/draive/helpers/__init__.py (1)

1-2: Helper functions exposed correctly

prepare_instruction / refine_instruction are now part of the public API and added to __all__; imports resolve.

Also applies to: 15-16

src/draive/helpers/instruction_refinement.py (1)

25-33: Ensure runtime environment supports PEP 695 function-level type parameters

The generic syntax def refine_instruction[CaseParameters: DataModel, Result: DataModel | str](...) depends on PEP 695, available only from Python 3.12 onward. Importing this module on 3.10/3.11 will raise a SyntaxError.

• Verify that the project’s advertised minimum Python version is ≥ 3.12, and enforce it in packaging metadata / CI.
• Otherwise, fall back to TypeVar/ParamSpec in typing_extensions for broader compatibility.

src/draive/helpers/instruction_preparation.py

src/draive/helpers/instruction_refinement.py

coderabbitai

Actionable comments posted: 6

🔭 Outside diff range comments (1)

src/draive/__init__.py (1)

101-109: 🧹 Nitpick (assertive)

Mind potential duplication & ordering drift in the giant __all__

The three new symbols are registered correctly, but as the list grows (~180 items) accidental duplicates or maintenance errors become likely.
Consider generating __all__ programmatically (e.g. from importlib import import_module; __all__ = tuple(sorted(globals()))) or grouping by sub-package to reduce manual editing.

Also applies to: 267-268, 382-385

♻️ Duplicate comments (3)

src/draive/helpers/instruction_preparation.py (1)
16-21: Generic-function syntax still breaks ≤ Python 3.12

The square-bracket generic parameters (prepare_instruction[CaseParameters: …]) are only valid from 3.12+. Projects running 3.11 or lower will fail to import this module.

If 3.12 is not a hard requirement, drop the brackets:
-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str](
+async def prepare_instruction(
src/draive/helpers/instruction_refinement.py (2)
218-218: Typo in XML tag breaks downstream processing.

The tag is misspelled as <EALUATION_RESULT> (missing "V") instead of <EVALUATION_RESULT>, which creates inconsistency and may confuse XML parsing utilities and the LLM.
-            "\n<EALUATION_RESULT>",
+            "\n<EVALUATION_RESULT>",
-            "</EALUATION_RESULT>",
+            "</EVALUATION_RESULT>",
Also applies to: 223-223

227-231: Guidelines placeholder is not interpolated in REVIEW_INSTRUCTION.

The {guidelines} placeholder in REVIEW_INSTRUCTION template is never replaced, causing the LLM to receive the raw template with literal braces or "None".
-                instruction=REVIEW_INSTRUCTION,
+                instruction=REVIEW_INSTRUCTION.format(guidelines=guidelines or ""),

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7420c2 and 091fdeb.

📒 Files selected for processing (5)

src/draive/__init__.py (3 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (1 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/__init__.py (2 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (3)

src/draive/helpers/__init__.py (2)

src/draive/helpers/instruction_preparation.py (1)

prepare_instruction (16-52)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (26-51)

src/draive/instructions/__init__.py (1)

src/draive/instructions/types.py (1)

InstructionDeclaration (42-46)

src/draive/__init__.py (3)

src/draive/helpers/instruction_preparation.py (1)

prepare_instruction (16-52)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (26-51)

src/draive/instructions/types.py (2)

Instruction (50-211)

InstructionDeclaration (42-46)

🔇 Additional comments (5)

src/draive/helpers/__init__.py (1)

1-2: Public-API extension looks correct

The new helper functions are imported and appended to __all__, making them discoverable via import draive.helpers as h. No functional or stylistic issues spotted here.

Also applies to: 17-18

src/draive/instructions/__init__.py (1)

5-6: InstructionDeclaration correctly re-exported

Adding the type to both the import list and __all__ cleanly exposes it to external callers. Looks good.

Also applies to: 14-15

src/draive/helpers/instruction_refinement.py (3)

260-264: Inconsistent guidelines interpolation between templates.

The REFINE_INSTRUCTION template correctly interpolates the {guidelines} placeholder, but REVIEW_INSTRUCTION does not (as noted in the previous comment). This inconsistency could lead to confusion.

The implementation here correctly formats the guidelines placeholder, which is the expected pattern.

25-33: Well-designed function signature using modern Python generics.

The function signature effectively uses Python's newer generic syntax and provides clear type constraints for the evaluation suite parameters.

37-51: Excellent use of staged processing architecture.

The sequence of initialization, evaluation, and refinement loop stages provides a clean, modular approach to instruction refinement with proper logging and state management.

src/draive/helpers/instruction_preparation.py

src/draive/helpers/instruction_refinement.py

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (5)

src/draive/helpers/instruction_refinement.py (5)

34-34: Use proper validation instead of assert for production code.

The assert statement is disabled when Python runs with optimization flags (-O), making this validation ineffective in production environments.

217-224: Typo in EVALUATION_RESULT tag breaks downstream prompt.

The tag is spelled <EALUATION_RESULT> (missing "V"), whereas REVIEW_INSTRUCTION expects EVALUATION_RESULT. This inconsistency may confuse the LLM and any XML parsing utilities.

227-231: {guidelines} placeholder in REVIEW_INSTRUCTION is never interpolated.

Stage.completion receives the raw template, leaving {guidelines} untouched. Without proper interpolation, the LLM sees literal braces or potentially confusing content.

292-292: Consider making the evaluation score threshold configurable.

The hardcoded threshold of 0.99 for stopping refinement could benefit from being configurable, as different use cases may require different quality thresholds.

314-314: Address the TODO comment for instruction refinement.

The TODO comment indicates that the REFINE_INSTRUCTION template needs refinement.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 091fdeb and dcdf7cd.

📒 Files selected for processing (3)

src/draive/__init__.py (2 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

src/draive/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (26-51)

src/draive/helpers/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (26-51)

🔇 Additional comments (3)

src/draive/__init__.py (1)

109-109: LGTM! Clean export pattern followed.

The addition of refine_instruction to both the import list and __all__ tuple correctly exposes the new functionality at the package level. The alphabetical placement is consistent with the existing codebase patterns.

Also applies to: 386-386

src/draive/helpers/__init__.py (1)

5-5: LGTM! Proper module integration.

The import and export of refine_instruction follows the established pattern and correctly integrates the new functionality into the helpers subpackage.

Also applies to: 22-22

src/draive/helpers/instruction_refinement.py (1)

25-52: Well-structured asynchronous workflow implementation.

The refine_instruction function provides a clean, staged approach to instruction refinement with proper logging, state management, and error handling through the Stage.sequence pattern. The type annotations and parameter structure are well-designed.

src/draive/helpers/instruction_refinement.py

coderabbitai

Actionable comments posted: 4

🔭 Outside diff range comments (1)

src/draive/evaluation/suite.py (1)
381-405: 🛠️ Refactor suggestion

EvaluationSuiteData type parameter should mirror the suite’s parameter class

You pass EvaluationSuiteData[CaseParameters], which keeps the original TypeVar rather than the concrete self._parameters.
At runtime this can lead to TypeError: 'typing.TypeVar' object is not subscriptable when the data class metaprogramming kicks in.
-                suite_storage = _EvaluationSuiteMemoryStorage[CaseParameters](
-                    data_type=EvaluationSuiteData[CaseParameters],
+                suite_storage = _EvaluationSuiteMemoryStorage[CaseParameters](
+                    data_type=EvaluationSuiteData[self._parameters],

♻️ Duplicate comments (5)

src/draive/helpers/instruction_refinement.py (5)
32-32: ⚠️ Potential issue

Replace assert with explicit runtime validation (previous feedback still applies)

assert rounds_limit > 0 is not enforced when Python runs with the -O optimisation flag.
Please raise a proper exception instead.
-    assert rounds_limit > 0  # nosec: B101
+    if rounds_limit <= 0:
+        raise ValueError("rounds_limit must be greater than 0")
225-229: ⚠️ Potential issue

{guidelines} placeholder is never interpolated (still unresolved)

REVIEW_INSTRUCTION contains a {guidelines} token, but you are passing the raw template to the Stage, leaving the placeholder untouched.
-                instruction=REVIEW_INSTRUCTION,
+                instruction=REVIEW_INSTRUCTION.format(guidelines=guidelines or ""),
258-260: ⚠️ Potential issue

guidelines may become literal "None" in prompt

When guidelines is None, .format(guidelines=guidelines) inserts the string "None".
Use a default empty string instead.
-            instruction=REFINE_INSTRUCTION.format(
-                guidelines=guidelines,
+            instruction=REFINE_INSTRUCTION.format(
+                guidelines=guidelines or "",
289-293: 🛠️ Refactor suggestion

Hard-coded quality threshold reduces configurability

0.99 is still hard-coded. Make it an argument (e.g., quality_threshold: float = 0.99) so callers can adapt the refinement strictness.

215-222: ⚠️ Potential issue

Typo breaks downstream XML parsing – <EALUATION_RESULT> → <EVALUATION_RESULT>

The miss-spelled tag will confuse MultimodalTagElement.parse_first and any prompt logic that expects EVALUATION_RESULT.
-            "\n<EALUATION_RESULT>",
+            "\n<EVALUATION_RESULT>",
 …
-            "</EALUATION_RESULT>",
+            "</EVALUATION_RESULT>",

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcdf7cd and 0053d3d.

📒 Files selected for processing (11)

guides/BasicEvaluation.md (1 hunks)
src/draive/__init__.py (2 hunks)
src/draive/evaluation/suite.py (6 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (6 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/file.py (1 hunks)
src/draive/instructions/state.py (2 hunks)
src/draive/instructions/template.py (1 hunks)
src/draive/instructions/types.py (4 hunks)
src/draive/instructions/volatile.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

src/draive/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-49)

src/draive/instructions/template.py (3)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/commons/metadata.py (2)

name (100-106)

description (121-127)

src/draive/helpers/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-49)

src/draive/helpers/instruction_preparation.py (2)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/template.py (3)

instruction (81-84)

instruction (88-93)

instruction (96-117)

src/draive/instructions/file.py (1)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/helpers/instruction_refinement.py (7)

src/draive/stages/stage.py (5)

Stage (137-1933)

sequence (946-996)

transform_result (603-640)

loop (838-943)

trim_context (685-753)

src/draive/evaluation/suite.py (8)

EvaluationSuite (232-480)

SuiteEvaluatorResult (99-142)

of (147-176)

relative_score (87-96)

relative_score (133-142)

cases (416-421)

report (52-84)

report (106-130)

src/draive/instructions/types.py (5)

Instruction (46-216)

of (88-97)

of (101-110)

of (113-150)

format (167-183)

src/draive/instructions/state.py (6)

Instructions (39-152)

fetch (74-81)

fetch (85-93)

fetch (97-105)

fetch (108-148)

of (41-51)

src/draive/lmm/types.py (2)

LMMCompletion (95-113)

LMMInput (74-92)

src/draive/multimodal/tags.py (1)

MultimodalTagElement (15-373)

src/draive/utils/processing.py (1)

Processing (96-202)

🪛 LanguageTool

guides/BasicEvaluation.md

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🔇 Additional comments (15)

src/draive/__init__.py (1)

109-109: LGTM! Consistent package-level export.

The refine_instruction function is properly imported and exported following the established alphabetical ordering convention in both the import statement and __all__ tuple.

Also applies to: 386-386
src/draive/helpers/__init__.py (1)

5-5:
Details

✅ Verification successful

Verify the instruction_refinement module exists.

The import and export look correct and follow established patterns. However, please ensure that the src/draive/helpers/instruction_refinement.py module exists and contains the refine_instruction function.

Also applies to: 22-22

🏁 Script executed:
#!/bin/bash
# Description: Verify that the instruction_refinement module exists and exports refine_instruction
# Expected: File should exist and contain the refine_instruction function

# Check if the module file exists
if [[ -f "src/draive/helpers/instruction_refinement.py" ]]; then
    echo "✓ Module file exists"
    # Check if refine_instruction function is defined
    if grep -q "^def refine_instruction\|^async def refine_instruction" "src/draive/helpers/instruction_refinement.py"; then
        echo "✓ refine_instruction function found"
    else
        echo "✗ refine_instruction function not found"
    fi
else
    echo "✗ Module file not found"
fi
Length of output: 299
Confirmed: instruction_refinement Module Exists

The file src/draive/helpers/instruction_refinement.py is present and defines the refine_instruction function as expected. No further action needed.
src/draive/instructions/template.py (1)

44-44: Align with updated InstructionDeclarationArgument data model.

The change from specification=parameter.specification to description=parameter.description correctly aligns with the updated InstructionDeclarationArgument class structure that now uses a description field instead of specification. The Parameter.description property returns the appropriate type (str | None).

Note: This is a potential breaking change if external code depends on the specification field of InstructionDeclarationArgument.
src/draive/instructions/state.py (2)

40-51: Well-designed factory method for in-memory instructions.

The of class method provides a clean way to create Instructions instances from in-memory instruction objects. The implementation correctly:

Uses appropriate parameter unpacking for variadic instructions

Creates InstructionsVolatileStorage with the instruction tuple

Returns Instructions instance with proper method assignments

Maintains consistency with the existing file class method pattern

16-16:
Details

✅ Verification successful

Verify InstructionsVolatileStorage module exists.

The import looks correct and follows the established pattern.

🏁 Script executed:
#!/bin/bash
# Description: Verify that the volatile module exists and exports InstructionsVolatileStorage
# Expected: File should exist and contain the InstructionsVolatileStorage class

# Check if the module file exists
if [[ -f "src/draive/instructions/volatile.py" ]]; then
    echo "✓ Volatile module file exists"
    # Check if InstructionsVolatileStorage class is defined
    if grep -q "^class InstructionsVolatileStorage" "src/draive/instructions/volatile.py"; then
        echo "✓ InstructionsVolatileStorage class found"
    else
        echo "✗ InstructionsVolatileStorage class not found"
    fi
else
    echo "✗ Volatile module file not found"
fi
Length of output: 291
✅ InstructionsVolatileStorage module exists — changes approved

Verified src/draive/instructions/volatile.py is present.

Confirmed the InstructionsVolatileStorage class is defined.

No further action required.
src/draive/instructions/volatile.py (1)

13-24: LGTM! Clean in-memory storage implementation.

The class design is well-structured with pre-computed declarations for efficient listing operations and proper error handling for missing instructions.
src/draive/instructions/file.py (1)

52-52:
Details

✅ Verification successful

Verify compatibility with simplified argument construction.

The change from using specification to only using argument name aligns with the broader refactoring in the codebase. However, ensure that removing the specification information doesn't break existing functionality that might rely on richer argument metadata.

Let me check if this change is consistent across the codebase:

🏁 Script executed:
#!/bin/bash
# Description: Check for any remaining usage of specification field in InstructionDeclarationArgument
# Expected: No remaining references to specification field

# Search for specification field usage
rg -A 5 -B 5 "specification.*InstructionDeclarationArgument|InstructionDeclarationArgument.*specification"

# Search for any InstructionDeclarationArgument constructor calls with specification
rg -A 2 "InstructionDeclarationArgument.*specification"
Length of output: 166

Let's inspect the InstructionDeclarationArgument class itself to confirm its constructor signature:

🏁 Script executed:
#!/bin/bash
# List all files referencing InstructionDeclarationArgument
rg -l "InstructionDeclarationArgument"

# Show the class definition and its __init__ signature
rg -A 10 -B 0 "class InstructionDeclarationArgument"
Length of output: 836
InstructionDeclarationArgument Simplification Verified

No remaining references to the old specification field were found in the codebase. In src/draive/instructions/types.py, the InstructionDeclarationArgument constructor now defines:
name: str
description: str | None = None
required: bool = True
which means callers using only name continue to receive the default description=None and required=True. This simplification is fully backward-compatible and requires no further changes.
src/draive/helpers/instruction_preparation.py (4)

9-12: Good practice: Using tuple for all.

Converting from list to tuple is a good practice since __all__ should be immutable.

48-48: Improved XML structure by moving variables outside USER_TASK.

The change to move the formatted variables outside the <USER_TASK> tag improves the XML structure and clarity.

91-91: Improved clarity by removing ambiguous phrase.

Removing "or not achievable" from the instruction clarifies when to use "Task not achievable" response.

105-107: Enhanced variable usage instructions.

The clarification about variable resolution and part title naming improves the instruction preparation guidance.
src/draive/instructions/types.py (3)

33-33: Simplified argument model with description field.

Replacing the complex specification field with a simple description string simplifies the argument model. This is a good simplification that aligns with the broader refactoring.

158-165: Dynamic declaration generation via property.

The declaration property provides a clean way to generate instruction declarations dynamically. The implementation correctly extracts variables from the content string and maintains consistency with the instruction's metadata.

219-227: Robust variable extraction using Python's Formatter.

The extract_variables function uses Python's built-in Formatter to parse format strings, which is the correct approach for extracting variable names from format strings. The implementation properly filters out empty field names.

However, consider adding a docstring to document the function's purpose and behavior:
def extract_variables(
    string: str,
    /,
) -> Sequence[InstructionDeclarationArgument]:
+    """Extract variable names from a format string and return as InstructionDeclarationArgument objects.
+    
+    Args:
+        string: Format string containing variable placeholders like {variable_name}
+        
+    Returns:
+        Sequence of InstructionDeclarationArgument objects for each variable found
+    """
    return [
        InstructionDeclarationArgument(name=field_name)
        for _, field_name, _, _ in Formatter().parse(string)
        if field_name
    ]
src/draive/helpers/instruction_refinement.py (1)

24-31: Verify experimental generic-function syntax

The def refine_instruction[CaseParameters: ...] notation relies on PEP 695, available only on Python 3.12+.
Confirm your runtime/tooling supports this; otherwise revert to TypeVar-based generics.

src/draive/instructions/volatile.py

src/draive/helpers/instruction_preparation.py

src/draive/evaluation/suite.py

guides/BasicEvaluation.md

coderabbitai

Actionable comments posted: 3

♻️ Duplicate comments (5)

src/draive/instructions/volatile.py (1)

32-43: Consider documenting unused arguments parameter.

The arguments parameter is accepted but not used in the method implementation. This might be intentional for protocol compliance, but consider adding a docstring or comment to clarify this behavior.
src/draive/evaluation/suite.py (1)
400-404: Variable shadowing inside list-comprehension still hurts readability

The inner loop variable case re-uses the pluralised outer variable name cases, reviving the exact readability issue raised in a previous review.
Please rename the comprehension variable to something unique (params, single_case, …) so the intent is instantly clear.
-                    cases=[EvaluationSuiteCase(parameters=case) for case in cases],
+                    cases=[EvaluationSuiteCase(parameters=params) for params in cases],
Also applies to: 517-520
src/draive/helpers/instruction_refinement.py (2)
33-35: Replace assert with explicit argument validation

assert statements are skipped when Python runs with the -O flag, so the critical checks on rounds_limit and quality_threshold may silently disappear in production.
-    assert rounds_limit > 0  # nosec: B101
-    assert 1 >= quality_threshold >= 0  # nosec: B101
+    if rounds_limit <= 0:
+        raise ValueError("`rounds_limit` must be greater than 0")
+    if not (0 <= quality_threshold <= 1):
+        raise ValueError("`quality_threshold` must be between 0 and 1")
223-229: Typo in <EVALUATION_RESULT> tag breaks downstream prompt parsing

The tag opening string is missing the “V” (<EALUATION_RESULT>).
REVIEW_INSTRUCTION expects the correctly-spelled tag, so the LLM receives a malformed prompt and any XML parsing utilities will fail.
-            "\n<EALUATION_RESULT>",
+            "\n<EVALUATION_RESULT>",
…
-            "</EALUATION_RESULT>",
+            "</EVALUATION_RESULT>",
guides/BasicEvaluation.md (1)
1-4: 🧹 Nitpick (assertive)

Add missing definite article for smoother reading
-Draive framework provides comprehensive evaluation capabilities
+The Draive framework provides comprehensive evaluation capabilities
🧰 Tools

🪛 LanguageTool

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0053d3d and f241080.

📒 Files selected for processing (11)

guides/BasicEvaluation.md (1 hunks)
src/draive/__init__.py (2 hunks)
src/draive/evaluation/suite.py (6 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (6 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/file.py (1 hunks)
src/draive/instructions/state.py (2 hunks)
src/draive/instructions/template.py (1 hunks)
src/draive/instructions/types.py (4 hunks)
src/draive/instructions/volatile.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

src/draive/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/helpers/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/instructions/template.py (3)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/commons/metadata.py (2)

name (100-106)

description (121-127)

src/draive/instructions/types.py (2)

src/draive/commons/metadata.py (3)

Meta (20-275)

description (121-127)

name (100-106)

src/draive/parameters/model.py (1)

DataModel (386-738)

src/draive/instructions/state.py (2)

src/draive/instructions/volatile.py (3)

InstructionsVolatileStorage (14-43)

instruction (32-43)

listing (26-30)

src/draive/instructions/types.py (4)

of (88-97)

of (101-110)

of (113-150)

Instruction (46-216)

src/draive/instructions/file.py (1)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

🪛 LanguageTool

guides/BasicEvaluation.md

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🔇 Additional comments (13)

src/draive/__init__.py (1)

109-109: LGTM! Proper exposure of new functionality.

The addition of refine_instruction to both the import statement and __all__ tuple correctly makes the new instruction refinement functionality available at the package level.

Also applies to: 386-386

src/draive/helpers/__init__.py (1)

5-5: LGTM! Consistent package structure.

The import and export of refine_instruction correctly follows the package structure pattern and makes the new functionality available through the helpers module.

Also applies to: 22-22

src/draive/instructions/state.py (1)

16-16: LGTM! Proper import addition.

The import of InstructionsVolatileStorage supports the new functionality correctly.

src/draive/instructions/volatile.py (1)

13-44: LGTM! Clean volatile storage implementation.

The InstructionsVolatileStorage class provides a clean in-memory storage mechanism for instructions. The implementation correctly stores instructions by name and maintains declarations for listing.

src/draive/instructions/file.py (1)

52-52: LGTM! Simplification aligns with argument declaration refactoring.

The change from using explicit specification to just the argument name is consistent with the broader simplification of InstructionDeclarationArgument in the types module.

src/draive/helpers/instruction_preparation.py (5)

9-12: LGTM! Tuple is preferred for immutable exports.

Using a tuple for __all__ is appropriate since the exported symbols shouldn't change at runtime.

48-48: Approve XML structure improvement.

Moving the variables outside the USER_TASK tag creates cleaner separation between the task description and variables.

75-86: LGTM! Improved variable formatting with better readability.

The change from XML with JSON specifications to bullet-point format with descriptions is more readable and aligns with the simplified argument model. The fallback for missing descriptions handles the case where argument.description might be None.

96-96: LGTM! Clearer task achievability guidance.

Removing "or not achievable" makes the instruction clearer about when to use "Task not achievable."

114-150: LGTM! Better structural organization with PARTS.

The rename from SECTIONS to PARTS and the reordering of instructions provides clearer guidance for instruction preparation.

src/draive/instructions/types.py (3)

31-34: LGTM! Simplified argument model improves usability.

Replacing the complex specification field with a simple optional description string makes the argument model much more straightforward and user-friendly.

158-165: LGTM! Dynamic declaration extraction is well-implemented.

The declaration property provides a clean way to extract instruction declarations dynamically. The use of extract_variables ensures consistency with the instruction content.

219-227: LGTM! Robust variable extraction using standard library.

The extract_variables function correctly uses string.Formatter to parse format field names from instruction content. The list comprehension properly filters out empty field names and creates appropriate argument declarations.

src/draive/instructions/template.py

src/draive/instructions/state.py

src/draive/helpers/instruction_refinement.py

coderabbitai

Actionable comments posted: 3

🔭 Outside diff range comments (1)

src/draive/helpers/instruction_preparation.py (1)
48-52: ⚠️ Potential issue

guidelines with curly braces will crash .format

User-supplied guidelines that contain { or } raise KeyError during
PREPARE_INSTRUCTION.format(...). Escape braces or switch to
str.replace.
-            instruction=PREPARE_INSTRUCTION.format(
-                guidelines=...
-            ),
+            instruction=PREPARE_INSTRUCTION.replace(
+                "{guidelines}",
+                (f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n" if guidelines else "").replace("{", "{{").replace("}", "}}"),
+            ),

♻️ Duplicate comments (9)

src/draive/instructions/template.py (1)

41-47: TODO comment about string verification remains unaddressed.

The comment questioning whether all parameters should be verified as strings was flagged in a previous review but hasn't been resolved. This should be addressed to clarify the validation requirements.

src/draive/instructions/state.py (1)

40-51: Good factory method, but consider validation for duplicate names.

The of class method provides a convenient way to create Instructions from volatile storage. However, consider validating for duplicate instruction names to prevent silent overwrites.

src/draive/instructions/volatile.py (1)

32-43: Consider documenting unused arguments parameter.

The arguments parameter is accepted but not used in the method implementation. This might be intentional for protocol compliance, but consider adding a docstring or comment to clarify this behavior.
src/draive/helpers/instruction_preparation.py (2)
25-25: Python ≤ 3.11 cannot parse generic-function syntax
This was flagged previously and is still present.
-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str](
+async def prepare_instruction(
75-83: Avoid “None” in variable list

When argument.description is None the bullet renders None, which is
misleading.
-            f"- {argument.name}: {argument.description}"
+            f"- {argument.name}: {argument.description or ''}".rstrip(': ')
src/draive/helpers/instruction_refinement.py (3)
33-35: Use explicit runtime checks instead of assert

assert statements are skipped with python -O, disabling validation in
production.
-    assert rounds_limit > 0  # nosec: B101
-    assert 1 >= quality_threshold >= 0  # nosec: B101
+    if rounds_limit <= 0:
+        raise ValueError("rounds_limit must be greater than 0")
+    if not 0 <= quality_threshold <= 1:
+        raise ValueError("quality_threshold must be between 0 and 1")
223-229: Typo in XML tag – missing “V” in <EVALUATION_RESULT>

The analyser produces <EALUATION_RESULT> which does not match the tag
documented in REVIEW_INSTRUCTION and will not be parsed.
-            "\n<EALUATION_RESULT>",
+            "\n<EVALUATION_RESULT>",
 ...
-            "</EALUATION_RESULT>",
+            "</EVALUATION_RESULT>",
265-267: Brace-injection risk for guidelines placeholder

Same issue as in instruction_preparation.py: braces inside guidelines
will either format-explode or leak into the LLM prompt.

Apply the brace-escaping approach suggested earlier.
guides/BasicEvaluation.md (1)

3-3: Grammar correction needed

The sentence is missing the definite article "the" before "Draive framework".

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f241080 and e4a92bb.

📒 Files selected for processing (14)

guides/BasicEvaluation.md (1 hunks)
pyproject.toml (1 hunks)
src/draive/__init__.py (2 hunks)
src/draive/evaluation/evaluator.py (1 hunks)
src/draive/evaluation/scenario.py (2 hunks)
src/draive/evaluation/suite.py (11 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (8 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/file.py (1 hunks)
src/draive/instructions/state.py (2 hunks)
src/draive/instructions/template.py (1 hunks)
src/draive/instructions/types.py (4 hunks)
src/draive/instructions/volatile.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

src/draive/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/instructions/template.py (3)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/commons/metadata.py (2)

name (100-106)

description (121-127)

src/draive/helpers/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/instructions/file.py (1)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/evaluation/scenario.py (2)

src/draive/evaluation/evaluator.py (3)

report (70-91)

passed (67-68)

evaluation (379-389)

src/draive/evaluation/suite.py (6)

report (52-88)

report (109-133)

passed (48-50)

passed (106-107)

relative_score (91-99)

relative_score (136-140)

src/draive/helpers/instruction_preparation.py (4)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/file.py (1)

instruction (63-82)

src/draive/instructions/template.py (3)

instruction (81-84)

instruction (88-93)

instruction (96-117)

src/draive/instructions/volatile.py (1)

instruction (32-43)

🪛 LanguageTool

guides/BasicEvaluation.md

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🔇 Additional comments (16)

pyproject.toml (1)

8-8: Version bump looks appropriate for the new functionality.

The minor version increment from "0.66.5" to "0.67.0" correctly reflects the addition of new public API functionality, including the refine_instruction feature.

src/draive/__init__.py (2)

109-109: LGTM! Import follows established pattern.

The import of refine_instruction from draive.helpers is correctly placed and formatted.

386-386: LGTM! Public export properly added.

Adding refine_instruction to the __all__ tuple correctly exposes the new functionality in the public API.

src/draive/helpers/__init__.py (2)

5-5: LGTM! Import correctly references the new module.

The import statement properly brings in refine_instruction from the newly created instruction_refinement module.

22-22: LGTM! Public export properly configured.

Adding refine_instruction to the __all__ tuple correctly exposes it as part of the helpers package public API.

src/draive/instructions/template.py (1)

44-44: LGTM! Parameter attribute change aligns with API refactoring.

The change from specification=parameter.specification to description=parameter.description correctly aligns with the updated InstructionDeclarationArgument type structure. The types are compatible (str | None).

src/draive/instructions/volatile.py (1)

13-24: Well-designed volatile storage implementation.

The InstructionsVolatileStorage class provides a clean in-memory storage solution for instructions. The dictionary-based storage by name enables efficient lookups, and the pre-computed declarations list supports fast listing operations.

src/draive/instructions/file.py (1)

52-52: Good simplification of argument declaration.

The removal of the specification parameter aligns with the simplified InstructionDeclarationArgument structure that now uses a description field instead of the complex specification dictionary. This change improves consistency across the instruction system.

src/draive/evaluation/evaluator.py (1)

70-91: Excellent refactoring of the report method.

The changes significantly improve the output format:

Centralized status logic with the status variable enhances maintainability

XML-like structured format for detailed reports provides better readability

Improved comment handling with quotes and "N/A" fallback

Simplified non-detailed format reduces redundancy

The XML structure aligns well with the instruction refinement workflow mentioned in the AI summary.

guides/BasicEvaluation.md (2)

9-33: Well-designed evaluator example

The keyword_evaluator implementation demonstrates good practices with proper error handling for empty keyword lists and clear scoring logic. The use of the @evaluator decorator with name and threshold parameters is correctly shown.

143-199: Excellent evaluation suite example

The evaluation suite example effectively demonstrates the complete workflow from defining test cases to executing and reporting results. The integration with the new sequence-based storage capability (shown in line 186) aligns well with the framework enhancements.

src/draive/evaluation/suite.py (5)

72-79: Improved structured reporting format

The XML-like structured output provides better organization and machine-readable format. The TODO comment about converting DataModel to XML representation indicates future enhancement potential.

95-99: Enhanced relative score calculation

The change from counting passed results to averaging relative scores provides a more nuanced scoring mechanism that better reflects the quality of evaluations rather than just pass/fail status.

381-402: Valuable sequence storage support

The addition of Sequence[CaseParameters] support enables direct initialization with case parameter lists, making the API more convenient for in-memory testing scenarios. The pattern matching correctly handles the sequence case and creates appropriate storage.

484-518: Consistent sequence support in factory function

The evaluation_suite factory function correctly mirrors the sequence support added to the class method, maintaining API consistency across different initialization approaches.

540-542: Clean constructor enhancement

The updated constructor properly handles the optional cases parameter with a sensible default of empty tuple, maintaining backward compatibility while enabling the new functionality.

src/draive/evaluation/scenario.py

src/draive/instructions/types.py

src/draive/helpers/instruction_refinement.py

coderabbitai

Actionable comments posted: 4

🔭 Outside diff range comments (1)

src/draive/helpers/instruction_preparation.py (1)
25-31: ⚠️ Potential issue

PEP-695 generic header breaks on Python ≤ 3.11

def prepare_instruction[CaseParameters: DataModel, …] only parses on 3.12+.
Unless the project already requires 3.12, drop the inline generics or move them to TypeVars:
-async def prepare_instruction[CaseParameters: DataModel, Result: DataModel | str](
+CaseParametersT = TypeVar("CaseParametersT", bound=DataModel)
+ResultT = TypeVar("ResultT", bound=DataModel | str)
+
+async def prepare_instruction(
Failing to do so prevents the module from importing on most current interpreters.

♻️ Duplicate comments (6)

src/draive/instructions/template.py (1)

41-47: Address the unresolved TODO comment about string verification.

The comment questioning string verification remains unaddressed from previous reviews. Consider adding validation logic to ensure parameter values are strings or updating type annotations if all arguments should indeed be strings.

src/draive/instructions/state.py (1)

40-51: Address the duplicate name validation concern.

The implementation looks good for creating Instructions from volatile storage. However, the previous review comment about validating duplicate instruction names remains unaddressed and is still applicable.

src/draive/instructions/volatile.py (1)

32-43: Document the unused arguments parameter.

The previous review comment about documenting the unused arguments parameter for protocol compliance remains applicable and unaddressed.
src/draive/helpers/instruction_preparation.py (1)
48-51: 🛠️ Refactor suggestion

str.format brace-injection – crashes on { / } inside guidelines

PREPARE_INSTRUCTION.format(guidelines=...) interprets any curly braces contained inside PREPARE_INSTRUCTION.
If the guidelines text itself carries braces the formatting fails with KeyError or worse.

Patch:
-            instruction=PREPARE_INSTRUCTION.format(
-                guidelines=f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n" if guidelines else "",
-            ),
+            instruction=PREPARE_INSTRUCTION.replace(
+                "{guidelines}",
+                (
+                    f"\n<GUIDELINES>{guidelines}</GUIDELINES>\n".replace("{", "{{").replace("}", "}}")
+                    if guidelines
+                    else ""
+                ),
+            ),
…and change the literal {guidelines} placeholder in the template to stay compatible.
src/draive/helpers/instruction_refinement.py (1)
24-32: ⚠️ Potential issue

Generic function header & asserts replicate earlier portability issues

The square-bracket generic header is still present – see earlier comment in instruction_preparation.py.

Using assert for user input validation (rounds_limit, quality_threshold) is unsafe because it disappears under python -O.
-async def refine_instruction[CaseParameters: DataModel, Result: DataModel | str](
+async def refine_instruction(
...
-    assert rounds_limit > 0  # nosec: B101
-    assert 1 >= quality_threshold >= 0  # nosec: B101
+    if rounds_limit <= 0:
+        raise ValueError("rounds_limit must be > 0")
+    if not 0 <= quality_threshold <= 1:
+        raise ValueError("quality_threshold must be within [0, 1]")
Please apply the same fix or raise the Python requirement.
guides/BasicEvaluation.md (1)
1-3: Minor grammar tweak

Add the definite article for smoother reading.
-# Basic Evaluation Usage
-
-Draive framework provides comprehensive evaluation capabilities
+# Basic Evaluation Usage
+
+The Draive framework provides comprehensive evaluation capabilities
🧰 Tools

🪛 LanguageTool

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c1b5720 and 63c4634.

📒 Files selected for processing (14)

guides/BasicEvaluation.md (1 hunks)
pyproject.toml (1 hunks)
src/draive/__init__.py (2 hunks)
src/draive/evaluation/evaluator.py (1 hunks)
src/draive/evaluation/scenario.py (2 hunks)
src/draive/evaluation/suite.py (11 hunks)
src/draive/helpers/__init__.py (2 hunks)
src/draive/helpers/instruction_preparation.py (8 hunks)
src/draive/helpers/instruction_refinement.py (1 hunks)
src/draive/instructions/file.py (1 hunks)
src/draive/instructions/state.py (2 hunks)
src/draive/instructions/template.py (1 hunks)
src/draive/instructions/types.py (4 hunks)
src/draive/instructions/volatile.py (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (6)

src/draive/helpers/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/__init__.py (1)

src/draive/helpers/instruction_refinement.py (1)

refine_instruction (24-52)

src/draive/instructions/file.py (1)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/instructions/template.py (3)

src/draive/parameters/function.py (1)

arguments (202-203)

src/draive/instructions/types.py (1)

InstructionDeclarationArgument (31-34)

src/draive/commons/metadata.py (2)

name (100-106)

description (121-127)

src/draive/evaluation/evaluator.py (2)

src/draive/evaluation/scenario.py (2)

passed (33-35)

relative_score (68-74)

src/draive/evaluation/suite.py (4)

passed (48-50)

passed (106-107)

relative_score (91-99)

relative_score (136-144)

src/draive/evaluation/scenario.py (2)

src/draive/evaluation/evaluator.py (4)

report (70-92)

passed (67-68)

relative_score (95-99)

evaluation (387-397)

src/draive/evaluation/suite.py (6)

report (52-88)

report (109-133)

passed (48-50)

passed (106-107)

relative_score (91-99)

relative_score (136-144)

🪛 LanguageTool

guides/BasicEvaluation.md

[uncategorized] ~2-~2: You might be missing the article “the” here.
Context: # Basic Evaluation Usage Draive framework provides comprehensive evalua...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🔇 Additional comments (15)

pyproject.toml (1)

8-8: Version bump looks appropriate.

The minor version increment from "0.66.5" to "0.67.0" correctly follows semantic versioning for the addition of new instruction refinement capabilities without breaking changes.

src/draive/__init__.py (1)

109-109: Export additions follow established patterns correctly.

The import and export of refine_instruction maintain consistency with the existing codebase structure and preserve alphabetical ordering in the __all__ tuple.

Also applies to: 386-386

src/draive/helpers/__init__.py (1)

5-5: Helper module export structure is correct.

The import from instruction_refinement module and addition to __all__ properly exposes the new function through the helpers package, maintaining consistent module organization.

Also applies to: 22-22

src/draive/instructions/template.py (1)

44-44: Parameter structure change aligns with type definition.

The change from specification=parameter.specification to description=parameter.description correctly matches the updated InstructionDeclarationArgument structure shown in the relevant code snippets.

src/draive/instructions/volatile.py (1)

13-25: LGTM! Clean implementation of volatile instruction storage.

The class design is clean with efficient dictionary-based lookup and proper initialization of both storage and declaration listings. The use of dictionary comprehension and list comprehension for initialization is appropriate.

src/draive/instructions/file.py (1)

52-52: LGTM! Simplification aligns with class refactoring.

The removal of the specification parameter correctly aligns with the InstructionDeclarationArgument class simplification, where the complex specification field was replaced with a simpler description field.

src/draive/evaluation/scenario.py (3)

43-47: LGTM! Variable renaming improves clarity.

Renaming report to evaluations_report eliminates potential confusion with the method name and makes the variable's purpose clearer.

51-56: LGTM! XML format improves consistency.

The XML-like structured format aligns well with the reporting patterns used in other evaluation classes (as seen in src/draive/evaluation/evaluator.py and src/draive/evaluation/suite.py). The :.2f formatting also addresses the previous concern about floating-point precision in percentage display.

72-74: LGTM! Simplified calculation is more readable.

The list comprehension approach is more concise and readable than the explicit loop, while maintaining the same logic for calculating the ratio of passed evaluations.

src/draive/instructions/types.py (1)

158-166: Nice addition – self-describing declaration property

Exposing instruction.declaration greatly simplifies discovery of variables and metadata. The implementation is clean and keeps argument order intact – well done!
src/draive/helpers/instruction_preparation.py (1)

71-87: Avoid “None” appearing in the variables list

The new bullet-list format is clearer, but when argument.description is None we now render
- var_name: None. Consider omitting the colon to keep the output tidy:
-            f"- {argument.name}: {argument.description}"
+            f"- {argument.name}: {argument.description}"
             if argument.description
-            else f"- {argument.name}"
+            else f"- {argument.name}"
Current code already does this – great!
src/draive/evaluation/suite.py (4)

385-385: LGTM! Enhanced storage flexibility

The addition of Sequence[CaseParameters] support in with_storage method provides a convenient way to initialize evaluation suites with in-memory test cases. The implementation correctly converts the sequence into EvaluationSuiteCase objects and uses appropriate pattern matching.

Also applies to: 402-406

58-58: LGTM! Improved reporting format

The XML-like structure for detailed reports provides better structured output and includes valuable information like relative scores and evaluated values. The format is consistent and well-organized.

Also applies to: 72-79, 82-82

95-99: LGTM! More accurate relative score calculation

The updated relative score calculation using averages instead of simple pass/fail counts provides a more nuanced and accurate representation of evaluation quality across multiple criteria.

Also applies to: 140-144

488-492: LGTM! Consistent sequence support in factory function

The evaluation_suite factory function now consistently supports sequence inputs, matching the functionality added to the with_storage method. The implementation maintains type safety and proper initialization.

Also applies to: 518-522

src/draive/evaluation/evaluator.py

coderabbitai · 2025-05-29T13:29:22Z

src/draive/instructions/types.py

+def extract_variables(
+    string: str,
+    /,
+) -> Sequence[InstructionDeclarationArgument]:
+    seen: set[str] = set()
+    unique_args: list[InstructionDeclarationArgument] = []
+    for _, field_name, _, _ in Formatter().parse(string):
+        if field_name and field_name not in seen:
+            seen.add(field_name)
+            unique_args.append(InstructionDeclarationArgument(name=field_name))
+
+    return unique_args
+


🧹 Nitpick (assertive)

Minor: retain ordering without a set look-up penalty

extract_variables correctly de-duplicates but the set look-ups in the hot loop could be avoided by using an OrderedDict.fromkeys(...). Given typical prompt sizes the impact is negligible, so feel free to ignore.

🤖 Prompt for AI Agents

In src/draive/instructions/types.py around lines 219 to 231, the extract_variables function uses a set to track seen field names for de-duplication, which causes a set look-up in the loop. To avoid this minor performance penalty and retain ordering, replace the set and list approach with an OrderedDict.fromkeys(...) on the parsed field names to get unique keys in order, then convert those keys into InstructionDeclarationArgument instances for the return list.

src/draive/helpers/instruction_refinement.py

KaQuMiQ force-pushed the feature/instruction_optimization branch 12 times, most recently from 4beec02 to b75184c Compare January 17, 2025 16:28

KaQuMiQ force-pushed the feature/instruction_optimization branch 5 times, most recently from ee8e6a8 to 85703a2 Compare March 13, 2025 08:55

KaQuMiQ force-pushed the feature/instruction_optimization branch from 85703a2 to 6b9d43b Compare May 5, 2025 08:21

KaQuMiQ force-pushed the feature/instruction_optimization branch from 6b9d43b to f7420c2 Compare May 19, 2025 15:24

coderabbitai bot reviewed May 19, 2025

View reviewed changes

KaQuMiQ force-pushed the feature/instruction_optimization branch from f7420c2 to 091fdeb Compare May 28, 2025 16:51

coderabbitai bot reviewed May 28, 2025

View reviewed changes

KaQuMiQ force-pushed the feature/instruction_optimization branch from 091fdeb to dcdf7cd Compare May 29, 2025 07:27

coderabbitai bot reviewed May 29, 2025

View reviewed changes

src/draive/helpers/instruction_refinement.py Show resolved Hide resolved

KaQuMiQ force-pushed the feature/instruction_optimization branch from dcdf7cd to 0053d3d Compare May 29, 2025 11:23

coderabbitai bot reviewed May 29, 2025

View reviewed changes

src/draive/instructions/volatile.py Show resolved Hide resolved

src/draive/helpers/instruction_preparation.py Outdated Show resolved Hide resolved

src/draive/evaluation/suite.py Show resolved Hide resolved

guides/BasicEvaluation.md Show resolved Hide resolved

KaQuMiQ force-pushed the feature/instruction_optimization branch from 0053d3d to f241080 Compare May 29, 2025 11:48

coderabbitai bot reviewed May 29, 2025

View reviewed changes

src/draive/instructions/template.py Show resolved Hide resolved

src/draive/instructions/state.py Show resolved Hide resolved

src/draive/helpers/instruction_refinement.py Show resolved Hide resolved

KaQuMiQ force-pushed the feature/instruction_optimization branch from f241080 to e4a92bb Compare May 29, 2025 13:07

coderabbitai bot reviewed May 29, 2025

View reviewed changes

src/draive/evaluation/scenario.py Show resolved Hide resolved

src/draive/instructions/types.py Outdated Show resolved Hide resolved

src/draive/helpers/instruction_refinement.py Show resolved Hide resolved

KaQuMiQ force-pushed the feature/instruction_optimization branch from e4a92bb to c1b5720 Compare May 29, 2025 13:14

Prepare instruction optimization method

63c4634

KaQuMiQ force-pushed the feature/instruction_optimization branch from c1b5720 to 63c4634 Compare May 29, 2025 13:23

coderabbitai bot reviewed May 29, 2025

View reviewed changes

KaQuMiQ merged commit d8471a1 into main May 29, 2025
5 checks passed

KaQuMiQ deleted the feature/instruction_optimization branch May 29, 2025 13:38

coderabbitai bot mentioned this pull request Jun 2, 2025

Refeine instructions refinement #328

Merged

coderabbitai bot mentioned this pull request Aug 7, 2025

Simplify and cleanup evaluations #389

Merged

coderabbitai bot mentioned this pull request Aug 29, 2025

Rework LMM to GenerativeModel #394

Merged

This was referenced Oct 21, 2025

Prepare content templates #452

Merged

Refine evaluator instructions #454

Merged

Comments

Conversation

KaQuMiQ commented Jan 16, 2025

Uh oh!

coderabbitai bot commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Possibly related PRs

Chat

Support

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot May 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai bot commented May 19, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)