Update evals interface by KaQuMiQ · Pull Request #353 · miquido/draive

KaQuMiQ · 2025-07-03T14:39:47Z

No description provided.

coderabbitai · 2025-07-03T14:39:53Z

Warning

Rate limit exceeded

@KaQuMiQ has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 22 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between bc0b63f and 589bcd3.

📒 Files selected for processing (7)

pyproject.toml (1 hunks)
src/draive/__init__.py (0 hunks)
src/draive/evaluation/suite.py (4 hunks)
src/draive/evaluation/value.py (2 hunks)
src/draive/gemini/lmm_generation.py (2 hunks)
src/draive/helpers/instruction_preparation.py (1 hunks)
src/draive/helpers/instruction_refinement.py (3 hunks)

Walkthrough

This change set increments the project version from "0.73.0" to "0.73.1" in pyproject.toml. It removes the import and re-export of four symbols related to instruction preparation and refinement from src/draive/__init__.py. The evaluation suite in src/draive/evaluation/suite.py is refactored to unify the result type with explicit suite parameters, simplify the call interface, and always return a suite-level result. The evaluation score type in src/draive/evaluation/value.py adds a "max" literal and a corresponding constant. The Gemini LMM generation logic in src/draive/gemini/lmm_generation.py removes an error branch for empty completions. The prepare_instruction function signature is simplified by removing generics. The refine_instruction function is extensively refactored to implement a binary tree exploration model with pruning, staged focused and full evaluations, and modularized candidate generation and selection.

Possibly related PRs

Refeine instructions refinement #328: Both PRs restructure the refine_instruction logic, but with different refinement algorithms and state management approaches.
Prepare instruction writing helper #327: Both PRs modify the imports, exports, and function signature of prepare_instruction in the same modules, directly relating to each other's changes.
Add evaluatin suite parameters #352: Both PRs refactor the evaluation suite to introduce suite-level parameters and update related function signatures, affecting the same framework and functions.

coderabbitai

Actionable comments posted: 5

🔭 Outside diff range comments (1)

src/draive/helpers/instruction_preparation.py (1)

24-29: Consider adding a docstring

The function lacks documentation. Since this is a helper function that prepares instructions, it would benefit from a docstring explaining its purpose, parameters, and return value.

 async def prepare_instruction(
     instruction: InstructionDeclaration | str,
     /,
     *,
     guidelines: str | None = None,
 ) -> Instruction:
+    """
+    Prepare a detailed instruction from a declaration or description.
+    
+    Args:
+        instruction: Either an InstructionDeclaration object or a string description
+        guidelines: Optional additional guidelines for instruction preparation
+        
+    Returns:
+        A prepared Instruction object
+        
+    Raises:
+        InstructionPreparationAmbiguity: When clarification is needed
+        ValueError: When instruction preparation fails
+    """

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9f603d and 9f50758.

📒 Files selected for processing (7)

pyproject.toml (1 hunks)
src/draive/__init__.py (0 hunks)
src/draive/evaluation/suite.py (4 hunks)
src/draive/evaluation/value.py (2 hunks)
src/draive/gemini/lmm_generation.py (2 hunks)
src/draive/helpers/instruction_preparation.py (1 hunks)
src/draive/helpers/instruction_refinement.py (3 hunks)

💤 Files with no reviewable changes (1)

src/draive/__init__.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py: Follow Ruff import ordering (standard library, third party, local)
Use Python 3.12+ type features (type unions with |, generic syntax)
Use base and abstract types like Sequence or Iterable instead of concrete types
Use custom exceptions for specific errors
Format code with Ruff
Run linters (Ruff + Bandit + Pyright strict mode)

📄 Source: CodeRabbit Inference Engine (CLAUDE.md)

List of files the instruction was applied to:

src/draive/helpers/instruction_preparation.py
src/draive/evaluation/value.py
src/draive/gemini/lmm_generation.py
src/draive/evaluation/suite.py
src/draive/helpers/instruction_refinement.py

🧠 Learnings (2)

📓 Common learnings

Learnt from: KaQuMiQ
PR: miquido/draive#338
File: src/draive/lmm/__init__.py:1-2
Timestamp: 2025-06-16T10:28:07.434Z
Learning: The draive project requires Python 3.12+ as specified in pyproject.toml with `requires-python = ">=3.12"` and uses Python 3.12+ specific features like PEP 695 type aliases and generic syntax extensively throughout the codebase.

🧬 Code Graph Analysis (1)

src/draive/evaluation/suite.py (1)

src/draive/parameters/model.py (1)

DataModel (386-756)

🪛 Pylint (3.3.7)

src/draive/helpers/instruction_preparation.py

[convention] 24-24: Missing function or method docstring

(C0116)

src/draive/evaluation/value.py

[error] 8-8: Parsing failed: 'invalid syntax (draive.evaluation.value, line 8)'

(E0001)

🪛 Flake8 (7.2.0)

src/draive/evaluation/value.py

[error] 8-8: SyntaxError: invalid syntax

(E999)

🔇 Additional comments (10)

pyproject.toml (1)

8-8: LGTM!

The minor version bump from 0.73.0 to 0.73.1 is appropriate for the interface updates and refactoring changes in this PR.

src/draive/evaluation/value.py (1)

8-19: LGTM!

The addition of the "max" literal and corresponding MAX constant (1.0) is well-implemented and maintains consistency with the existing evaluation score system.

Also applies to: 27-27, 53-55

src/draive/gemini/lmm_generation.py (2)

257-266: Ensure empty completion content is deliberate

The change in src/draive/gemini/lmm_generation.py (lines 257–266) now always returns an LMMCompletion—even when completion_content is empty—whereas other providers (OpenAI, vLLM, Ollama) only emit a completion when there’s actual content. Please confirm:

• Empty completions from the Gemini path are acceptable and won’t break downstream logic
• All callers can safely handle an LMMCompletion whose content may be empty
• If not intended, consider re-adding a guard or error branch before returning
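
If a guard is wanted, it could look roughly like the sketch below. `Completion`, `EmptyCompletionError`, and `completion_from_parts` are hypothetical stand-ins written for this comment; they are not the draive `LMMCompletion` API, and the `allow_empty` switch is an assumption about how the behaviour might be made explicit.

```python
from collections.abc import Sequence
from dataclasses import dataclass


class EmptyCompletionError(Exception):
    """Hypothetical custom exception for a completion without content."""


@dataclass(frozen=True)
class Completion:
    """Hypothetical stand-in for a provider completion result."""

    parts: tuple[str, ...]


def completion_from_parts(
    parts: Sequence[str],
    *,
    allow_empty: bool = False,
) -> Completion:
    # Drop parts that carry no text before deciding whether anything is left.
    meaningful = tuple(part for part in parts if part)
    if not meaningful and not allow_empty:
        raise EmptyCompletionError("Model returned no content")
    return Completion(parts=meaningful)
```

With `allow_empty=True` the caller opts in to empty results, which documents the decision at the call site rather than leaving it implicit.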

436-440: Gemini generator will now emit empty LMMStreamChunks

Changing the branch from `elif chunk_content:` to `else:` means that when no content parts are collected, you’ll still yield `LMMStreamChunk.of(MultimodalContent.of(*chunk_content), eod=False)` with an empty payload. Downstream consumers do not currently guard against empty chunk.content, so please verify they handle empty chunks without side-effects or UI glitches.

Review these locations:

src/draive/gemini/lmm_generation.py lines 436–440

src/draive/conversation/realtime/default.py around line 97

src/draive/openai/lmm_session.py around line 365
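
One way a downstream consumer could protect itself is to filter empty chunks before processing them. The sketch below assumes a simplified `StreamChunk` type with `content` and `eod` fields; it is a hypothetical stand-in, not the real `LMMStreamChunk`, and `skip_empty_chunks` is an illustrative helper rather than existing draive code.

```python
import asyncio
from collections.abc import AsyncIterator
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamChunk:
    """Hypothetical stand-in for a streamed chunk."""

    content: tuple[str, ...]
    eod: bool = False


async def skip_empty_chunks(
    chunks: AsyncIterator[StreamChunk],
) -> AsyncIterator[StreamChunk]:
    async for chunk in chunks:
        # Forward end-of-data markers even when they carry no content.
        if chunk.content or chunk.eod:
            yield chunk


async def _demo_stream() -> AsyncIterator[StreamChunk]:
    yield StreamChunk(content=("Hel",))
    yield StreamChunk(content=())  # the empty chunk the review warns about
    yield StreamChunk(content=("lo",), eod=True)


async def collect_chunks() -> list[StreamChunk]:
    return [chunk async for chunk in skip_empty_chunks(_demo_stream())]
```

Running `asyncio.run(collect_chunks())` on the demo stream yields only the two chunks that carry text.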

src/draive/evaluation/suite.py (3)

102-108: LGTM!

The addition of SuiteParameters generic type and parameters field to SuiteEvaluatorResult provides better type safety and enables tracking of suite-level parameters alongside case results.

278-283: Simplified and consistent API

The removal of overloads and unified return type SuiteEvaluatorResult improves API consistency. The flexible *case_parameters argument supports multiple input types elegantly.

336-349: Well-structured concurrent evaluation

The use of gather for concurrent case evaluation and proper parameter handling (using suite parameters from input or falling back to stored parameters) is well implemented.

src/draive/helpers/instruction_refinement.py (3)

19-87: Well-structured implementation of tree-based refinement approach!

The staged approach with initialization, exploration, and finalization provides excellent separation of concerns. The parameter validation is thorough and the orchestration using Stage.sequence is clean.

120-169: Excellent data structure design for tree exploration!

The separation between _RefinementTreeNode and _RefinementState is clean, with clear responsibilities. The properties for node characteristics (is_root, is_leaf) and score access make the code more readable.
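
As an illustration of the node shape being praised, here is a small sketch of a tree node with `is_root` and `is_leaf` properties. `RefinementNode`, its fields, and `best_leaf` are assumptions made for this example; they are not copied from `_RefinementTreeNode`.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class RefinementNode:
    """Hypothetical stand-in for a refinement tree node."""

    instruction: str
    score: float
    parent: "RefinementNode | None" = None
    children: "list[RefinementNode]" = field(default_factory=list)

    @property
    def is_root(self) -> bool:
        return self.parent is None

    @property
    def is_leaf(self) -> bool:
        return len(self.children) == 0

    def add_child(self, instruction: str, score: float) -> "RefinementNode":
        child = RefinementNode(instruction=instruction, score=score, parent=self)
        self.children.append(child)
        return child


def best_leaf(root: RefinementNode) -> RefinementNode:
    """Depth-first search for the highest-scoring leaf below root."""
    if root.is_leaf:
        return root
    return max(
        (best_leaf(child) for child in root.children),
        key=lambda node: node.score,
    )
```

Exposing `is_root` and `is_leaf` as properties lets traversal code read as intent rather than as checks on `parent` and `children`.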

635-671: Comprehensive tree statistics for debugging and analysis!

The statistics provide valuable insights into the exploration efficiency, pruning effectiveness, and the path to the best solution. The division by zero is properly handled with the ternary operator.
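
The zero-division guard mentioned here follows a common pattern, sketched below. The function name `pruning_ratio` and its arguments are hypothetical and chosen only to show the ternary guard, not to mirror the statistics code in the PR.

```python
def pruning_ratio(pruned_nodes: int, total_nodes: int) -> float:
    """Share of nodes that were pruned, or 0.0 for an empty tree."""
    # The conditional expression avoids ZeroDivisionError when no nodes exist.
    return pruned_nodes / total_nodes if total_nodes > 0 else 0.0
```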

src/draive/evaluation/suite.py

coderabbitai · 2025-07-03T14:53:54Z

src/draive/helpers/instruction_refinement.py

+        is_leaf = len(node.children) == 0
+        EXCEPTIONAL_THRESHOLD = 0.95
+        is_exceptional = node.focused_evaluation_score > EXCEPTIONAL_THRESHOLD
+


🛠️ Refactor suggestion

Extract exceptional threshold as a configurable parameter.

The hardcoded threshold value reduces flexibility and makes the logic less transparent.

Define the constant at module level or make it configurable:

# At module level after imports
_EXCEPTIONAL_SCORE_THRESHOLD: float = 0.95

Then update the usage:

-        EXCEPTIONAL_THRESHOLD = 0.95
-        is_exceptional = node.focused_evaluation_score > EXCEPTIONAL_THRESHOLD
+        is_exceptional = node.focused_evaluation_score > _EXCEPTIONAL_SCORE_THRESHOLD

🤖 Prompt for AI Agents

In src/draive/helpers/instruction_refinement.py around lines 619 to 622, the exceptional threshold value is hardcoded as 0.95, reducing flexibility. Move this constant to the module level by defining _EXCEPTIONAL_SCORE_THRESHOLD = 0.95 after the imports, then replace the inline value with this constant in the code to improve configurability and clarity.
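
The "make it configurable" option could be sketched as a keyword argument that defaults to the module-level constant. The helper `is_exceptional` and its `threshold` parameter are hypothetical; only the 0.95 value and the strict `>` comparison come from the reviewed code.

```python
# Value taken from the reviewed code; the constant name follows the suggestion above.
_EXCEPTIONAL_SCORE_THRESHOLD: float = 0.95


def is_exceptional(
    score: float,
    *,
    threshold: float = _EXCEPTIONAL_SCORE_THRESHOLD,
) -> bool:
    """Return True when the score is strictly above the threshold."""
    return score > threshold
```

Callers that need a stricter or looser cut-off can pass `threshold=...` without touching the module constant.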

src/draive/helpers/instruction_refinement.py

KaQuMiQ force-pushed the feature/evals branch from a9f603d to 9f50758 Compare July 3, 2025 14:43

coderabbitai bot reviewed Jul 3, 2025

View reviewed changes

KaQuMiQ force-pushed the feature/evals branch from 9f50758 to bc0b63f Compare July 3, 2025 15:20

Update evals interface

589bcd3

KaQuMiQ force-pushed the feature/evals branch from bc0b63f to 589bcd3 Compare July 3, 2025 15:24

KaQuMiQ merged commit 3e415bf into main Jul 3, 2025
4 of 5 checks passed

KaQuMiQ deleted the feature/evals branch July 3, 2025 15:28

This was referenced Jul 4, 2025

Refine evals interfaces #355

Merged

Update instruction refinement #357

Merged

Fix concurrency in evals #360

Merged

coderabbitai bot mentioned this pull request Aug 7, 2025

Simplify and cleanup evaluations #389

Merged

coderabbitai bot mentioned this pull request Sep 3, 2025

Rework LMM to GenerativeModel #394

Merged

coderabbitai bot mentioned this pull request Oct 1, 2025

Fix postgres migrations #433

Merged

coderabbitai bot mentioned this pull request Oct 22, 2025

Refine evaluator instructions #454

Merged
