Update evals interface #353

Merged
KaQuMiQ merged 1 commit into main from feature/evals
Jul 3, 2025

Conversation

@KaQuMiQ
Collaborator

@KaQuMiQ KaQuMiQ commented Jul 3, 2025

No description provided.

@coderabbitai

coderabbitai bot commented Jul 3, 2025

Warning

Rate limit exceeded

@KaQuMiQ has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 12 minutes and 22 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between bc0b63f and 589bcd3.

📒 Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • src/draive/__init__.py (0 hunks)
  • src/draive/evaluation/suite.py (4 hunks)
  • src/draive/evaluation/value.py (2 hunks)
  • src/draive/gemini/lmm_generation.py (2 hunks)
  • src/draive/helpers/instruction_preparation.py (1 hunks)
  • src/draive/helpers/instruction_refinement.py (3 hunks)

Walkthrough

This change set increments the project version from "0.73.0" to "0.73.1" in pyproject.toml. It removes the import and re-export of four symbols related to instruction preparation and refinement from src/draive/__init__.py. The evaluation suite in src/draive/evaluation/suite.py is refactored to unify the result type with explicit suite parameters, simplify the call interface, and always return a suite-level result. The evaluation score type in src/draive/evaluation/value.py adds a "max" literal and a corresponding constant. The Gemini LMM generation logic in src/draive/gemini/lmm_generation.py removes an error branch for empty completions. The prepare_instruction function signature is simplified by removing generics. The refine_instruction function is extensively refactored to implement a binary tree exploration model with pruning, staged focused and full evaluations, and modularized candidate generation and selection.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🔭 Outside diff range comments (1)
src/draive/helpers/instruction_preparation.py (1)

24-29: Consider adding a docstring

The function lacks documentation. Since this is a helper function that prepares instructions, it would benefit from a docstring explaining its purpose, parameters, and return value.

 async def prepare_instruction(
     instruction: InstructionDeclaration | str,
     /,
     *,
     guidelines: str | None = None,
 ) -> Instruction:
+    """
+    Prepare a detailed instruction from a declaration or description.
+    
+    Args:
+        instruction: Either an InstructionDeclaration object or a string description
+        guidelines: Optional additional guidelines for instruction preparation
+        
+    Returns:
+        A prepared Instruction object
+        
+    Raises:
+        InstructionPreparationAmbiguity: When clarification is needed
+        ValueError: When instruction preparation fails
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9f603d and 9f50758.

📒 Files selected for processing (7)
  • pyproject.toml (1 hunks)
  • src/draive/__init__.py (0 hunks)
  • src/draive/evaluation/suite.py (4 hunks)
  • src/draive/evaluation/value.py (2 hunks)
  • src/draive/gemini/lmm_generation.py (2 hunks)
  • src/draive/helpers/instruction_preparation.py (1 hunks)
  • src/draive/helpers/instruction_refinement.py (3 hunks)
💤 Files with no reviewable changes (1)
  • src/draive/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py: Follow Ruff import ordering (standard library, third party, local)
Use Python 3.12+ type features (type unions with |, generic syntax)
Use base and abstract types like Sequence or Iterable instead of concrete types
Use custom exceptions for specific errors
Format code with Ruff
Run linters (Ruff + Bandit + Pyright strict mode)

📄 Source: CodeRabbit Inference Engine (CLAUDE.md)

List of files the instruction was applied to:

  • src/draive/helpers/instruction_preparation.py
  • src/draive/evaluation/value.py
  • src/draive/gemini/lmm_generation.py
  • src/draive/evaluation/suite.py
  • src/draive/helpers/instruction_refinement.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: KaQuMiQ
PR: miquido/draive#338
File: src/draive/lmm/__init__.py:1-2
Timestamp: 2025-06-16T10:28:07.434Z
Learning: The draive project requires Python 3.12+ as specified in pyproject.toml with requires-python = ">=3.12" and uses Python 3.12+ specific features like PEP 695 type aliases and generic syntax extensively throughout the codebase.
pyproject.toml (1)
Learnt from: KaQuMiQ
PR: miquido/draive#338
File: src/draive/lmm/__init__.py:1-2
Timestamp: 2025-06-16T10:28:07.434Z
Learning: The draive project requires Python 3.12+ as specified in pyproject.toml with requires-python = ">=3.12" and uses Python 3.12+ specific features like PEP 695 type aliases and generic syntax extensively throughout the codebase.
🧬 Code Graph Analysis (1)
src/draive/evaluation/suite.py (1)
src/draive/parameters/model.py (1)
  • DataModel (386-756)
🪛 Pylint (3.3.7)
src/draive/helpers/instruction_preparation.py

[convention] 24-24: Missing function or method docstring

(C0116)

src/draive/evaluation/value.py

[error] 8-8: Parsing failed: 'invalid syntax (draive.evaluation.value, line 8)'

(E0001)

🪛 Flake8 (7.2.0)
src/draive/evaluation/value.py

[error] 8-8: SyntaxError: invalid syntax

(E999)

🔇 Additional comments (10)
pyproject.toml (1)

8-8: LGTM!

The patch version bump from 0.73.0 to 0.73.1 is appropriate for the interface updates and refactoring changes in this PR.

src/draive/evaluation/value.py (1)

8-19: LGTM!

The addition of the "max" literal and corresponding MAX constant (1.0) is well-implemented and maintains consistency with the existing evaluation score system.

Also applies to: 27-27, 53-55
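A minimal sketch of how a labelled score such as "max" could map onto a float constant. Only the "max" label and its value of 1.0 come from the review above; the `ScoreLabel` alias, the "none" label, and the `score_value` helper are hypothetical names used for illustration and are not taken from `src/draive/evaluation/value.py`.

```python
from __future__ import annotations

from typing import Literal

# Hypothetical alias: only the "max" label and its value 1.0 come from the review;
# the "none" label and all names here are assumptions for illustration.
ScoreLabel = Literal["none", "max"]

SCORE_NONE: float = 0.0  # assumed lower bound
SCORE_MAX: float = 1.0   # mirrors the MAX constant (1.0) described in the review

_LABEL_VALUES: dict[str, float] = {"none": SCORE_NONE, "max": SCORE_MAX}


def score_value(value: float | ScoreLabel) -> float:
    """Normalize a label or a number into a float score within [0.0, 1.0]."""
    if isinstance(value, str):
        return _LABEL_VALUES[value]  # unknown labels raise KeyError
    number = float(value)
    if not 0.0 <= number <= 1.0:
        raise ValueError(f"score out of range: {number}")
    return number
```

Keeping the label table next to the constants means a new literal needs a change in only one place.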

src/draive/gemini/lmm_generation.py (2)

257-266: Ensure empty completion content is deliberate

The change in src/draive/gemini/lmm_generation.py (lines 257–266) now always returns an LMMCompletion—even when completion_content is empty—whereas other providers (OpenAI, vLLM, Ollama) only emit a completion when there’s actual content. Please confirm:

• Empty completions from the Gemini path are acceptable and won’t break downstream logic
• All callers can safely handle an LMMCompletion whose content may be empty
• If not intended, consider re-adding a guard or error branch before returning
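If the guard were to be re-added, it might look roughly like the sketch below. `Completion`, `EmptyCompletionError`, and `build_completion` are stand-in names invented for this example; they are not the draive `LMMCompletion` type or the Gemini client code.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass


class EmptyCompletionError(Exception):
    """Hypothetical custom exception for a completion without content."""


@dataclass(frozen=True)
class Completion:
    """Stand-in for a completion result; not the draive LMMCompletion type."""

    parts: tuple[str, ...]


def build_completion(parts: Sequence[str], *, allow_empty: bool = False) -> Completion:
    """Drop empty parts and refuse to return an empty completion unless allowed."""
    non_empty = tuple(part for part in parts if part)
    if not non_empty and not allow_empty:
        raise EmptyCompletionError("model returned no content")
    return Completion(parts=non_empty)
```

The `allow_empty` flag makes the choice explicit at the call site, so either behaviour discussed above can be selected deliberately.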


436-440: Gemini generator will now emit empty LMMStreamChunks

Changing the branch from elif chunk_content: to else: means that when no content parts are collected, you’ll still yield

LMMStreamChunk.of(MultimodalContent.of(*chunk_content), eod=False)

with an empty payload. Downstream consumers do not currently guard against empty chunk.content, so please verify they handle empty chunks without side-effects or UI glitches.

Review these locations:

  • src/draive/gemini/lmm_generation.py lines 436–440
  • src/draive/conversation/realtime/default.py around line 97
  • src/draive/openai/lmm_session.py around line 365
src/draive/evaluation/suite.py (3)

102-108: LGTM!

The addition of SuiteParameters generic type and parameters field to SuiteEvaluatorResult provides better type safety and enables tracking of suite-level parameters alongside case results.
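The shape of a result type that is generic over both suite-level and case-level parameters can be sketched as follows. The class and field names are assumptions made for this example and do not reproduce `SuiteEvaluatorResult`; `typing.Generic` is used instead of the PEP 695 syntax only to keep the sketch runnable on older interpreters.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass
from typing import Generic, TypeVar

SuiteParams = TypeVar("SuiteParams")
CaseParams = TypeVar("CaseParams")


@dataclass(frozen=True)
class CaseResult(Generic[CaseParams]):
    """Hypothetical per-case result."""

    parameters: CaseParams
    passed: bool


@dataclass(frozen=True)
class SuiteResult(Generic[SuiteParams, CaseParams]):
    """Hypothetical suite-level result that records the parameters it ran with."""

    parameters: SuiteParams
    cases: Sequence[CaseResult[CaseParams]]

    @property
    def passed(self) -> bool:
        # An empty suite counts as passed here, because all() of nothing is True.
        return all(case.passed for case in self.cases)
```

Carrying `parameters` on the result lets a caller compare two runs of the same cases under different suite settings without extra bookkeeping.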


278-283: Simplified and consistent API

The removal of overloads and unified return type SuiteEvaluatorResult improves API consistency. The flexible *case_parameters argument supports multiple input types elegantly.


336-349: Well-structured concurrent evaluation

The use of gather for concurrent case evaluation and proper parameter handling (using suite parameters from input or falling back to stored parameters) is well implemented.

src/draive/helpers/instruction_refinement.py (3)

19-87: Well-structured implementation of tree-based refinement approach!

The staged approach with initialization, exploration, and finalization provides excellent separation of concerns. The parameter validation is thorough and the orchestration using Stage.sequence is clean.
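As a rough illustration of running initialization, exploration, and finalization in order, the sketch below composes async stage functions over a shared state dictionary. It is not the draive `Stage.sequence` API; `sequence`, the stage functions, and the state keys are all hypothetical.

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

State = dict[str, Any]
StageFn = Callable[[State], Awaitable[State]]


def sequence(*stages: StageFn) -> StageFn:
    """Compose stages so that each one receives the state produced by the previous."""

    async def run(state: State) -> State:
        for stage in stages:
            state = await stage(state)
        return state

    return run


async def initialize(state: State) -> State:
    return {**state, "candidates": [state["instruction"]]}


async def explore(state: State) -> State:
    refined = state["instruction"] + " (refined)"
    return {**state, "candidates": [*state["candidates"], refined]}


async def finalize(state: State) -> State:
    # Toy selection rule: take the most recent candidate.
    return {**state, "best": state["candidates"][-1]}
```

Each stage returns a new dictionary rather than mutating its input, which keeps the stages easy to test in isolation.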


120-169: Excellent data structure design for tree exploration!

The separation between _RefinementTreeNode and _RefinementState is clean, with clear responsibilities. The properties for node characteristics (is_root, is_leaf) and score access make the code more readable.
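A node type with `is_root` and `is_leaf` properties might be structured roughly as below. `TreeNode` and its fields are assumptions for illustration and do not reproduce `_RefinementTreeNode`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class TreeNode:
    """Hypothetical refinement tree node."""

    instruction: str
    score: float | None = None
    pruned: bool = False
    parent: TreeNode | None = field(default=None, repr=False)
    children: list[TreeNode] = field(default_factory=list)

    @property
    def is_root(self) -> bool:
        return self.parent is None

    @property
    def is_leaf(self) -> bool:
        return not self.children

    def add_child(self, instruction: str) -> TreeNode:
        """Create a child node, link it to this node, and return it."""
        child = TreeNode(instruction=instruction, parent=self)
        self.children.append(child)
        return child
```

Deriving `is_root` and `is_leaf` from the links, rather than storing them as flags, keeps them correct as the tree grows.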


635-671: Comprehensive tree statistics for debugging and analysis!

The statistics provide valuable insights into the exploration efficiency, pruning effectiveness, and the path to the best solution. Division by zero is avoided with a conditional expression.
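The zero-division guard mentioned here can be illustrated with a small helper. The function and the statistic names are hypothetical and are not taken from `instruction_refinement.py`.

```python
from __future__ import annotations


def tree_statistics(total_nodes: int, pruned_nodes: int) -> dict[str, float]:
    """Summarize exploration counts; the rate falls back to 0.0 for an empty tree."""
    explored_nodes = total_nodes - pruned_nodes
    # Conditional expression: only divide when there is at least one node.
    pruning_rate = pruned_nodes / total_nodes if total_nodes > 0 else 0.0
    return {
        "total_nodes": float(total_nodes),
        "explored_nodes": float(explored_nodes),
        "pruning_rate": pruning_rate,
    }
```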

Comment on lines 619 to 629
is_leaf = len(node.children) == 0
EXCEPTIONAL_THRESHOLD = 0.95
is_exceptional = node.focused_evaluation_score > EXCEPTIONAL_THRESHOLD

🛠️ Refactor suggestion

Extract exceptional threshold as a configurable parameter.

The hardcoded threshold value reduces flexibility and makes the logic less transparent.

Define the constant at module level or make it configurable:

# At module level after imports
_EXCEPTIONAL_SCORE_THRESHOLD: float = 0.95

Then update the usage:

-        EXCEPTIONAL_THRESHOLD = 0.95
-        is_exceptional = node.focused_evaluation_score > EXCEPTIONAL_THRESHOLD
+        is_exceptional = node.focused_evaluation_score > _EXCEPTIONAL_SCORE_THRESHOLD
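For the "make it configurable" alternative, one possible shape is a keyword argument that defaults to the module-level constant. The `is_exceptional` helper below is a hypothetical illustration, not code from the PR.

```python
from __future__ import annotations

# Module-level default, as proposed in the suggestion above.
_EXCEPTIONAL_SCORE_THRESHOLD: float = 0.95


def is_exceptional(
    score: float,
    *,
    threshold: float = _EXCEPTIONAL_SCORE_THRESHOLD,
) -> bool:
    """Return True when the score is strictly above the threshold."""
    return score > threshold
```

The comparison stays strict, matching the original `>` check, so a score exactly equal to the threshold is not treated as exceptional.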
🤖 Prompt for AI Agents
In src/draive/helpers/instruction_refinement.py around lines 619 to 622, the
exceptional threshold value is hardcoded as 0.95, reducing flexibility. Move
this constant to the module level by defining _EXCEPTIONAL_SCORE_THRESHOLD =
0.95 after the imports, then replace the inline value with this constant in the
code to improve configurability and clarity.

@KaQuMiQ KaQuMiQ merged commit 3e415bf into main Jul 3, 2025
4 of 5 checks passed
@KaQuMiQ KaQuMiQ deleted the feature/evals branch July 3, 2025 15:28
@coderabbitai coderabbitai bot mentioned this pull request Oct 1, 2025