Fix concurrency in evals#360

Merged
KaQuMiQ merged 1 commit into main from feature/conc_eval
Jul 8, 2025

KaQuMiQ (Collaborator) commented Jul 7, 2025

No description provided.

coderabbitai bot commented Jul 7, 2025

Walkthrough

This change refactors concurrency management across several modules: calls to asyncio.gather are replaced with the execute_concurrently function, and explicit concurrency limits are introduced for evaluation and instruction refinement. Method and function signatures in EvaluationSuite and related classes now accept the new concurrency parameters, and argument passing is adjusted to match the new concurrency model. The haiway package dependency is updated, and minor cleanups are included, such as removing unused type parameters and correcting argument unpacking.
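
To illustrate the difference between unbounded asyncio.gather and a limited executor, here is a minimal sketch. It does not reproduce haiway's execute_concurrently; `run_limited`, its `concurrent_tasks` parameter, and the `demo` function are hypothetical names chosen for this example, and the limit is enforced with a plain asyncio.Semaphore.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar

Element = TypeVar("Element")
Result = TypeVar("Result")


async def run_limited(
    handler: Callable[[Element], Awaitable[Result]],
    elements: Iterable[Element],
    *,
    concurrent_tasks: int = 2,
) -> list[Result]:
    """Hypothetical helper (not haiway's API): await handler(element) for each
    element with at most concurrent_tasks handlers in flight at once."""
    semaphore = asyncio.Semaphore(concurrent_tasks)

    async def guarded(element: Element) -> Result:
        async with semaphore:
            return await handler(element)

    # gather keeps results in input order; the semaphore caps parallelism
    return list(await asyncio.gather(*(guarded(element) for element in elements)))


async def demo() -> tuple[list[int], int]:
    running = 0
    peak = 0

    async def evaluate(value: int) -> int:
        nonlocal running, peak
        running += 1
        peak = max(peak, running)  # record the highest number of parallel calls
        await asyncio.sleep(0.01)  # placeholder for a slow evaluation call
        running -= 1
        return value * 2

    results = await run_limited(evaluate, range(6), concurrent_tasks=2)
    return results, peak
```

Running `asyncio.run(demo())` returns the results in input order together with the observed peak, which stays within the configured limit. A bare asyncio.gather over the same handlers would start all six at once.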

Possibly related PRs

  • Update haiway and evals concurrency #359: Both PRs modify concurrency handling by replacing asyncio.gather with execute_concurrently and changing how executor functions are passed in evaluation modules.
  • Update evals interface #353: Both PRs update concurrency handling and method signatures in src/draive/evaluation/suite.py, specifically affecting EvaluationSuite.__call__, _evaluate, and the invocation of execute_concurrently.
  • Add evaluatin suite parameters #352: The concurrency and argument-passing changes in this PR build upon the parameterization extensions introduced in this earlier PR, directly refining the same classes and evaluation framework.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5782f28 and 86bfb8c.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml (2 hunks)
  • src/draive/evaluation/scenario.py (1 hunks)
  • src/draive/evaluation/suite.py (5 hunks)
  • src/draive/helpers/instruction_refinement.py (7 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py: Follow Ruff import ordering (standard library, third party, local)
Use Python 3.12+ type features (type unions with |, generic syntax)
Use base and abstract types like Sequence or Iterable instead of concrete types
Use custom exceptions for specific errors
Format code with Ruff
Run linters (Ruff + Bandit + Pyright strict mode)

📄 Source: CodeRabbit Inference Engine (CLAUDE.md)

List of files the instruction was applied to:

  • src/draive/evaluation/scenario.py
  • src/draive/helpers/instruction_refinement.py
  • src/draive/evaluation/suite.py
🧠 Learnings (2)
pyproject.toml (2)
Learnt from: KaQuMiQ
PR: miquido/draive#338
File: src/draive/lmm/__init__.py:1-2
Timestamp: 2025-06-16T10:28:07.434Z
Learning: The draive project requires Python 3.12+ as specified in pyproject.toml with "requires-python = ">=3.12"" and uses Python 3.12+ specific features like PEP 695 type aliases and generic syntax extensively throughout the codebase.
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use absolute imports from draive package
src/draive/evaluation/suite.py (1)
Learnt from: CR
PR: miquido/draive#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-01T09:39:43.008Z
Learning: Applies to draive/**/*.py : Use Protocol and runtime_checkable for function protocols
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test (ubuntu-latest, 3.13)
  • GitHub Check: test (macos-latest, 3.13)
🔇 Additional comments (8)
pyproject.toml (1)

8-8: Version bump and dependency update look good.

The version increment and haiway dependency update are appropriate for this concurrency fix.

Also applies to: 24-24

src/draive/evaluation/scenario.py (1)

96-100: API update correctly implemented.

The change from keyword argument to positional argument aligns with the standardized execute_concurrently API.
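
One reason a call site might have to switch from keyword to positional is a positional-only parameter. The sketch below assumes that situation; the thread does not show how execute_concurrently is declared, and `apply_each` and `double` are hypothetical names.

```python
from collections.abc import Callable, Iterable


def apply_each(handler: Callable[[int], int], /, elements: Iterable[int]) -> list[int]:
    """Hypothetical executor: the "/" marks handler as positional-only."""
    return [handler(element) for element in elements]


def double(value: int) -> int:
    return value * 2


# Passing the handler positionally works.
doubled = apply_each(double, [1, 2, 3])

# Passing it by keyword is rejected at call time.
try:
    apply_each(handler=double, elements=[1, 2, 3])
    keyword_call_rejected = False
except TypeError:
    keyword_call_rejected = True
```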

src/draive/helpers/instruction_refinement.py (3)

6-6: Concurrency parameter properly added with validation.

Good addition of the concurrent_nodes parameter with appropriate default value, validation, and documentation.

Also applies to: 33-33, 50-50, 58-58
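
A minimal sketch of what validating such a limit could look like. The default of 1 and the error type are assumptions; the review does not state the values the PR actually uses, and `checked_concurrency` is a hypothetical name.

```python
def checked_concurrency(concurrent_nodes: int = 1) -> int:
    """Hypothetical validation: reject limits that would let no node run."""
    if concurrent_nodes < 1:
        raise ValueError(f"concurrent_nodes must be >= 1, got {concurrent_nodes}")
    return concurrent_nodes
```

Failing early with a clear message avoids a harder-to-diagnose stall later, for example a semaphore created with a limit of zero that never admits a task.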


280-280: Tree exploration concurrency properly refactored.

The wrapper function and execute_concurrently usage correctly implement controlled parallelism for node exploration.

Also applies to: 295-313


406-406: Critical bug fix: correct argument passing to evaluation suite.

Good catch! The evaluation suite expects case_parameters as a single argument, not unpacked.
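
The class of bug described here can be sketched as follows. `run_suite` is a hypothetical stand-in, not the real EvaluationSuite call; it only assumes a callable that takes all cases as one argument.

```python
from collections.abc import Sequence


def run_suite(case_parameters: Sequence[str], /) -> list[str]:
    """Hypothetical stand-in for a suite call taking all cases as one argument."""
    return [f"evaluated:{case}" for case in case_parameters]


cases = ["case-a", "case-b"]

# Correct: the whole sequence is passed as a single argument.
results = run_suite(cases)

# Buggy: unpacking turns each case into its own positional argument.
try:
    run_suite(*cases)
    unpack_error = ""
except TypeError as exc:
    unpack_error = str(exc)
```

With two or more cases the unpacked call fails with a TypeError. With exactly one case it would not fail: the single string would be accepted and iterated character by character, which makes this kind of mistake easy to miss.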

src/draive/evaluation/suite.py (3)

2-2: Import cleanup and API update correctly implemented.

The removal of unused gather import and updated execute_concurrently usage are consistent with the codebase-wide changes.

Also applies to: 193-196


286-295: Enhanced API with flexible case selection and concurrency control.

The refactored signature provides a cleaner API with multiple ways to specify evaluation cases and explicit concurrency control. The parameter name is correctly spelled as concurrent_cases.

Also applies to: 302-302, 311-311


370-384: Evaluation concurrency properly implemented with wrapper function.

The async wrapper and execute_concurrently usage correctly implement controlled parallel evaluation of cases.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b0d75aa and 5782f28.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml (2 hunks)
  • src/draive/evaluation/scenario.py (1 hunks)
  • src/draive/evaluation/suite.py (5 hunks)
  • src/draive/helpers/instruction_refinement.py (7 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/draive/helpers/instruction_refinement.py (1)
src/draive/evaluation/suite.py (1)
  • evaluation_suite (540-618)
🔇 Additional comments (7)
pyproject.toml (1)

8-8: Version updates look good!

The version bump to 0.75.6 and haiway dependency update to 0.24.3 are appropriate for this concurrency refactoring.

Also applies to: 24-24

src/draive/evaluation/scenario.py (1)

97-97: Consistent API usage for execute_concurrently.

The change from keyword to positional argument aligns with the standardized concurrency pattern across the codebase.

src/draive/helpers/instruction_refinement.py (3)

33-33: Good addition of concurrency control.

The concurrent_nodes parameter provides fine-grained control over parallel tree exploration with sensible defaults and validation.

Also applies to: 50-50, 58-58, 80-80


280-280: Excellent refactoring to controlled concurrency.

Replacing asyncio.gather with execute_concurrently prevents potential resource exhaustion by limiting the number of concurrent node explorations. The explore wrapper function provides clean encapsulation.

Also applies to: 295-313
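
The pattern described in this comment, a local wrapper that closes over shared state plus a cap on parallel node explorations, might look roughly like the sketch below. It is not the code from instruction_refinement.py: `Node`, `explore_tree`, and `rounds` are hypothetical, and an asyncio.Semaphore stands in for execute_concurrently.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list["Node"] = field(default_factory=list)


async def explore_tree(root: Node, *, rounds: int, concurrent_nodes: int) -> list[str]:
    semaphore = asyncio.Semaphore(concurrent_nodes)
    visited: list[str] = []

    async def explore(node: Node) -> list[Node]:
        # The wrapper closes over shared state, so callers pass only the node.
        async with semaphore:
            await asyncio.sleep(0.01)  # placeholder for an expensive model call
            visited.append(node.name)
            return node.children

    frontier = [root]
    for _ in range(rounds):
        # Each round explores the current frontier, at most concurrent_nodes at a time.
        expanded = await asyncio.gather(*(explore(node) for node in frontier))
        frontier = [child for children in expanded for child in children]
    return visited


tree = Node("root", [Node("a", [Node("a1")]), Node("b"), Node("c")])
```

Calling `asyncio.run(explore_tree(tree, rounds=2, concurrent_nodes=2))` visits the root first and then its three children, never exploring more than two nodes at once.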


406-406: Critical bug fix for evaluation suite invocation.

Correctly passes the cases list as a single argument instead of unpacking it, aligning with the updated evaluation suite API.

src/draive/evaluation/suite.py (2)

193-193: Consistent concurrency API usage.

Standardized to use positional argument for the executor function.


370-384: Well-structured concurrency refactoring.

The introduction of the evaluate_case wrapper and controlled concurrency via execute_concurrently improves resource management during parallel case evaluation.

KaQuMiQ force-pushed the feature/conc_eval branch from 5782f28 to 86bfb8c on July 7, 2025 18:05
KaQuMiQ merged commit 827dd90 into main on Jul 8, 2025
5 checks passed
KaQuMiQ deleted the feature/conc_eval branch on July 8, 2025 07:36