-
Notifications
You must be signed in to change notification settings - Fork 270
refactor: migrate grouped rollout from customized workflows to inference engines #789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @garrett4wade, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors how grouped rollouts are managed within the AReaL framework. The core change involves centralizing the grouped sampling mechanism into the Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring that centralizes the grouped rollout logic into the WorkflowExecutor. This is a great architectural improvement as it simplifies individual workflows, which now only need to be concerned with generating a single trajectory. The introduction of GroupedRolloutWorkflow to wrap user workflows is a clean solution.
The configuration is also cleaned up: group_size is removed from PPOActorConfig and a global gconfig.n_samples is used for training, with a new eval_gconfig for evaluation. The dynamic_sampling logic is also refactored into a dynamic_bs flag, which is more explicit.
The changes are extensive but appear to be consistent and well-executed across the codebase, including configuration files, core logic, workflows, examples, and documentation. I've found a couple of minor typos in the documentation that I've commented on.
1561319 to
02285d4
Compare
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant refactoring that centralizes the logic for grouped rollouts. The responsibility for handling multiple samples per prompt (group_size) is moved from individual workflows into a new GroupedRolloutWorkflow wrapper. This simplifies custom workflow implementations, as they now only need to handle a single trajectory. The configuration is also cleaned up by removing actor.group_size and using the global gconfig.n_samples. Additionally, a new eval_gconfig is introduced to allow separate generation settings for evaluation, which is a great improvement for flexibility. The changes are extensive and consistently applied across the codebase, including configurations, APIs, controllers, engines, and documentation. I've identified a couple of issues in the refactored workflows that could lead to incorrect behavior and should be addressed.
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request is a significant and well-executed refactoring that moves the responsibility of grouped rollouts from individual workflows into the core WorkflowExecutor. This simplifies the implementation of custom agentic workflows, as they now only need to handle a single trajectory. The introduction of GroupedRolloutWorkflow as a wrapper is a clean solution. The configuration changes, such as removing actor.group_size and adding eval_gconfig, are consistent and improve clarity. The updates across the documentation and examples are thorough and align with the new design. I've found one critical issue and a suggestion for improving documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors grouped rollout handling by migrating it from user-defined workflows to the inference engine layer, introducing a new GroupedRolloutWorkflow wrapper that internally manages trajectory grouping.
Key Changes:
- Introduced
GroupedRolloutWorkflowwrapper class that internally handlesn_samplesgrouping viaasyncio.gather - Removed
actor.group_sizeconfiguration field and replaced it withgconfig.n_samplesfor training and neweval_gconfig.n_samplesfor evaluation - Updated all workflow implementations to return single trajectories instead of grouped batches
- Renamed
granularityparameter togroup_sizeacross all engine API methods for clarity
Reviewed changes
Copilot reviewed 74 out of 74 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| areal/core/workflow_executor.py | Added GroupedRolloutWorkflow wrapper class and modified _resolve_workflow to wrap workflows when group_size > 1 |
| areal/api/cli_args.py | Removed group_size from PPOActorConfig, added eval_gconfig field to PPOConfig |
| areal/api/engine_api.py | Renamed granularity parameter to group_size in rollout_batch and prepare_batch methods |
| areal/workflow/*.py | Updated RLVR, multi-turn, and vision workflows to return single trajectories instead of grouped results |
| examples/*/train.py | Updated agent workflows to use single client instead of multiple clients with asyncio.gather |
| examples/*/config.yaml | Removed actor.group_size configuration, updated adv_norm config in search_agent example |
| docs/*.md | Updated documentation and tutorials to reflect single trajectory workflow pattern |
| areal/tests/*.py | Added group_size parameter to test calls |
| areal/experimental/workflow/multi_turn_v2.py | Refactored to return single trajectory with explicit n_samples=1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rchardx
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
The rebase only involves documentation change. @rchardx please review again |
Description
Motivation: With the proxy server, AReaL allows users to implement customized agents with pure OpenAI SDK. However, users' code usually doesn't and should not support grouped generation, which is usually required for RL training. We want the user to just write an agent that produces a single trajectory.
WorkflowExecutorand AReaL will handle grouped sampling internally with a workflow that returns a single trajectory.gconfig.n_samplesis set to 1 in the trainer.group_size. The globalgconfig.n_samplesfield specifies thegroup_sizefor training andactor.group_sizeis removed. We also add aneval_gconfigfield that specifies thegroup_sizefor evaluation, e.g., when we require repeated sampling and then taking the average.Type of Change
work as expected)
Checklist
jb build docs/gemini review)Breaking Change Details (if applicable):
The user's workflow should return a single trajectory instead of a group. However, it will not lead to an error. This bug just silently increases the group size.