
Conversation

@garrett4wade (Collaborator) commented Jan 3, 2026

Description

Motivation: With the proxy server, AReaL lets users implement customized agents with the plain OpenAI SDK. However, user code usually does not, and should not have to, implement grouped generation, even though RL training typically requires it. We want users to simply write an agent that produces a single trajectory.

  • This PR implements grouped sampling within WorkflowExecutor, so AReaL handles grouping internally when the workflow returns a single trajectory (see the sketch after this list).
  • Backward compatibility is preserved for workflows that still produce grouped trajectories, as long as gconfig.n_samples is set to 1 in the trainer.
  • This PR refactors the configuration related to group_size. The global gconfig.n_samples field specifies the group size for training, and actor.group_size is removed. We also add an eval_gconfig field that specifies the group size for evaluation, e.g., when repeated sampling followed by averaging is required.
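A minimal sketch of what a user-side agent can look like under this change, assuming a plain OpenAI SDK client pointed at the AReaL proxy server; the environment variable name, reward function, and return schema below are illustrative, not the actual AReaL API.

```python
import os

from openai import AsyncOpenAI


async def run_single_trajectory(prompt: str, reward_fn) -> dict:
    # Assumed: the proxy address is exposed to the agent, e.g. via an env var.
    client = AsyncOpenAI(base_url=os.environ["AREAL_PROXY_BASE_URL"], api_key="EMPTY")
    resp = await client.chat.completions.create(
        model="default",
        messages=[{"role": "user", "content": prompt}],
    )
    completion = resp.choices[0].message.content
    # Exactly one trajectory, with no loop over n_samples here: WorkflowExecutor
    # repeats this coroutine gconfig.n_samples times to form the group.
    return {"prompt": prompt, "completion": completion, "reward": reward_fn(prompt, completion)}
```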

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not
    work as expected)
  • Documentation update
  • Code refactoring (no functional changes)
  • Performance improvement
  • Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • I have run formatting tools (pre-commit or manual)
  • I have run relevant unit tests and they pass
  • I have added tests for new functionality
  • I have updated documentation if needed
  • My branch is up to date with main
  • This PR introduces breaking changes (if yes, fill out details below)
  • If this PR changes documentation, I have built and previewed it locally with
    jb build docs
  • No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

User workflows should now return a single trajectory instead of a group. Returning a group will not raise an error; it silently multiplies the effective group size, as illustrated below.
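A hedged illustration of this failure mode (the variable names are illustrative, not AReaL internals): if a legacy workflow still returns a group while gconfig.n_samples > 1, the internal grouping multiplies the two.

```python
# Illustrative only: how the effective group size inflates when a legacy
# grouped workflow meets the new internal grouping.
legacy_group_size = 4       # trajectories returned by an old-style workflow
gconfig_n_samples = 8       # grouping now applied inside WorkflowExecutor
effective_group_size = legacy_group_size * gconfig_n_samples  # 32, not the intended 8
# Fix: return a single trajectory from the workflow, or keep gconfig.n_samples = 1.
```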

@garrett4wade garrett4wade marked this pull request as draft January 3, 2026 08:38
@gemini-code-assist (Contributor) commented
Summary of Changes

Hello @garrett4wade, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors how grouped rollouts are managed within the AReaL framework. The core change involves centralizing the grouped sampling mechanism into the WorkflowExecutor and introducing a dedicated GroupedRolloutWorkflow. This simplifies the implementation for users, as their custom workflows now only need to produce single trajectories, with the system handling the grouping. The configuration for group sizes has been streamlined, consolidating training-related group size into gconfig.n_samples and adding a new eval_gconfig for evaluation. Furthermore, the PR enhances debugging and logging capabilities by moving trajectory dumping logic to the WorkflowExecutor and providing new configuration options. These changes aim to improve modularity, reduce boilerplate in custom workflows, and offer more flexible control over rollout behavior.

Highlights

  • Centralized Grouped Sampling: Grouped sampling logic has been migrated from customized workflows to the WorkflowExecutor, introducing a new GroupedRolloutWorkflow to handle repeated sampling internally. This simplifies user workflows, allowing them to produce single trajectories while the system manages grouped generation for tasks like RL training.
  • Configuration Streamlining: The configuration for group_size has been refactored. The actor.group_size field has been removed, and gconfig.n_samples now centrally specifies the group_size for training. A new eval_gconfig field has been added to specify the group_size specifically for evaluation scenarios, such as when repeated sampling for averaging is required.
  • Dynamic Batch Sizing and Trajectory Dumping: New dynamic_bs parameters have been introduced to prepare_batch methods, enabling dynamic batch sizing where collection stops when a certain number of accepted/rejected samples is met. Additionally, trajectory dumping functionality has been moved to the WorkflowExecutor, with new fileroot, tokenizer_path, and dump_to_file options added to InferenceEngineConfig for better control over logging and debugging.
  • Workflow Context Management: A new WorkflowContext has been introduced using contextvars to provide execution context (e.g., is_eval, task_id) to workflows. This allows workflows to adapt their behavior based on whether they are running in training or evaluation mode and to correctly scope statistics (a contextvars sketch follows this list).
  • Codebase Cleanup and Renaming: Several deprecated parameters like granularity and dynamic_sampling have been removed from various configurations and methods. Workflow constructors no longer directly handle dump_dir or rollout_stat_scope. Additionally, example directory names like multi-turn-math and search-agent have been standardized to use underscores (e.g., multi_turn_math, search_agent).
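For the WorkflowContext highlight above, here is a minimal contextvars sketch; the field names (is_eval, task_id) follow the summary, but the helper names and surrounding details are assumptions rather than the actual AReaL implementation.

```python
import contextvars
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkflowContext:
    is_eval: bool = False
    task_id: Optional[str] = None


# One ContextVar shared by all workflows; each asyncio task sees its own value.
_workflow_ctx: contextvars.ContextVar[WorkflowContext] = contextvars.ContextVar(
    "workflow_ctx", default=WorkflowContext()
)


def current_context() -> WorkflowContext:
    return _workflow_ctx.get()


async def run_with_context(ctx: WorkflowContext, coro_fn, *args):
    # Set the context for this rollout, run the workflow, then restore it, so
    # concurrent training and evaluation rollouts never see each other's flags.
    token = _workflow_ctx.set(ctx)
    try:
        return await coro_fn(*args)
    finally:
        _workflow_ctx.reset(token)
```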


@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request is a significant refactoring that centralizes the grouped rollout logic into the WorkflowExecutor. This is a great architectural improvement as it simplifies individual workflows, which now only need to be concerned with generating a single trajectory. The introduction of GroupedRolloutWorkflow to wrap user workflows is a clean solution.

The configuration is also cleaned up: group_size is removed from PPOActorConfig and a global gconfig.n_samples is used for training, with a new eval_gconfig for evaluation. The dynamic_sampling logic is also refactored into a dynamic_bs flag, which is more explicit.

The changes are extensive but appear to be consistent and well-executed across the codebase, including configuration files, core logic, workflows, examples, and documentation. I've found a couple of minor typos in the documentation that I've commented on.
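A rough sketch of the configuration shape this review describes, written as dataclasses in the style of areal/api/cli_args.py; the class names and extra fields here are illustrative, and only gconfig.n_samples and eval_gconfig come from the PR itself.

```python
from dataclasses import dataclass, field


@dataclass
class GenerationConfig:  # illustrative name standing in for AReaL's gconfig type
    n_samples: int = 1          # group size for rollouts
    temperature: float = 1.0    # illustrative field
    max_new_tokens: int = 1024  # illustrative field


@dataclass
class PPOConfig:
    # Training-time group size now lives solely in gconfig.n_samples;
    # the old actor.group_size field is gone.
    gconfig: GenerationConfig = field(default_factory=lambda: GenerationConfig(n_samples=8))
    # Separate generation settings for evaluation, e.g. repeated sampling
    # followed by averaging the scores.
    eval_gconfig: GenerationConfig = field(default_factory=lambda: GenerationConfig(n_samples=1))
```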

@garrett4wade garrett4wade force-pushed the fw/group branch 3 times, most recently from 1561319 to 02285d4 on January 4, 2026 12:44
@garrett4wade garrett4wade marked this pull request as ready for review January 4, 2026 13:27
@garrett4wade (Collaborator, Author) commented

/gemini review

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request is a significant refactoring that centralizes the logic for grouped rollouts. The responsibility for handling multiple samples per prompt (group_size) is moved from individual workflows into a new GroupedRolloutWorkflow wrapper. This simplifies custom workflow implementations, as they now only need to handle a single trajectory. The configuration is also cleaned up by removing actor.group_size and using the global gconfig.n_samples. Additionally, a new eval_gconfig is introduced to allow separate generation settings for evaluation, which is a great improvement for flexibility. The changes are extensive and consistently applied across the codebase, including configurations, APIs, controllers, engines, and documentation. I've identified a couple of issues in the refactored workflows that could lead to incorrect behavior and should be addressed.

@garrett4wade (Collaborator, Author) commented

/gemini review

@garrett4wade garrett4wade requested a review from Copilot January 4, 2026 13:31
@garrett4wade garrett4wade changed the title from "[wip] refactor: migrate grouped rollout from customized workflows to inference engines" to "refactor: migrate grouped rollout from customized workflows to inference engines" Jan 4, 2026
@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request is a significant and well-executed refactoring that moves the responsibility of grouped rollouts from individual workflows into the core WorkflowExecutor. This simplifies the implementation of custom agentic workflows, as they now only need to handle a single trajectory. The introduction of GroupedRolloutWorkflow as a wrapper is a clean solution. The configuration changes, such as removing actor.group_size and adding eval_gconfig, are consistent and improve clarity. The updates across the documentation and examples are thorough and align with the new design. I've found one critical issue and a suggestion for improving documentation.

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors grouped rollout handling by migrating it from user-defined workflows to the inference engine layer, introducing a new GroupedRolloutWorkflow wrapper that internally manages trajectory grouping.

Key Changes:

  • Introduced GroupedRolloutWorkflow wrapper class that internally handles n_samples grouping via asyncio.gather (see the sketch after this list)
  • Removed actor.group_size configuration field and replaced it with gconfig.n_samples for training and new eval_gconfig.n_samples for evaluation
  • Updated all workflow implementations to return single trajectories instead of grouped batches
  • Renamed granularity parameter to group_size across all engine API methods for clarity
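The wrapper pattern named in the first bullet, sketched under assumed method names (arun_episode and the engine/data arguments are illustrative; see areal/core/workflow_executor.py for the real implementation):

```python
import asyncio


class GroupedRolloutWorkflow:
    """Fans a single-trajectory workflow out to n_samples concurrent rollouts."""

    def __init__(self, inner_workflow, n_samples: int):
        self.inner = inner_workflow
        self.n_samples = n_samples

    async def arun_episode(self, engine, data):
        # Run the user's single-trajectory workflow n_samples times in parallel
        # and return the grouped result, so user code never deals with group_size.
        trajectories = await asyncio.gather(
            *(self.inner.arun_episode(engine, data) for _ in range(self.n_samples))
        )
        return list(trajectories)
```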

Reviewed changes

Copilot reviewed 74 out of 74 changed files in this pull request and generated 2 comments.

Summary per file

  • areal/core/workflow_executor.py: Added the GroupedRolloutWorkflow wrapper class and modified _resolve_workflow to wrap workflows when group_size > 1
  • areal/api/cli_args.py: Removed group_size from PPOActorConfig, added the eval_gconfig field to PPOConfig
  • areal/api/engine_api.py: Renamed the granularity parameter to group_size in the rollout_batch and prepare_batch methods
  • areal/workflow/*.py: Updated the RLVR, multi-turn, and vision workflows to return single trajectories instead of grouped results
  • examples/*/train.py: Updated agent workflows to use a single client instead of multiple clients with asyncio.gather
  • examples/*/config.yaml: Removed the actor.group_size configuration, updated the adv_norm config in the search_agent example
  • docs/*.md: Updated documentation and tutorials to reflect the single-trajectory workflow pattern
  • areal/tests/*.py: Added the group_size parameter to test calls
  • areal/experimental/workflow/multi_turn_v2.py: Refactored to return a single trajectory with explicit n_samples=1


@garrett4wade garrett4wade added the safe-to-test (Ready to run unit-tests in a PR) label Jan 4, 2026
@garrett4wade garrett4wade added and removed the safe-to-test label Jan 4, 2026
@rchardx rchardx (Collaborator) left a comment

LGTM

@garrett4wade (Collaborator, Author) commented

The rebase only involves documentation changes. @rchardx please review again.

@garrett4wade garrett4wade requested a review from rchardx January 5, 2026 05:45
@rchardx rchardx merged commit 9497437 into main Jan 5, 2026
1 check passed
@rchardx rchardx deleted the fw/group branch January 5, 2026 06:03
@HwVanICI HwVanICI mentioned this pull request Jan 7, 2026