refactor: migrate grouped rollout from customized workflows to inference engines #789

garrett4wade · 2026-01-03T08:38:20Z

Description

Motivation: With the proxy server, AReaL lets users implement custom agents using only the OpenAI SDK. Such user code usually does not, and should not have to, support grouped generation, which RL training typically requires. Users should only need to write an agent that produces a single trajectory.

This PR implements grouped sampling within WorkflowExecutor: given a workflow that returns a single trajectory, AReaL handles the grouping internally.
Backward compatibility is preserved for a workflow that still produces grouped trajectories, provided gconfig.n_samples is set to 1 in the trainer.
This PR also refactors the configuration related to group_size. The global gconfig.n_samples field now specifies the group size for training, and actor.group_size is removed. A new eval_gconfig field specifies the group size for evaluation, e.g., when sampling repeatedly and averaging the results.
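
As a rough illustration of the configuration change described above, the sketch below moves a legacy actor.group_size value into gconfig.n_samples on a plain nested dict. The helper name `migrate_group_size`, the dict layout, and the `lr` and `max_new_tokens` keys are assumptions made for this example; AReaL's real configuration objects may look different.

```python
from copy import deepcopy
from typing import Optional


def migrate_group_size(old_cfg: dict, eval_n_samples: Optional[int] = None) -> dict:
    """Hypothetical helper: move actor.group_size into gconfig.n_samples."""
    cfg = deepcopy(old_cfg)  # never mutate the caller's config
    group_size = cfg.get("actor", {}).pop("group_size", None)
    if group_size is not None:
        # Training group size now lives in the global generation config.
        cfg.setdefault("gconfig", {})["n_samples"] = group_size
    if eval_n_samples is not None:
        # Evaluation gets its own generation settings via eval_gconfig.
        cfg.setdefault("eval_gconfig", {})["n_samples"] = eval_n_samples
    return cfg


# Illustrative legacy config; the non-group keys are placeholders.
legacy = {"actor": {"group_size": 8, "lr": 1e-5}, "gconfig": {"max_new_tokens": 512}}
migrated = migrate_group_size(legacy, eval_n_samples=4)
print(migrated)
```

The copy keeps the legacy dict intact, so the old and new layouts can be compared side by side while migrating.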

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Documentation update
Code refactoring (no functional changes)
Performance improvement
Test coverage improvement

Checklist

I have read the Contributing Guide
I have run formatting tools (pre-commit or manual)
I have run relevant unit tests and they pass
I have added tests for new functionality
I have updated documentation if needed
My branch is up to date with main
This PR introduces breaking changes (if yes, fill out details below)
If this PR changes documentation, I have built and previewed it locally with jb build docs
No critical issues raised by AI reviewers (/gemini review)

Breaking Change Details (if applicable):

The user's workflow should now return a single trajectory instead of a group. A workflow that still returns a group will not raise an error; it silently increases the effective group size.
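
To make the silent increase concrete, here is a toy, synchronous simulation. `run_grouped` is a stand-in written for this example, not the executor's real code, and both workflow functions are hypothetical. It only assumes that the framework calls the workflow n_samples times and collects everything each call returns.

```python
def single_workflow(prompt):
    # New style: one call yields one trajectory.
    return f"{prompt}-traj"


def legacy_workflow(prompt, group_size=4):
    # Old style: one call already yields a whole group.
    return [f"{prompt}-traj{i}" for i in range(group_size)]


def run_grouped(workflow, prompt, n_samples):
    # Stand-in for internal grouping: call the workflow n_samples times
    # and collect every trajectory that comes back.
    trajectories = []
    for _ in range(n_samples):
        result = workflow(prompt)
        trajectories.extend(result if isinstance(result, list) else [result])
    return trajectories


new_style = run_grouped(single_workflow, "q", n_samples=4)
mixed = run_grouped(legacy_workflow, "q", n_samples=4)
compat = run_grouped(legacy_workflow, "q", n_samples=1)
print(len(new_style), len(mixed), len(compat))
```

Under these assumptions a legacy workflow combined with n_samples=4 produces 16 trajectories per prompt without any error, while setting n_samples to 1 restores the original group of 4.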

gemini-code-assist · 2026-01-03T08:41:58Z

Summary of Changes

This pull request significantly refactors how grouped rollouts are managed within the AReaL framework. The core change involves centralizing the grouped sampling mechanism into the WorkflowExecutor and introducing a dedicated GroupedRolloutWorkflow. This simplifies the implementation for users, as their custom workflows now only need to produce single trajectories, with the system handling the grouping. The configuration for group sizes has been streamlined, consolidating training-related group size into gconfig.n_samples and adding a new eval_gconfig for evaluation. Furthermore, the PR enhances debugging and logging capabilities by moving trajectory dumping logic to the WorkflowExecutor and providing new configuration options. These changes aim to improve modularity, reduce boilerplate in custom workflows, and offer more flexible control over rollout behavior.

Highlights

Centralized Grouped Sampling: Grouped sampling logic has been migrated from customized workflows to the WorkflowExecutor, introducing a new GroupedRolloutWorkflow to handle repeated sampling internally. This simplifies user workflows, allowing them to produce single trajectories while the system manages grouped generation for tasks like RL training.
Configuration Streamlining: The configuration for group_size has been refactored. The actor.group_size field has been removed, and gconfig.n_samples now centrally specifies the group_size for training. A new eval_gconfig field has been added to specify the group_size specifically for evaluation scenarios, such as when repeated sampling for averaging is required.
Dynamic Batch Sizing and Trajectory Dumping: New dynamic_bs parameters have been introduced to prepare_batch methods, enabling dynamic batch sizing where collection stops when a certain number of accepted/rejected samples is met. Additionally, trajectory dumping functionality has been moved to the WorkflowExecutor, with new fileroot, tokenizer_path, and dump_to_file options added to InferenceEngineConfig for better control over logging and debugging.
Workflow Context Management: A new WorkflowContext has been introduced using contextvars to provide execution context (e.g., is_eval, task_id) to workflows. This allows workflows to adapt their behavior based on whether they are running in training or evaluation mode and to correctly scope statistics.
Codebase Cleanup and Renaming: Several deprecated parameters like granularity and dynamic_sampling have been removed from various configurations and methods. Workflow constructors no longer directly handle dump_dir or rollout_stat_scope. Additionally, example directory names like multi-turn-math and search-agent have been standardized to use underscores (e.g., multi_turn_math, search_agent).
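
The context mechanism mentioned in the highlights can be sketched with the standard-library contextvars module. The field names is_eval and task_id come from the summary above; the helper names `get_context` and `run_with_context` and the toy workflow are assumptions made for illustration and may not match AReaL's actual API.

```python
import asyncio
import contextvars
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowContext:
    is_eval: bool = False
    task_id: Optional[int] = None


# Module-level context variable with a default "training" context.
_ctx = contextvars.ContextVar("workflow_context", default=WorkflowContext())


def get_context() -> WorkflowContext:
    return _ctx.get()


async def run_with_context(ctx, coro_fn, *args):
    # Set the context for the duration of one workflow run, then restore it.
    token = _ctx.set(ctx)
    try:
        return await coro_fn(*args)
    finally:
        _ctx.reset(token)


async def toy_workflow(prompt):
    # The workflow reads its execution context instead of taking extra arguments.
    ctx = get_context()
    scope = "eval" if ctx.is_eval else "train"
    return f"{scope}:{ctx.task_id}:{prompt}"


async def main():
    # asyncio.gather wraps each coroutine in a task, and every task runs in
    # its own copy of the context, so concurrent runs do not interfere.
    return await asyncio.gather(
        run_with_context(WorkflowContext(is_eval=False, task_id=1), toy_workflow, "a"),
        run_with_context(WorkflowContext(is_eval=True, task_id=2), toy_workflow, "b"),
    )


results = asyncio.run(main())
print(results)
```

Because the context travels with the task rather than through function arguments, a workflow can scope its statistics by train or eval mode without changing its constructor signature.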

gemini-code-assist

Code Review

This pull request is a significant refactoring that centralizes the grouped rollout logic into the WorkflowExecutor. This is a great architectural improvement as it simplifies individual workflows, which now only need to be concerned with generating a single trajectory. The introduction of GroupedRolloutWorkflow to wrap user workflows is a clean solution.

The configuration is also cleaned up: group_size is removed from PPOActorConfig and a global gconfig.n_samples is used for training, with a new eval_gconfig for evaluation. The dynamic_sampling logic is also refactored into a dynamic_bs flag, which is more explicit.

The changes are extensive but appear to be consistent and well-executed across the codebase, including configuration files, core logic, workflows, examples, and documentation. I've found a couple of minor typos in the documentation that I've commented on.

docs/lite/gsm8k_grpo.md

garrett4wade · 2026-01-04T13:27:09Z

/gemini review

gemini-code-assist

Code Review

This pull request is a significant refactoring that centralizes the logic for grouped rollouts. The responsibility for handling multiple samples per prompt (group_size) is moved from individual workflows into a new GroupedRolloutWorkflow wrapper. This simplifies custom workflow implementations, as they now only need to handle a single trajectory. The configuration is also cleaned up by removing actor.group_size and using the global gconfig.n_samples. Additionally, a new eval_gconfig is introduced to allow separate generation settings for evaluation, which is a great improvement for flexibility. The changes are extensive and consistently applied across the codebase, including configurations, APIs, controllers, engines, and documentation. I've identified a couple of issues in the refactored workflows that could lead to incorrect behavior and should be addressed.

areal/experimental/workflow/multi_turn_v2.py

areal/workflow/multi_turn.py

garrett4wade · 2026-01-04T13:31:15Z

/gemini review

gemini-code-assist

Code Review

This pull request is a significant and well-executed refactoring that moves the responsibility of grouped rollouts from individual workflows into the core WorkflowExecutor. This simplifies the implementation of custom agentic workflows, as they now only need to handle a single trajectory. The introduction of GroupedRolloutWorkflow as a wrapper is a clean solution. The configuration changes, such as removing actor.group_size and adding eval_gconfig, are consistent and improve clarity. The updates across the documentation and examples are thorough and align with the new design. I've found one critical issue and a suggestion for improving documentation.

areal/experimental/workflow/multi_turn_v2.py

areal/core/workflow_executor.py

Copilot

Pull request overview

This PR refactors grouped rollout handling by migrating it from user-defined workflows to the inference engine layer, introducing a new GroupedRolloutWorkflow wrapper that internally manages trajectory grouping.

Key Changes:

Introduced GroupedRolloutWorkflow wrapper class that internally handles n_samples grouping via asyncio.gather
Removed actor.group_size configuration field and replaced it with gconfig.n_samples for training and new eval_gconfig.n_samples for evaluation
Updated all workflow implementations to return single trajectories instead of grouped batches
Renamed granularity parameter to group_size across all engine API methods for clarity
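
A minimal sketch of the wrapper idea in the key changes above: run a single-trajectory workflow n times concurrently with asyncio.gather and return the results as one group. The class name `GroupedWrapper`, its `run` method, and the toy workflow are hypothetical; the actual GroupedRolloutWorkflow interface is not shown in this conversation.

```python
import asyncio
import itertools


class GroupedWrapper:
    """Hypothetical wrapper: turn a single-trajectory workflow into a grouped one."""

    def __init__(self, workflow, group_size):
        if group_size < 1:
            raise ValueError("group_size must be at least 1")
        self.workflow = workflow
        self.group_size = group_size

    async def run(self, data):
        # Launch group_size independent episodes for the same input and
        # wait for all of them; gather preserves the submission order.
        episodes = (self.workflow(data) for _ in range(self.group_size))
        return list(await asyncio.gather(*episodes))


_sample_ids = itertools.count()


async def toy_workflow(data):
    # Stand-in for a user agent that produces exactly one trajectory.
    await asyncio.sleep(0)
    return {"prompt": data["prompt"], "sample_id": next(_sample_ids)}


group = asyncio.run(GroupedWrapper(toy_workflow, group_size=4).run({"prompt": "2+2?"}))
print(group)
```

The user-facing workflow stays unaware of grouping; only the wrapper knows the group size, which is the separation of concerns this PR aims for.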

Reviewed changes

Copilot reviewed 74 out of 74 changed files in this pull request and generated 2 comments.


| File | Description |
| --- | --- |
| areal/core/workflow_executor.py | Added `GroupedRolloutWorkflow` wrapper class and modified `_resolve_workflow` to wrap workflows when `group_size > 1` |
| areal/api/cli_args.py | Removed `group_size` from `PPOActorConfig`, added `eval_gconfig` field to `PPOConfig` |
| areal/api/engine_api.py | Renamed `granularity` parameter to `group_size` in `rollout_batch` and `prepare_batch` methods |
| areal/workflow/*.py | Updated RLVR, multi-turn, and vision workflows to return single trajectories instead of grouped results |
| examples/*/train.py | Updated agent workflows to use single client instead of multiple clients with `asyncio.gather` |
| examples/*/config.yaml | Removed `actor.group_size` configuration, updated `adv_norm` config in search_agent example |
| docs/*.md | Updated documentation and tutorials to reflect single trajectory workflow pattern |
| areal/tests/*.py | Added `group_size` parameter to test calls |
| areal/experimental/workflow/multi_turn_v2.py | Refactored to return single trajectory with explicit `n_samples=1` |

areal/experimental/workflow/multi_turn_v2.py

examples/experimental/proxy/train_sep.py

rchardx

LGTM

garrett4wade · 2026-01-05T05:45:31Z

The rebase only involves documentation changes. @rchardx, please review again.

garrett4wade marked this pull request as draft January 3, 2026 08:38

gemini-code-assist bot reviewed Jan 3, 2026

docs/lite/gsm8k_grpo.md (outdated, resolved)

docs/lite/gsm8k_grpo.md (outdated, resolved)

garrett4wade force-pushed the fw/group branch 3 times, most recently from 1561319 to 02285d4 Compare January 4, 2026 12:44

garrett4wade marked this pull request as ready for review January 4, 2026 13:27

gemini-code-assist bot reviewed Jan 4, 2026

areal/experimental/workflow/multi_turn_v2.py (outdated, resolved)

areal/workflow/multi_turn.py (outdated, resolved)

garrett4wade requested a review from Copilot January 4, 2026 13:31

garrett4wade changed the title ~~[wip] refactor: migrate grouped rollout from customized workflows to inference engines~~ refactor: migrate grouped rollout from customized workflows to inference engines Jan 4, 2026

Copilot started reviewing on behalf of garrett4wade January 4, 2026 13:31

gemini-code-assist bot reviewed Jan 4, 2026

areal/experimental/workflow/multi_turn_v2.py (resolved)

areal/core/workflow_executor.py (resolved)

Copilot AI reviewed Jan 4, 2026

areal/experimental/workflow/multi_turn_v2.py (resolved)

examples/experimental/proxy/train_sep.py (resolved)

garrett4wade added the safe-to-test Ready to run unit-tests in a PR. label Jan 4, 2026

garrett4wade force-pushed the fw/group branch from d6242ab to e5d9370 Compare January 4, 2026 13:39

garrett4wade added safe-to-test Ready to run unit-tests in a PR. and removed safe-to-test Ready to run unit-tests in a PR. labels Jan 4, 2026

garrett4wade temporarily deployed to AReaL-unittests January 4, 2026 13:43 — with GitHub Actions Inactive

rchardx approved these changes Jan 5, 2026

refactor: migrate grouped rollout to inference engines

65f79ef

garrett4wade force-pushed the fw/group branch from e5d9370 to 65f79ef Compare January 5, 2026 05:44

garrett4wade requested a review from rchardx January 5, 2026 05:45

rchardx merged commit 9497437 into main Jan 5, 2026
1 check passed

rchardx deleted the fw/group branch January 5, 2026 06:03

HwVanICI mentioned this pull request Jan 7, 2026

Add NPU RLVR example #798

Merged

16 tasks
