AReaL 2025 Enhancement Milestone Tracker
Introduction
This issue tracks the major planned enhancements for AReaL through Oct 31, 2025. Our development roadmap is organized into two main categories to help contributors understand where they can make the most impact.
The Ongoing sections contain features under active development by the core AReaL team. These items are our immediate priorities.
The Planned but not in progress sections list features that we have concrete implementation plans for but currently lack the bandwidth to pursue. We welcome community contributions for these items! If you're interested in any of these planned features, please reach out to discuss implementation details.
Backends
Done
N/A
Ongoing
- Megatron training backend support ([Feature] Support the Megatron training backend #256)
- SGLang large expert parallelism (EP) inference support ([Feature] Support using SGLang inference with multi-node instance and expert parallelism #259)
- End-to-end MoE RL training with large EP inference and Megatron expert parallelism
- Ulysses context parallelism & tensor parallelism for the FSDP backend ([Feature] Support ulysses sequence parallelism and tensor parallelism for FSDP backend #258); see the sketch after this list
- Single-controller mode ([Feature] Add single-controller mode #260)
- Remote vLLM inference engine (feat: support NPU and vLLM #351)
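
To make the Ulysses item above concrete, here is a minimal sketch of the core sequence-parallel all-to-all, written against plain torch.distributed. This is not AReaL's implementation; the function name and tensor layout are illustrative.

```python
import torch
import torch.distributed as dist

def ulysses_seq_to_head_alltoall(x: torch.Tensor, group=None) -> torch.Tensor:
    """Ulysses-style all-to-all before attention.

    Input is sharded along the sequence dimension:  [B, S/P, H, D].
    Output is sharded along the head dimension:     [B, S, H/P, D],
    so each rank runs full-sequence attention on its slice of heads.
    """
    world = dist.get_world_size(group)
    b, s_local, h, d = x.shape
    assert h % world == 0, "num_heads must be divisible by the group size"
    # Send head-group i of the local sequence shard to rank i.
    inputs = [t.contiguous() for t in x.chunk(world, dim=2)]
    outputs = [torch.empty_like(inputs[0]) for _ in range(world)]
    dist.all_to_all(outputs, inputs, group=group)
    # Each received chunk is another rank's sequence shard of our head group;
    # concatenating them in rank order rebuilds the full sequence.
    return torch.cat(outputs, dim=1)
```

After attention, the inverse all-to-all (heads back to sequence) restores the original layout.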
Planned but not in progress
- RL training with SGLang/vLLM pipeline parallelism
- Distributed weight resharder for Megatron training backend
- Multi-LLM training (different agents with different parameters)
- Local SGLang inference engine with inference/training colocation (hybrid engine)
- Detailed profiling of the FSDP and training backend for best performance under different scales
Usability
Done
- OpenAI-compatible client support (feat: support openai-compatible rollout and add an unittest for prepare_mb_list #248)
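
Since the OpenAI-compatible client is done, here is a minimal usage sketch with the official openai Python package; the base URL, API key, and model name below are placeholders, not AReaL defaults.

```python
from openai import OpenAI

# Placeholder endpoint: any OpenAI-compatible server, e.g. one started by AReaL.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="my-model",  # placeholder model name
    messages=[{"role": "user", "content": "Solve: 2 + 2 = ?"}],
    temperature=1.0,
    max_tokens=256,
)
print(resp.choices[0].message.content)
```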
Ongoing
- Support RLOO (RLOO algorithm to issue261 #354)
Planned but not in progress
- Support training GPT-oss and Seed-oss models
- Support running distributed training and debugging in Jupyter notebooks
- Provide benchmarking configuration examples ([Feature] Adding more implementation/configuration examples #261):
  - DAPO (FEAT: Decoupled CLIP ratio (DAPO Trick-I) #285, FEAT: Dynamic_Sampling (DAPO Trick-II) #294, FEAT: Overlong_Reward_Penalty (DAPO Trick-III) #295); see the decoupled-clip sketch after this list
  - Bradley-Terry reward modeling ([Feature] Add support for Reward Model fine-tuning #331)
  - PPO with critic models
  - REINFORCE++
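
As referenced in the DAPO item above, here is a minimal sketch of the decoupled CLIP ratio (Trick-I): the lower and upper clip bounds are set independently, so a larger upper bound leaves low-probability tokens more room to grow. The function name is hypothetical; the default bounds follow the DAPO paper, not necessarily AReaL's configuration.

```python
import torch

def dapo_decoupled_clip_loss(logprobs, old_logprobs, advantages,
                             eps_low=0.2, eps_high=0.28):
    # Token-level importance ratio between the current and behavior policies.
    ratio = torch.exp(logprobs - old_logprobs)
    # Decoupled bounds: clip below at 1 - eps_low, above at 1 + eps_high.
    clipped = torch.clamp(ratio, 1.0 - eps_low, 1.0 + eps_high)
    # Standard PPO pessimistic min, negated for gradient descent.
    return -torch.minimum(ratio * advantages, clipped * advantages).mean()
```

With eps_high == eps_low this reduces to the usual PPO clipped objective; raising eps_high alone is the entire trick.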
Documentation
Done
- OpenAI-compatible client documentation (doc: add for writing workflows with the openai-compatible client #254)
- Out-of-memory (OOM) troubleshooting guide ([Doc] add best practices doc, including debugging and handling OOM #287)
- AReaL debugging best practices ([Doc] add best practices doc, including debugging and handling OOM #287; Document a new example of examining rollout results in both Transformers and the inference engine #361):
  - LLM server-only debugging - How to launch LLM servers independently and debug agent workflows
  - Mock data and `torchrun` debugging - Creating synthetic data and using `torchrun` for algorithm debugging (see the sketch after this list)
  - Training-free evaluation experiments - Running evaluations without training or additional GPUs
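
For the mock-data item above, here is a minimal sketch of what such a debugging script might look like; the file name, vocabulary size, and batch shapes are all placeholders.

```python
# debug_rollout.py -- hypothetical file name; launch with, e.g.:
#   torchrun --nproc_per_node=2 debug_rollout.py
import torch
import torch.distributed as dist

dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo")
rank = dist.get_rank()

# Synthetic "rollout" batch: random token ids and rewards stand in for real
# inference output, so the training path can be exercised without LLM servers.
mock_batch = {
    "input_ids": torch.randint(0, 32000, (4, 128)),
    "rewards": torch.randn(4),
}
print(f"[rank {rank}] mock batch shapes:",
      {k: tuple(v.shape) for k, v in mock_batch.items()})
dist.destroy_process_group()
```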
Planned but not in progress
- AReaL performance tuning guide
- How to split training and inference devices (see the sketch after this list)
- How to set parallelism strategies for training and inference
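
For the device-splitting item above, a generic sketch of one common approach: pin training and inference processes to disjoint GPU sets via CUDA_VISIBLE_DEVICES before launch. The script names and the 4+4 split are placeholders; AReaL's actual launchers may handle this differently.

```python
import os
import subprocess

# Illustrative only: give inference GPUs 0-3 and training GPUs 4-7.
inference_env = {**os.environ, "CUDA_VISIBLE_DEVICES": "0,1,2,3"}
training_env = {**os.environ, "CUDA_VISIBLE_DEVICES": "4,5,6,7"}

subprocess.Popen(["python", "launch_inference_server.py"], env=inference_env)
subprocess.Popen(["python", "launch_training.py"], env=training_env)
```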