20 changes: 12 additions & 8 deletions README.md
@@ -21,13 +21,12 @@

</div>

rLLM is an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.

## Releases 📰

<strong>[2025/07/01]</strong> We release [`DeepSWE-Preview`](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73), a 32B software engineering (SWE) agent trained purely with RL that achieves 59% on SWE-Bench Verified with test-time scaling (42.2% Pass@1), topping the SWE-Bench leaderboard for open-weight models.
- 🍽️ An In-Depth Blog Post on our [SWE Agents and RL Training Recipes](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73)
- 🤗 HF Model [`DeepSWE-Preview`](https://huggingface.co/agentica-org/DeepSWE-Preview)
- 🤗 HF Dataset [`R2E-Gym-Subset`](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset)
@@ -36,10 +35,10 @@ rLLM is an open-source framework for post-training language agents via reinforce
- 🔎 [Evaluation Logs](https://drive.google.com/file/d/10LIwpJeaFuiX6Y-qEG2a4a335PEuQJeS/view?usp=sharing): 16 passes over SWE-Bench Verified.

<strong>[2025/04/08]</strong> We release [`DeepCoder-14B-Preview`](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), a 14B coding model that achieves an impressive **60.6%** Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of `o3-mini-2025-01-31 (Low)` and `o1-2024-12-17`.

<strong>[2025/02/10]</strong> We release [`DeepScaleR-1.5B-Preview`](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2), a 1.5B model that surpasses O1-Preview and achieves <strong>43.1% Pass@1</strong> on AIME. We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking.

## Getting Started 🎯

### Installation

@@ -54,6 +53,8 @@
```bash
conda activate rllm
# Install all dependencies
pip install -e ./verl
pip install -e .
```

**Note:** On macOS, GPU features (flash-attn, deepspeed, vllm) are automatically excluded for compatibility. For GPU support on macOS, install with `pip install -e .[gpu]`.
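
As a quick sanity check (a sketch, not part of the official setup), you can confirm which GPU-only packages were resolved on your platform:

```bash
# Illustrative check: report which optional GPU packages importlib can find
python -c "import importlib.util, sys; [print(p, 'installed' if importlib.util.find_spec(p) else 'skipped', f'(platform: {sys.platform})') for p in ('flash_attn', 'deepspeed', 'vllm')]"
```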

### Installation with Docker 🐳
@@ -73,16 +74,16 @@
```bash
docker start rllm-container
docker exec -it rllm-container bash
```


## Acknowledgements

- Our training experiments are powered by our heavily modified fork of [verl](https://github.com/volcengine/verl), an open-source RLHF library.
- Our models are trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), [`DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and [`Qwen3-32B`](https://huggingface.co/Qwen/Qwen3-32b).
- Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/), [Berkeley AI Research](https://bair.berkeley.edu/), and a successful collaboration with Together AI.

## Citation

Citing rLLM:

```bibtex
@misc{rllm2025,
title={rLLM: A Framework for Post-Training Language Agents},
@@ -95,6 +96,7 @@
```

Citing DeepSWE:

```bibtex
@misc{deepswe2025,
title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},
@@ -106,6 +108,7 @@
```

Citing DeepCoder:

```bibtex
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
@@ -117,6 +120,7 @@
```

Citing DeepScaleR:

```bibtex
@misc{deepscaler2025,
title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
}
```
15 changes: 11 additions & 4 deletions pyproject.toml
@@ -22,14 +22,14 @@ dependencies = [
"torch>=2.7",
"transformers",
"accelerate",
"flash-attn>=2.8.0.post2",
"flash-attn>=2.8.0.post2; sys_platform != 'darwin'", # Skip on macOS
"sentence-transformers",
"torchmetrics",

# Training and inference
"deepspeed",
"vllm>=0.8.3",
"sgl-kernel>=0.2.0",
"deepspeed; sys_platform != 'darwin'", # Skip on macOS
"vllm>=0.8.3; sys_platform != 'darwin'", # Skip on macOS
"sgl-kernel",
"sglang>=0.4.8.post1",
"sglang-router",
"peft",
@@ -88,6 +88,13 @@ dependencies = [
"pymdown-extensions>=10.0.0",
]

[project.optional-dependencies]
gpu = [
# No platform markers here: the `gpu` extra must resolve these even on macOS,
# otherwise the README's `pip install -e .[gpu]` escape hatch would be a no-op.
"flash-attn>=2.8.0.post2",
"deepspeed",
"vllm>=0.8.3",
]
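
# Usage (a sketch, assuming a local checkout):
#   pip install -e .          -> obeys the platform markers above; GPU stack skipped on macOS
#   pip install -e ".[gpu]"   -> force-installs the GPU stack regardless of platform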

[tool.ruff]
line-length = 5000 # TODO: Reduce this to a more reasonable value
