20 changes: 12 additions & 8 deletions README.md
@@ -21,13 +21,12 @@

</div>

rLLM is an open-source framework for post-training language agents via reinforcement learning. With rLLM, you can easily build your custom agents and environments, train them with reinforcement learning, and deploy them for real-world workloads.

## Releases 📰

<strong>[2025/07/01]</strong> We release [`DeepSWE-Preview`](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73), a 32B software engineering (SWE) agent trained purely with RL that achieves 59% on SWE-Bench Verified with test-time scaling (42.2% Pass@1), topping the SWE-Bench leaderboard for open-weight models.
- 🍽️ An In-Depth Blog Post on our [SWE Agents and RL Training Recipes](https://pretty-radio-b75.notion.site/DeepSWE-Training-a-Fully-Open-sourced-State-of-the-Art[…]-by-Scaling-RL-22281902c1468193aabbe9a8c59bbe33?pvs=73)
- 🤗 HF Model [`DeepSWE-Preview`](https://huggingface.co/agentica-org/DeepSWE-Preview)
- 🤗 HF Dataset [`R2E-Gym-Subset`](https://huggingface.co/datasets/R2E-Gym/R2E-Gym-Subset)
@@ -36,10 +35,10 @@ rLLM is an open-source framework for post-training language agents via reinforce
- 🔎 [Evaluation Logs](https://drive.google.com/file/d/10LIwpJeaFuiX6Y-qEG2a4a335PEuQJeS/view?usp=sharing): 16 passes over SWE-Bench Verified.

<strong>[2025/04/08]</strong> We release [`DeepCoder-14B-Preview`](https://pretty-radio-b75.notion.site/DeepCoder-A-Fully-Open-Source-14B-Coder-at-O3-mini-Level-1cf81902c14680b3bee5eb349a512a51), a 14B coding model that achieves an impressive **60.6%** Pass@1 accuracy on LiveCodeBench (+8% improvement), matching the performance of `o3-mini-2025-01-31 (Low)` and `o1-2024-12-17`.

<strong>[2025/02/10]</strong> We release [`DeepScaleR-1.5B-Preview`](https://pretty-radio-b75.notion.site/DeepScaleR-Surpassing-O1-Preview-with-a-1-5B-Model-by-Scaling-RL-19681902c1468005bed8ca303013a4e2), a 1.5B model that surpasses O1-Preview and achieves <strong>43.1% Pass@1</strong> on AIME. We achieve this by iteratively scaling DeepSeek's GRPO algorithm from 8K→16K→24K context length for thinking.

## Getting Started 🎯

### Installation

@@ -54,6 +53,8 @@
```bash
conda activate rllm
# Install all dependencies
pip install -e ./verl
pip install -e .
```

**Note:** On macOS, GPU features (flash-attn, deepspeed, vllm) are automatically excluded for compatibility. For GPU support on macOS, install with `pip install -e .[gpu]`.
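
As a quick sanity check (a sketch, not part of the official setup), you can confirm which GPU-only packages were resolved on your platform:

```bash
# Illustrative check: report which optional GPU packages importlib can find
python -c "import importlib.util, sys; [print(p, 'installed' if importlib.util.find_spec(p) else 'skipped', f'(platform: {sys.platform})') for p in ('flash_attn', 'deepspeed', 'vllm')]"
```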

### Installation with Docker 🐳
@@ -73,16 +74,16 @@
```bash
docker start rllm-container
docker exec -it rllm-container bash
```


## Acknowledgements

- Our training experiments are powered by our heavily modified fork of [verl](https://github.com/volcengine/verl), an open-source RLHF library.
- Our models are trained on top of [`DeepSeek-R1-Distill-Qwen-1.5B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), [`DeepSeek-R1-Distill-Qwen-14B`](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B), and [`Qwen3-32B`](https://huggingface.co/Qwen/Qwen3-32b).
- Our work is done as part of [Berkeley Sky Computing Lab](https://skycomputing.berkeley.edu/), [Berkeley AI Research](https://bair.berkeley.edu/), and a successful collaboration with Together AI.

## Citation

Citing rLLM:

```bibtex
@misc{rllm2025,
title={rLLM: A Framework for Post-Training Language Agents},
@@ -95,6 +96,7 @@
```

Citing DeepSWE:

```bibtex
@misc{deepswe2025,
title={DeepSWE: Training a State-of-the-Art Coding Agent from Scratch by Scaling RL},
@@ -106,6 +108,7 @@
```

Citing DeepCoder:

```bibtex
@misc{deepcoder2025,
title={DeepCoder: A Fully Open-Source 14B Coder at O3-mini Level},
@@ -117,6 +120,7 @@
```

Citing DeepScaleR:

```bibtex
@misc{deepscaler2025,
title={DeepScaleR: Surpassing O1-Preview with a 1.5B Model by Scaling RL},
}
```
15 changes: 11 additions & 4 deletions pyproject.toml
@@ -22,14 +22,14 @@ dependencies = [
"torch>=2.7",
"transformers",
"accelerate",
"flash-attn>=2.8.0.post2",
"flash-attn>=2.8.0.post2; sys_platform != 'darwin'", # Skip on macOS
"sentence-transformers",
"torchmetrics",

# Training and inference
"deepspeed",
"vllm>=0.8.3",
"sgl-kernel>=0.2.0",
"deepspeed; sys_platform != 'darwin'", # Skip on macOS
"vllm>=0.8.3; sys_platform != 'darwin'", # Skip on macOS
"sgl-kernel",
"sglang>=0.4.8.post1",
"sglang-router",
"peft",
@@ -88,6 +88,13 @@ dependencies = [
"pymdown-extensions>=10.0.0",
]

[project.optional-dependencies]
gpu = [
# No platform markers here: the `gpu` extra must resolve these even on macOS,
# otherwise the README's `pip install -e .[gpu]` escape hatch would be a no-op.
"flash-attn>=2.8.0.post2",
"deepspeed",
"vllm>=0.8.3",
]
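
# Usage (a sketch, assuming a local checkout):
#   pip install -e .          -> obeys the platform markers above; GPU stack skipped on macOS
#   pip install -e ".[gpu]"   -> force-installs the GPU stack regardless of platform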

[tool.ruff]
line-length = 5000 # TODO: Reduce this to a more reasonable value
