
TorchRL 0.10.0: async LLM inference

Released by @vmoens on 16 Sep 13:48

TorchRL 0.10.0 Release Notes

What's New in 0.10.0

TorchRL 0.10.0 introduces significant advancements in Large Language Model (LLM) support, new algorithms, enhanced environment integrations, and numerous performance improvements and bug fixes.

Major Features

LLM Support and RLHF

  • vLLM Integration Revamp: Complete overhaul of vLLM support with improved batching and performance (#3158) @vmoens
  • GRPO (Group Relative Policy Optimization): New algorithm implementation with both sync and async variants (#2970, #2997, #3006) @vmoens
  • Expert Iteration and SFT: Implementation of expert iteration algorithms and supervised fine-tuning (#3017) @vmoens
  • PPOTrainer: New high-level trainer class for PPO training (#3117) @vmoens
  • LLM Tooling: Comprehensive tooling support for LLM environments and transformations (#2966) @vmoens
  • Remote LLM Wrappers: Support for remote LLM inference with improved batching (#3116) @vmoens
  • Common LLM Generation Interface: Unified kwargs for generation across vLLM and Transformers (#3107) @vmoens
  • LLM Transforms:
    • AddThinkingPrompt transform for reasoning prompts (#3027) @vmoens
    • MCPToolTransform for tool integration (#2993) @vmoens
    • PythonInterpreter transform for code execution (#2988) @vmoens
    • LLMMaskedCategorical for masked categorical distributions (#3041) @vmoens
  • Content Management: ContentBase system for structured content handling (#2985) @vmoens
  • History Tracking: New history system for conversation management (#2965) @vmoens
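The core idea behind GRPO's advantage estimate can be sketched in a few lines of plain Python. This is an illustration of the technique only, not TorchRL's implementation: each completion sampled for a prompt is scored relative to the mean and standard deviation of its own group of completions.

```python
# Illustrative sketch of the group-relative advantage used by GRPO
# (Group Relative Policy Optimization). Not TorchRL's implementation:
# it only shows the core idea that each completion's reward is
# normalized against the other completions sampled for the same prompt.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt:
advs = group_relative_advantages([1.0, 0.0, 2.0, 1.0])
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which is what makes the approach attractive for LLM fine-tuning.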

New Algorithms and Training

  • Async SAC: Asynchronous implementation of Soft Actor-Critic (#2946) @vmoens
  • Discrete Offline CQL: SOTA implementation for discrete action spaces (#3098) @Ibinarriaga
  • Multi-node Ray Support: Enhanced distributed training for GRPO (#3040) @albertbou92

Environment Support

  • NPU Support: Added NPU device support for SyncDataCollector (#3155) @lowdy1
  • IsaacLab Wrapper: Integration with IsaacLab simulation framework (#2937) @vmoens
  • Complete PettingZoo State Support: Enhanced multi-agent environment support (#2953) @JGuzzi
  • Minari Integration: Support for loading datasets from local Minari cache (#3068) @Ibinarriaga

Storage and Replay Buffers

  • Compressed Storage GPU: GPU acceleration for compressed replay buffers (#3062) @aorenstein68
  • Packing: New data packing functionality for efficient storage (#3060) @vmoens
  • Ray Replay Buffer: Enhanced distributed replay buffer support (#2949) @vmoens
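To make the compressed-storage idea concrete, here is a minimal stdlib-only sketch (not TorchRL's API, which lives in `torchrl.data`): transitions are compressed on write and decompressed on sample, trading compute for a smaller memory footprint.

```python
# Conceptual sketch of a compressed replay storage: transitions are
# zlib-compressed before being stored and decompressed on sampling.
# This mirrors the idea behind compressed replay buffers; the release
# adds GPU acceleration for TorchRL's actual implementation.
import pickle
import random
import zlib

class CompressedStore:
    def __init__(self):
        self._buf = []  # list of compressed byte blobs

    def add(self, transition):
        self._buf.append(zlib.compress(pickle.dumps(transition)))

    def sample(self):
        blob = random.choice(self._buf)
        return pickle.loads(zlib.decompress(blob))

store = CompressedStore()
store.add({"obs": [0.0] * 4, "action": 1, "reward": 0.5})
item = store.sample()
```

Compression pays off most for large, redundant observations (e.g. image stacks), where the decompression cost is small relative to the memory saved.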

🔧 Improvements and Enhancements

Performance Optimizations

  • Bounded Specs Memory: Single copy optimization for bounded specifications (#2977) @vmoens
  • Log-prob Computation: Avoid unnecessary log-prob calculations when retrieving distributions (#3081) @vmoens
  • LLM Wrapper Queuing: Performance fixes in LLM wrapper queuing (#3125) @vmoens
  • vmap Deactivation: Selective vmap deactivation in objectives for better performance (#2957) @vmoens

API Improvements

  • Public SAC Methods: Exposed public methods for SAC algorithm (#3085) @vmoens
  • Composite Entropy: Fixed entropy computation for nested keys (#3101) @juandelos
  • Multi-head Entropy: Per-head entropy coefficients for PPO (#2972) @Felixs
  • ClippedPPOLoss: Support for composite value networks (#3031) @louisfaury
  • LineariseRewards: Support for negative weights (#3064) @YoannPoupart
  • GAE Typing: Improved typing with optional value networks (#3029) @louisfaury
  • Explained Variance: Optional explained variance logging (#3010) @OswaldZink
  • Frame Control: Worker-level control over frames_per_batch (#3020) @alexghh
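For reference, the explained-variance statistic mentioned above can be computed as follows. This is a generic sketch of the diagnostic, not TorchRL's logging code: it measures how much of the variance in the empirical returns the value function accounts for (1.0 is a perfect fit; values near or below 0 mean the critic does no better than predicting the mean return).

```python
# Minimal sketch of the explained-variance diagnostic that PPO-style
# losses can optionally log to monitor value-function quality.
from statistics import pvariance

def explained_variance(returns, values):
    residuals = [r - v for r, v in zip(returns, values)]
    var_returns = pvariance(returns)
    if var_returns == 0:
        return float("nan")  # undefined when returns are constant
    return 1.0 - pvariance(residuals) / var_returns

ev = explained_variance([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```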

Developer Experience

  • Colored Logger: Enhanced logging with colored output (#2967) @vmoens
  • Better Error Handling: Improved error catching in env.rollout and rb.add (#3102) @vmoens
  • Warning Management: Better warning control for various components (#3099, #3115) @vmoens
  • Faster Tests: Optimized test suite performance (#3162) @vmoens

Bug Fixes

Core Functionality

Environment and Wrapper Fixes

  • TransformedEnv: Fixed in-place modification of specs (#3076) @vmoens
  • Parallel Environments: Fixed partial and nested done states (#2959) @vmoens
  • Gym Actions: Fixed single action passing when action key is not "action" (#2942) @vmoens
  • Brax Memory: Fixed memory leak in Brax environments (#3052) @vmoens
  • Atari Patching: Fixed patching for NonTensorData observations (#3091) @marcosGR

Collector and Replay Buffer Fixes

  • LLMCollector: Fixed trajectory collection when multiple trajectories complete (#3018) @albertbou92
  • Postprocessing: Consistent postprocessing when using replay buffers in collectors (#3144) @vmoens
  • Weight Updates: Fixed original weights retrieval in collectors (#2951) @vmoens
  • Transform Handling: Fixed transform application and metadata preservation (#3047, #3050) @vmoens

Compatibility and Infrastructure

  • PyTorch 2.1.1: Fixed compatibility issues (#3157) @vmoens
  • NPU Attribute: Fixed missing NPU attribute (#3159) @vmoens
  • CUDA Graph: Fixed update_policy_weights_ with CUDA graphs (#3003) @vmoens
  • Stream Capturing: Robust CUDA stream capturing calls (#2950) @vmoens

Documentation and Tutorials

  • DQN with RNN Tutorial: Upgraded tutorial with latest best practices (#3152) @vmoens
  • LLM API Documentation: Comprehensive documentation for LLM environments and transforms (#2991) @vmoens
  • Multi-head Entropy: Better documentation for multi-head entropy usage (#3109) @vmoens
  • LSTM Module: Fixed import examples in documentation (#3138) @arvindcr4
  • A2C Documentation: Updated AcceptedKeys documentation (#2987) @simeet-n
  • History API: Added missing docstrings for History functionality (#3083) @vmoens
  • Multi-agent PPO: Fixed tutorial issues (#2940) @matteobettini
  • WeightUpdater: Updated documentation after renaming (#3007) @albertbou92

Infrastructure and CI

  • Pre-commit Updates: Updated formatting and linting tools (#3108) @vmoens
  • Benchmark CI: Fixed benchmark runs and added missing dependencies (#3092, #3163) @vmoens
  • Windows CI: Fixed Windows continuous integration (#3028) @vmoens
  • Old Dependencies: Fixed CI for older dependency versions (#3165) @vmoens
  • C++ Linting: Fixed C++ code linting issues (#3129) @vmoens
  • Build System: Improved pyproject.toml usage and versioning (#3089, #3166) @vmoens

🏆 Contributors

Special thanks to all contributors who made this release possible:

  • @albertbou92 (Albert Bou) - GRPO multi-node support and LLM improvements
  • @Ibinarriaga - CQL offline algorithm and Minari integration
  • @aorenstein68 (Adrian Orenstein) - Compressed storage GPU support
  • @louisfaury (Louis Faury) - Categorical spec and PPO improvements
  • @LucaCarminati (Luca Carminati) - Binary tensor fixes
  • @JGuzzi (Jérôme Guzzi) - PettingZoo state support
  • @lowdy1 - NPU device support
  • @Felixs (Felix Sittenauer) - Multi-head entropy coefficients
  • @YoannPoupart (Yoann Poupart) - LineariseRewards improvements
  • @OswaldZink (Oswald Zink) - Explained variance logging
  • @alexghh (Alexandre Ghelfi) - Frame control improvements
  • @marcosGR (Marcos Galletero Romero) - Atari patching fixes
  • @matteobettini (Matteo Bettini) - Tutorial fixes
  • @simeet-n (Simeet Nayan) - Documentation improvements
  • @arvindcr4 - Documentation fixes
  • @felixy12 (Felix Yu) - State dict reference fixes
  • @SendhilPanchadsaram (Sendhil Panchadsaram) - Documentation typo fixes
  • @abhishekunique (Abhishek) - WandB logger and value estimation improvements
  • @骑马小猫 - DQN module typo fix
  • @ZainRizvi (Zain Rizvi) - CI improvements and meta-pytorch migration
  • @mikayla-gawarecki (Mikayla Gawarecki) - Usage tracking and ConditionalPolicySwitch

🔗 Compatibility

  • PyTorch: Compatible with PyTorch 2.1.1+; >=2.8.0,<2.9.0 is recommended for full compatibility
  • TensorDict: Updated to work with TensorDict 0.10+
  • Python: Supports Python 3.9+

📦 Installation

pip install torchrl==0.10.0

For the latest features:

pip install git+https://github.com/pytorch/rl.git@release/0.10.0