
TorchRL 0.10.0: async LLM inference

Released by @vmoens on 16 Sep 13:48

TorchRL 0.10.0 Release Notes

What's New in 0.10.0

TorchRL 0.10.0 introduces significant advancements in Large Language Model (LLM) support, new algorithms, enhanced environment integrations, and numerous performance improvements and bug fixes.

Major Features

LLM Support and RLHF

  • vLLM Integration Revamp: Complete overhaul of vLLM support with improved batching and performance (#3158) @vmoens
  • GRPO (Group Relative Policy Optimization): New algorithm implementation with both sync and async variants (#2970, #2997, #3006) @vmoens
  • Expert Iteration and SFT: Implementation of expert iteration algorithms and supervised fine-tuning (#3017) @vmoens
  • PPOTrainer: New high-level trainer class for PPO training (#3117) @vmoens
  • LLM Tooling: Comprehensive tooling support for LLM environments and transformations (#2966) @vmoens
  • Remote LLM Wrappers: Support for remote LLM inference with improved batching (#3116) @vmoens
  • Common LLM Generation Interface: Unified kwargs for generation across vLLM and Transformers (#3107) @vmoens
  • LLM Transforms:
    • AddThinkingPrompt transform for reasoning prompts (#3027) @vmoens
    • MCPToolTransform for tool integration (#2993) @vmoens
    • PythonInterpreter transform for code execution (#2988) @vmoens
    • LLMMaskedCategorical for masked categorical distributions (#3041) @vmoens
  • Content Management: ContentBase system for structured content handling (#2985) @vmoens
  • History Tracking: New history system for conversation management (#2965) @vmoens
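The core idea behind GRPO's advantage estimate can be sketched in a few lines of plain Python. This is an illustration of the technique only, not TorchRL's implementation: each completion sampled for a prompt is scored relative to the mean and standard deviation of its own group of completions.

```python
# Illustrative sketch of the group-relative advantage used by GRPO
# (Group Relative Policy Optimization). Not TorchRL's implementation:
# it only shows the core idea that each completion's reward is
# normalized against the other completions sampled for the same prompt.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions sampled for one prompt:
advs = group_relative_advantages([1.0, 0.0, 2.0, 1.0])
```

Because the baseline is the group mean rather than a learned value function, no critic network is needed, which is what makes the approach attractive for LLM fine-tuning.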

New Algorithms and Training

  • Async SAC: Asynchronous implementation of Soft Actor-Critic (#2946) @vmoens
  • Discrete Offline CQL: SOTA implementation for discrete action spaces (#3098) @Ibinarriaga
  • Multi-node Ray Support: Enhanced distributed training for GRPO (#3040) @albertbou92

Environment Support

  • NPU Support: Added NPU device support for SyncDataCollector (#3155) @lowdy1
  • IsaacLab Wrapper: Integration with IsaacLab simulation framework (#2937) @vmoens
  • Complete PettingZoo State Support: Enhanced multi-agent environment support (#2953) @JGuzzi
  • Minari Integration: Support for loading datasets from local Minari cache (#3068) @Ibinarriaga

Storage and Replay Buffers

  • Compressed Storage GPU: GPU acceleration for compressed replay buffers (#3062) @aorenstein68
  • Packing: New data packing functionality for efficient storage (#3060) @vmoens
  • Ray Replay Buffer: Enhanced distributed replay buffer support (#2949) @vmoens
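To make the compressed-storage idea concrete, here is a minimal stdlib-only sketch (not TorchRL's API, which lives in `torchrl.data`): transitions are compressed on write and decompressed on sample, trading compute for a smaller memory footprint.

```python
# Conceptual sketch of a compressed replay storage: transitions are
# zlib-compressed before being stored and decompressed on sampling.
# This mirrors the idea behind compressed replay buffers; the release
# adds GPU acceleration for TorchRL's actual implementation.
import pickle
import random
import zlib

class CompressedStore:
    def __init__(self):
        self._buf = []  # list of compressed byte blobs

    def add(self, transition):
        self._buf.append(zlib.compress(pickle.dumps(transition)))

    def sample(self):
        blob = random.choice(self._buf)
        return pickle.loads(zlib.decompress(blob))

store = CompressedStore()
store.add({"obs": [0.0] * 4, "action": 1, "reward": 0.5})
item = store.sample()
```

Compression pays off most for large, redundant observations (e.g. image stacks), where the decompression cost is small relative to the memory saved.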

🔧 Improvements and Enhancements

Performance Optimizations

  • Bounded Specs Memory: Single copy optimization for bounded specifications (#2977) @vmoens
  • Log-prob Computation: Avoid unnecessary log-prob calculations when retrieving distributions (#3081) @vmoens
  • LLM Wrapper Queuing: Performance fixes in LLM wrapper queuing (#3125) @vmoens
  • vmap Deactivation: Selective vmap deactivation in objectives for better performance (#2957) @vmoens

API Improvements

  • Public SAC Methods: Exposed public methods for SAC algorithm (#3085) @vmoens
  • Composite Entropy: Fixed entropy computation for nested keys (#3101) @juandelos
  • Multi-head Entropy: Per-head entropy coefficients for PPO (#2972) @Felixs
  • ClippedPPOLoss: Support for composite value networks (#3031) @louisfaury
  • LineariseRewards: Support for negative weights (#3064) @YoannPoupart
  • GAE Typing: Improved typing with optional value networks (#3029) @louisfaury
  • Explained Variance: Optional explained variance logging (#3010) @OswaldZink
  • Frame Control: Worker-level control over frames_per_batch (#3020) @alexghh
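For reference, the explained-variance statistic mentioned above can be computed as follows. This is a generic sketch of the diagnostic, not TorchRL's logging code: it measures how much of the variance in the empirical returns the value function accounts for (1.0 is a perfect fit; values near or below 0 mean the critic does no better than predicting the mean return).

```python
# Minimal sketch of the explained-variance diagnostic that PPO-style
# losses can optionally log to monitor value-function quality.
from statistics import pvariance

def explained_variance(returns, values):
    residuals = [r - v for r, v in zip(returns, values)]
    var_returns = pvariance(returns)
    if var_returns == 0:
        return float("nan")  # undefined when returns are constant
    return 1.0 - pvariance(residuals) / var_returns

ev = explained_variance([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
```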

Developer Experience

  • Colored Logger: Enhanced logging with colored output (#2967) @vmoens
  • Better Error Handling: Improved error catching in env.rollout and rb.add (#3102) @vmoens
  • Warning Management: Better warning control for various components (#3099, #3115) @vmoens
  • Faster Tests: Optimized test suite performance (#3162) @vmoens

Bug Fixes

Core Functionality

Environment and Wrapper Fixes

  • TransformedEnv: Fixed in-place modification of specs (#3076) @vmoens
  • Parallel Environments: Fixed partial and nested done states (#2959) @vmoens
  • Gym Actions: Fixed single action passing when action key is not "action" (#2942) @vmoens
  • Brax Memory: Fixed memory leak in Brax environments (#3052) @vmoens
  • Atari Patching: Fixed patching for NonTensorData observations (#3091) @marcosGR

Collector and Replay Buffer Fixes

  • LLMCollector: Fixed trajectory collection when multiple trajectories complete (#3018) @albertbou92
  • Postprocessing: Consistent postprocessing when using replay buffers in collectors (#3144) @vmoens
  • Weight Updates: Fixed original weights retrieval in collectors (#2951) @vmoens
  • Transform Handling: Fixed transform application and metadata preservation (#3047, #3050) @vmoens

Compatibility and Infrastructure

  • PyTorch 2.1.1: Fixed compatibility issues (#3157) @vmoens
  • NPU Attribute: Fixed missing NPU attribute (#3159) @vmoens
  • CUDA Graph: Fixed update_policy_weights_ with CUDA graphs (#3003) @vmoens
  • Stream Capturing: Robust CUDA stream capturing calls (#2950) @vmoens

Documentation and Tutorials

  • DQN with RNN Tutorial: Upgraded tutorial with latest best practices (#3152) @vmoens
  • LLM API Documentation: Comprehensive documentation for LLM environments and transforms (#2991) @vmoens
  • Multi-head Entropy: Better documentation for multi-head entropy usage (#3109) @vmoens
  • LSTM Module: Fixed import examples in documentation (#3138) @arvindcr4
  • A2C Documentation: Updated AcceptedKeys documentation (#2987) @simeet-n
  • History API: Added missing docstrings for History functionality (#3083) @vmoens
  • Multi-agent PPO: Fixed tutorial issues (#2940) @matteobettini
  • WeightUpdater: Updated documentation after renaming (#3007) @albertbou92

Infrastructure and CI

  • Pre-commit Updates: Updated formatting and linting tools (#3108) @vmoens
  • Benchmark CI: Fixed benchmark runs and added missing dependencies (#3092, #3163) @vmoens
  • Windows CI: Fixed Windows continuous integration (#3028) @vmoens
  • Old Dependencies: Fixed CI for older dependency versions (#3165) @vmoens
  • C++ Linting: Fixed C++ code linting issues (#3129) @vmoens
  • Build System: Improved pyproject.toml usage and versioning (#3089, #3166) @vmoens

🏆 Contributors

Special thanks to all contributors who made this release possible:

  • @albertbou92 (Albert Bou) - GRPO multi-node support and LLM improvements
  • @Ibinarriaga - CQL offline algorithm and Minari integration
  • @aorenstein68 (Adrian Orenstein) - Compressed storage GPU support
  • @louisfaury (Louis Faury) - Categorical spec and PPO improvements
  • @LucaCarminati (Luca Carminati) - Binary tensor fixes
  • @JGuzzi (Jérôme Guzzi) - PettingZoo state support
  • @lowdy1 - NPU device support
  • @Felixs (Felix Sittenauer) - Multi-head entropy coefficients
  • @YoannPoupart (Yoann Poupart) - LineariseRewards improvements
  • @OswaldZink (Oswald Zink) - Explained variance logging
  • @alexghh (Alexandre Ghelfi) - Frame control improvements
  • @marcosGR (Marcos Galletero Romero) - Atari patching fixes
  • @matteobettini (Matteo Bettini) - Tutorial fixes
  • @simeet-n (Simeet Nayan) - Documentation improvements
  • @arvindcr4 - Documentation fixes
  • @felixy12 (Felix Yu) - State dict reference fixes
  • @SendhilPanchadsaram (Sendhil Panchadsaram) - Documentation typo fixes
  • @abhishekunique (Abhishek) - WandB logger and value estimation improvements
  • @骑马小猫 - DQN module typo fix
  • @ZainRizvi (Zain Rizvi) - CI improvements and meta-pytorch migration
  • @mikayla-gawarecki (Mikayla Gawarecki) - Usage tracking and ConditionalPolicySwitch

🔗 Compatibility

  • PyTorch: Compatible with PyTorch 2.1.1+; >=2.8.0,<2.9.0 is recommended for full compatibility
  • TensorDict: Updated to work with TensorDict 0.10+
  • Python: Supports Python 3.9+

📦 Installation

pip install torchrl==0.10.0

For the latest features:

pip install git+https://github.com/pytorch/rl.git@release/0.10.0