This repo contains the official implementation of Constrained Reinforcement Learning with Smoothed Log Barrier Function.

SafeRL-Lib

A comprehensive toolkit for Constrained Reinforcement Learning and Safe Reinforcement Learning research, built on top of Stable-Baselines3 and Safety-Gymnasium.

Features

  • Multiple Safe RL Algorithms: Implementation of state-of-the-art safe RL algorithms including:
    • CSAC-LB (Constrained Soft Actor-Critic with Log Barrier) - TMLR 2025 ⭐
    • SAC-Lag (Soft Actor-Critic with Lagrangian constraints) - RSS 2020
    • CPO (Constrained Policy Optimization) - ICML 2017
    • WCSAC (Worst-Case Soft Actor-Critic) - AAAI 2021
    • APPO (Augmented Proximal Policy Optimization) - AAAI 2023
    • SAC (Soft Actor-Critic) with Reward Shaping - Modified from ICML 2018

Installation

Prerequisites

  • Python 3.10
  • CUDA-compatible GPU (recommended)

Step 1: Clone the Repository

git clone git@github.com:2BH/saferl-lib.git
cd saferl-lib

Step 2: Create Conda Environment (Optional)

conda create -n saferl python=3.10
conda activate saferl

Step 3: Install Safety-Gymnasium and other dependencies

pip install stable_baselines3==2.7.0
pip install sb3_contrib==2.7.0
cd ~/
wget https://github.com/PKU-Alignment/safety-gymnasium/archive/refs/heads/main.zip
unzip main.zip
cd safety-gymnasium-main
pip install -e .
pip install gymnasium==0.29.1
pip install hydra-core==1.3.2
pip install tensorboard==2.20.0

🌟 Featured Algorithm: CSAC-LB

Constrained Soft Actor-Critic with Log Barrier - TMLR 2025

CSAC-LB is our algorithm that addresses key challenges in constrained reinforcement learning:

  • Numerical Stability: Uses a linear smoothed log barrier function that provides non-vanishing gradients
  • Quick Recovery: Enables agents to quickly recover from unsafe states during training
  • Enhanced Safety: Employs a pessimistic double-critic architecture to mitigate constraint violation underestimation

Key Innovation: The integration of a smoothed log barrier function into the actor's objective provides a numerically stable alternative to traditional interior-point methods, making it particularly suitable for safety-critical applications.
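To make the barrier idea concrete, below is a minimal, illustrative sketch of a linearly extended log barrier for a scalar constraint z <= 0. The function name, the mu parameterization, and the exact extension point are our own illustration and may differ from the precise form used in CSAC-LB (see the paper); the last helper only illustrates the pessimistic double-critic idea of keeping the larger of two cost estimates.

import numpy as np

def smoothed_log_barrier(z, mu=1.0):
    """Illustrative linearly extended log barrier for the constraint z <= 0.

    For z <= -mu**2 this is the classical barrier -mu * log(-z); past that
    point it is continued by a straight line with matching value and slope,
    so the penalty stays finite and its gradient does not vanish even when
    the constraint is violated.
    """
    z = np.asarray(z, dtype=float)
    log_branch = -mu * np.log(np.clip(-z, 1e-12, None))   # interior of the feasible set
    lin_branch = z / mu - mu * np.log(mu**2) + mu          # linear extension beyond z = -mu**2
    return np.where(z <= -mu**2, log_branch, lin_branch)

def pessimistic_cost(qc1, qc2):
    # Pessimistic double-critic idea: take the larger of two cost-critic
    # predictions to avoid underestimating constraint violations.
    return np.maximum(qc1, qc2)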


# Quick CSAC-LB example
from saferl import CSAC_LB, create_env

# env_cfg: an environment configuration (see saferl/examples/configs/env/)
env = create_env(env_cfg, seed=42)
model = CSAC_LB("MlpPolicy", env, cost_constraint=[5.0], lower_bound=0.1)
model.learn(total_timesteps=100000)

Quick Start

Basic Usage

# Activate environment
conda activate saferl

# Run CSAC-LB experiment (recommended)
python -m saferl.examples.main algorithm=csac_lb

Reproducing the Results

conda activate saferl
# For Crabs environments, env=CrabsMove/CrabsSwing/CrabsTilt/CrabsUpright
python -m saferl.examples.main env=CrabsMove norm_obs=False eval_freq=1000
# For other environments, i.e. env=SafetyAntVelocity/SafetyHumanoidVelocity/SafetyWalker2DVelocity/SafetyHalfCheetahVelocity/SafetyHopperVelocity/SafetyCarCircle1
python -m saferl.examples.main env=SafetyAntVelocity norm_obs=True eval_freq=100000

Configuration System

The library uses Hydra for configuration management. Configurations are organized as follows:

saferl/examples/configs/
├── algorithm/         # Algorithm-specific configurations
├── env/               # Environment-specific configurations
├── callback/          # Callback configurations
└── main.yaml          # Main configuration file

Customizing Experiments

You can override any configuration parameter:

# Change learning rate
python -m saferl.examples.main algorithm=sac_lag algorithm.model.learning_rate=1e-4

# Change environment parameters
python -m saferl.examples.main env=SafetyAntVelocity env.train_env.env_kwargs.camera_id=1

# Run with different seeds
python -m saferl.examples.main seed=42

Examples

Training with CSAC-LB (Recommended)

from saferl import CSAC_LB, create_env

# Create environment
env = create_env(env_cfg, seed=42)

# Create and train CSAC-LB model
model = CSAC_LB("MlpPolicy", env, cost_constraint=[5.0], lower_bound=0.1)
model.learn(total_timesteps=100000)

# Save model
model.save("csac_lb_agent")

Training with SAC-Lag

from saferl import SAC_LAG, create_env

# Create environment
env = create_env(env_cfg, seed=42)

# Create and train SAC-Lag model
model = SAC_LAG("MlpPolicy", env, cost_constraint=[5.0])
model.learn(total_timesteps=100000)

# Save model
model.save("sac_lag_agent")

Evaluation

from saferl.common.utils import evaluate

# Evaluate the trained model
results = evaluate(model, env, num_episodes=10)
print(f"Average return: {results['ret']}")
print(f"Average cost: {results['cost']}")
print(f"Safety rate: {results['is_safe']}")

Documentation

For detailed API documentation and examples, please refer to the individual algorithm modules and the saferl.common utilities.

Citation

If you use this library or our algorithm in your research, please cite our work:

@article{zhang2025constrained,
  title={Constrained Reinforcement Learning with Smoothed Log Barrier Function},
  author={B. Zhang and Y. Zhang and H. Zhu and S. Yan and T. Brox and J. Boedecker},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=Amh95oURaE},
}

License

This project is licensed under the MIT License - see the LICENSE file for details.

