A comprehensive toolkit for Constrained Reinforcement Learning and Safe Reinforcement Learning research, built on top of Stable-Baselines3 and Safety-Gymnasium.
- Multiple Safe RL Algorithms: Implementation of state-of-the-art safe RL algorithms including:
  - CSAC-LB (Constrained Soft Actor-Critic with Log Barrier) - TMLR 2025
  - SAC-Lag (Soft Actor-Critic with Lagrangian constraints) - RSS 2020
  - CPO (Constrained Policy Optimization) - ICML 2017
  - WCSAC (Worst-Case Soft Actor-Critic) - AAAI 2021
  - APPO (Augmented Proximal Policy Optimization) - AAAI 2023
  - SAC (Soft Actor-Critic) with Reward Shaping - Modified from ICML 2018
- Python 3.10
- CUDA-compatible GPU (recommended)
git clone git@github.com:2BH/saferl-lib.git
cd saferl-lib
conda create -n saferl python=3.10
conda activate saferl
pip install stable_baselines3==2.7.0
pip install sb3_contrib==2.7.0
cd ~/
wget https://github.com/PKU-Alignment/safety-gymnasium/archive/refs/heads/main.zip
unzip main.zip
cd safety-gymnasium-main
pip install -e .
pip install gymnasium==0.29.1
pip install hydra-core==1.3.2
pip install tensorboard==2.20.0
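Optionally, you can sanity-check the installation by importing the key packages and creating a Safety-Gymnasium task. This is a minimal sketch; the environment id below is just one of the standard Safety-Gymnasium tasks and is not specific to this repo:

import gymnasium
import stable_baselines3 as sb3
import safety_gymnasium  # registers the Safety* environments

print(sb3.__version__, gymnasium.__version__)
env = safety_gymnasium.make("SafetyPointGoal1-v0")  # any registered Safety-Gymnasium task
obs, info = env.reset(seed=0)
env.close()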
Constrained Soft Actor-Critic with Log Barrier - TMLR 2025

CSAC-LB is our algorithm that addresses key challenges in constrained reinforcement learning:
- Numerical Stability: Uses a linear smoothed log barrier function that provides non-vanishing gradients
- Quick Recovery: Enables agents to quickly recover from unsafe states during training
- Enhanced Safety: Employs a pessimistic double-critic architecture to mitigate constraint violation underestimation
Key Innovation: The integration of a smoothed log barrier function into the actor's objective provides a numerically stable alternative to traditional interior-point methods, making it particularly suitable for safety-critical applications.
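To make the idea concrete, here is a minimal NumPy sketch of a linearly extended (smoothed) log barrier, assuming the commonly used form that switches from -log(-z)/t to a line of slope t at z = -1/t^2. The exact formulation, constants, and how the barrier enters the actor objective in CSAC-LB may differ; this is only an illustration of why the gradient never vanishes:

import numpy as np

def smoothed_log_barrier(z, t=10.0):
    """Sketch of a linearly extended log barrier for the constraint z <= 0.

    Inside the feasible region it behaves like the classic interior-point
    barrier -log(-z)/t; past the switch point it continues as a straight
    line with slope t, so the gradient neither vanishes nor blows up when
    the constraint is violated.
    """
    z = np.asarray(z, dtype=float)
    switch = -1.0 / t**2
    log_branch = -np.log(np.clip(-z, 1e-12, None)) / t      # standard log barrier
    lin_branch = t * z - np.log(1.0 / t**2) / t + 1.0 / t   # linear extension, C1-continuous at the switch
    return np.where(z <= switch, log_branch, lin_branch)

# z could be (expected episode cost) - (cost budget): the penalty stays finite
# and informative even when the constraint is violated (z > 0)
print(smoothed_log_barrier(np.array([-1.0, -0.1, 0.0, 0.5])))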
# Quick CSAC-LB example
from saferl import CSAC_LB, create_env
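# env_cfg: an environment config object (e.g. composed via Hydra; see the configuration section below)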
env = create_env(env_cfg, seed=42)
model = CSAC_LB("MlpPolicy", env, cost_constraint=[5.0], lower_bound=0.1)
model.learn(total_timesteps=100000)

# Activate environment
conda activate saferl
# Run CSAC-LB experiment (recommended)
python -m saferl.examples.main algorithm=csac_lb

conda activate saferl
# For Crabs environments, env=CrabsMove/CrabsSwing/CrabsTilt/CrabsUpright
python -m saferl.examples.main env=CrabsMove norm_obs=False eval_freq=1000
# For other environments, e.g. env=SafetyAntVelocity/SafetyHumanoidVelocity/SafetyWalker2DVelocity/SafetyHalfCheetahVelocity/SafetyHopperVelocity/SafetyCarCircle1
python -m saferl.examples.main env=SafetyAntVelocity norm_obs=True eval_freq=100000

The library uses Hydra for configuration management. Configurations are organized as follows:
saferl/examples/configs/
├── algorithm/   # Algorithm-specific configurations
├── env/         # Environment-specific configurations
├── callback/    # Callback configurations
└── main.yaml    # Main configuration file
You can override any configuration parameter:
# Change learning rate
python -m saferl.examples.main algorithm=sac_lag algorithm.model.learning_rate=1e-4
# Change environment parameters
python -m saferl.examples.main env=SafetyAntVelocity env.train_env.env_kwargs.camera_id=1
# Run with different seeds
python -m saferl.examples.main seed=42
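The algorithms can also be used directly from Python. The create_env calls below take an environment config object (env_cfg); one way to obtain it is to compose the Hydra configs programmatically. The following is a hypothetical sketch using Hydra's Compose API, with the group names taken from the config tree above but the exact layout of main.yaml assumed rather than verified:

from hydra import compose, initialize

# Hypothetical: compose the main config with overrides, mirroring the CLI usage above
with initialize(version_base=None, config_path="saferl/examples/configs"):
    cfg = compose(config_name="main",
                  overrides=["algorithm=csac_lb", "env=SafetyAntVelocity", "seed=42"])

env_cfg = cfg.env          # assumed layout: per-group sub-configs under the composed config
algo_cfg = cfg.algorithm

With env_cfg in hand, training CSAC-LB looks like the following example.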
from saferl import CSAC_LB, create_env

# Create environment
env = create_env(env_cfg, seed=42)
# Create and train CSAC-LB model
model = CSAC_LB("MlpPolicy", env, cost_constraint=[5.0], lower_bound=0.1)
model.learn(total_timesteps=100000)
# Save model
model.save("csac_lb_agent")from saferl import SAC_LAG, create_env
# Create environment
env = create_env(env_cfg, seed=42)
# Create and train SAC-Lag model
model = SAC_LAG("MlpPolicy", env, cost_constraint=[5.0])
model.learn(total_timesteps=100000)
# Save model
model.save("sac_lag_agent")from saferl.common.utils import evaluate
# Evaluate the trained model
results = evaluate(model, env, num_episodes=10)
print(f"Average return: {results['ret']}")
print(f"Average cost: {results['cost']}")
print(f"Safety rate: {results['is_safe']}")For detailed API documentation and examples, please refer to the individual algorithm modules and the saferl.common utilities.
If you use this library or our algorithm in your research, please cite our work:
@article{zhang2025constrained,
title={Constrained Reinforcement Learning with Smoothed Log Barrier Function},
author={B. Zhang and Y. Zhang and H. Zhu and S. Yan and T. Brox and J. Boedecker},
journal={Transactions on Machine Learning Research},
issn={2835-8856},
year={2025},
url={https://openreview.net/forum?id=Amh95oURaE},
}

This project is licensed under the MIT License - see the LICENSE file for details.
- Built on top of Stable-Baselines3
- Environment support from Safety-Gymnasium
- Configuration management with Hydra

