Submitted to ECC 2025.
- `results/` contains `.csv` files for all results.
- `Safe-Policy-Optimization/` contains the code from this repository. The algorithms we used are inside the `/safepo/single_agent/` directory. These are `cpo.py` (CPO), `ppo_ewc_cost.py` (Safe EWC), and `ppo_ewc.py` (PPO+EWC). `ppo_ewc_lambda.py` is used for tuning the $\lambda$ hyperparameter.
- `safety-gymnasium/` contains the code from this repository. The continual RL environments we created for the paper are in `/safety_gymnasium/tasks/safe_velocity/`. Specifically, `safety_half_cheetah_velocity_v4.py` is the HalfCheetah nonstationary safety-constrained task and `safety_ant_velocity_v2.py` is the Ant one.
- `Analyze Results.ipynb` contains the analysis of the results.
- `Lambda Experiment.ipynb` contains a hyperparameter experiment to choose the EWC $\lambda$.
- `Environment Test` can be used to test the environments and visualize them.
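The `ppo_ewc*` algorithms above add an elastic weight consolidation (EWC) penalty to the policy loss. As a rough illustration of the quantity that $\lambda$ scales, here is a toy sketch with scalar parameters (not the repository's implementation, where the parameters are network weight tensors):

```python
def ewc_penalty(params, old_params, fisher, ewc_lambda):
    """Quadratic EWC penalty: (lambda / 2) * sum_i F_i * (theta_i - theta*_i)^2.

    Toy sketch: `params`, `old_params`, and `fisher` are dicts of floats here.
    `fisher` holds the diagonal Fisher information estimated after training on
    the previous task, and `old_params` the weights at that point.
    """
    return 0.5 * ewc_lambda * sum(
        fisher[k] * (params[k] - old_params[k]) ** 2 for k in params
    )

# Larger lambda penalizes drifting away from the previous task's solution.
print(ewc_penalty({"w": 1.0}, {"w": 0.0}, {"w": 2.0}, ewc_lambda=10))  # 10.0
```

The penalty is zero when the current weights match the previous task's weights and grows quadratically as they drift, with the Fisher term weighting parameters by how important they were for the old task.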
- Enter the `/Safe-Policy-Optimization/safepo/single_agent/` directory (e.g., `cd /Safe-Policy-Optimization/safepo/single_agent/`).
- Train an agent by running the chosen algorithm as follows:
  - `python algorithm.py --task taskname --experiment experiment_name`
  - `algorithm` is one of `cpo`, `ppo_ewc`, `ppo_ewc_cost`, or `ppo_ewc_lambda`.
  - `taskname` is `SafeHalfCheetahVelocity-v4` or `SafeAntVelocity-v2`.
  - `experiment_name` is your experiment name; results will be saved in the `runs/` folder.
  - `--ewc_lambda num` sets $\lambda$, the tradeoff between remembering previous tasks and learning the current one, to `num`.
  - `--task-length num` is the number of environment observations for each nonstationary task.
  - `--tasks 'task_list'` is the task sequence, e.g. `'[0, 1, 0, 1, 2, 0]'`.
- For a comprehensive list of command line arguments, check the `single_agent_args()` function in these files.
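For example, a single training run might look like this (a hypothetical invocation following the argument names above; adjust the path and values to your setup):

```shell
cd Safe-Policy-Optimization/safepo/single_agent
python ppo_ewc_cost.py --task SafeHalfCheetahVelocity-v4 --experiment halfcheetah_safe_ewc
```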
The results of the paper can be reproduced by running the above commands with seeds 0-4 for each algorithm. As detailed in the paper, use `ewc_lambda=10`, `task-length=1_000_000`, `tasks='[0, 1, 0, 2, 1, 0, 2]'`, and `total-steps=8_000_000`. These results are saved in the `results/` directory. Use `Analyze Results.ipynb` to see our analysis.
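Putting this together, a reproduction run for one algorithm could look like the following sketch (hypothetical: `--seed` is assumed to be among the supported arguments; check `single_agent_args()` for the exact flag names):

```shell
cd Safe-Policy-Optimization/safepo/single_agent
for seed in 0 1 2 3 4; do
    python ppo_ewc_cost.py \
        --task SafeHalfCheetahVelocity-v4 --experiment reproduction \
        --seed "$seed" --ewc_lambda 10 --task-length 1000000 \
        --tasks '[0, 1, 0, 2, 1, 0, 2]' --total-steps 8000000
done
```

Repeat with the other algorithm scripts and both tasks to cover all configurations reported in the paper.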
- From the project top directory:

  ```
  conda env create -f environment.yml
  conda activate safe-continual
  cd safety-gymnasium
  pip install -e .
  ```
If you experience any issues, you may need to set up your own conda environment and install safety-gymnasium manually, then add packages as necessary. Alternatively, if your installation is not time-sensitive, please feel free to raise an issue!