This collection of papers on constrained decision making classifies papers by the types of constraints they address. Within each class, papers are sorted chronologically. More refined classification criteria may be adopted in the future.
Constrained Policy Optimization is closely related to safe exploration, which aims to provide a certain degree of safety guarantee during the exploration procedure. More on exploration in reinforcement learning can be found in RL-Exploration-Paper-Lists.
- <Datasets and Benchmarks for Offline Safe Reinforcement Learning> by Zuxin Liu, Zijian Guo, Haohong Lin, Yihang Yao, Jiacheng Zhu, Zhepeng Cen, Hanjiang Hu, Wenhao Yu, Tingnan Zhang, Jie Tan, Ding Zhao, 2023.
- <Trustworthy Reinforcement Learning Against Intrinsic Vulnerabilities: Robustness, Safety, and Generalizability> by Mengdi Xu, Zuxin Liu, Peide Huang, Wenhao Ding, Zhepeng Cen, Bo Li and Ding Zhao, 2022.
- <Constrained MDPs and the Reward Hypothesis> by Csaba Szepesvári, 2020.
- <Exploration-Exploitation in Constrained MDPs> by Yonathan Efroni, Shie Mannor and Matteo Pirotta, 2020.
- <Safety Gym> by Joshua Achiam, Alex Ray and Dario Amodei, 2019.
- <Benchmarking Safe Exploration in Deep Reinforcement Learning> by Alex Ray, Joshua Achiam and Dario Amodei, 2019.
- <A Comprehensive Survey on Safe Reinforcement Learning> by Javier García and Fernando Fernández, 2015.
This class of constraints includes the total expected cost until the state reaches some set M, the expected discounted cost, the expected average cost, etc.
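For reference, the discounted-cost case can be written as the standard CMDP program (a textbook formulation, not taken from any single paper below):

```latex
\max_{\pi}\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d,
```

where $c$ is the cost function and $d$ the cost budget; the average-cost and first-hitting variants replace the discounted sum accordingly.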
- <Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning> by Yihang Yao, Zuxin Liu, Zhepeng Cen, Jiacheng Zhu, Wenhao Yu, Tingnan Zhang, Ding Zhao, 2023.
- <Constrained Decision Transformer for Offline Safe Reinforcement Learning> by Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, Ding Zhao, 2023.
- <Saute RL: Almost Surely Safe Reinforcement Learning Using State Augmentation> by Aivar Sootla, Alexander I. Cowen-Rivers, Taher Jafferjee, Ziyan Wang, David Mguni, Jun Wang, Haitham Bou-Ammar, 2022.
- <Constrained Variational Policy Optimization for Safe Reinforcement Learning> by Zuxin Liu, Zhepeng Cen, Vladislav Isenbaev, Wei Liu, Zhiwei Steven Wu, Bo Li, Ding Zhao, 2022.
- <CRPO: A New Approach for Safe Reinforcement Learning with Convergence Guarantee> by Tengyu Xu, Yingbin Liang and Guanghui Lan, 2021.
- <Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies> by Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan and Peter J. Ramadge, 2021.
- <First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning> by Yiming Zhang, Quan Vuong and Keith W. Ross, 2020.
- <Responsive Safety in Reinforcement Learning by PID Lagrangian Methods> by Adam Stooke, Joshua Achiam and Pieter Abbeel, 2020.
- <Safe Policy Learning for Continuous Control> by Yinlam Chow, Ofir Nachum, Aleksandra Faust, Edgar Duenez-Guzman and Mohammad Ghavamzadeh, 2020.
- [PCPO] <Projection-Based Constrained Policy Optimization> by Tsung-Yen Yang, Justinian Rosca, Karthik Narasimhan and Peter J. Ramadge, 2020.
- [RCPO] <Reward Constrained Policy Optimization> by Chen Tessler, Daniel J. Mankowitz and Shie Mannor, 2019.
- <Value Constrained Model-free Continuous Control> by Steven Bohez, Abbas Abdolmaleki, Michael Neunert, Jonas Buchli, Nicolas Heess and Raia Hadsell, 2019.
- <Convergent Policy Optimization for Safe Reinforcement Learning> by Ming Yu, Zhuoran Yang, Mladen Kolar and Zhaoran Wang, 2019.
- <Safe Q-Learning Method Based on Constrained Markov Decision Processes> by Yangyang Ge, Fei Zhu, Xinghong Ling and Quan Liu, 2019.
- <Batch Policy Learning under Constraints> by Hoang M. Le, Cameron Voloshin and Yisong Yue, 2019.
- <Constrained Cross-Entropy Method for Safe Reinforcement Learning> by Min Wen and Ufuk Topcu, 2018.
- <A Lyapunov-based Approach to Safe Reinforcement Learning> by Yinlam Chow, Ofir Nachum and Edgar Duenez-Guzman, 2018.
- [CPO] <Constrained Policy Optimization> by Joshua Achiam, David Held, Aviv Tamar and Pieter Abbeel, 2017.
- [CMDP] <Constrained Markov Decision Processes> by Eitan Altman, 1999.
This book establishes the framework of CMDPs and solves the optimization problem with a known model via linear programming, but it does not give a solution for the optimal policy in high-dimensional control.
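The book's linear-programming approach can be sketched for a tiny tabular CMDP by optimizing over the discounted occupancy measure; all numbers below are made up purely for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy CMDP: 2 states, 2 actions (invented for illustration).
nS, nA = 2, 2
gamma = 0.9
# P[s, a, s'] — transition probabilities; each row sums to 1.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.7, 0.3], [0.1, 0.9]]])
r = np.array([[1.0, 0.0], [0.0, 2.0]])   # reward(s, a)
c = np.array([[0.0, 1.0], [1.0, 0.0]])   # cost(s, a)
d = 0.3                                   # cost budget
mu0 = np.array([1.0, 0.0])                # initial state distribution

# Decision variable: discounted occupancy x(s, a), flattened to length nS*nA.
# Flow (Bellman-flow) constraints:
#   sum_a x(s, a) - gamma * sum_{s', a'} P[s', a', s] x(s', a') = mu0(s)
A_eq = np.zeros((nS, nS * nA))
for s in range(nS):
    for sp in range(nS):
        for a in range(nA):
            A_eq[s, sp * nA + a] -= gamma * P[sp, a, s]
    for a in range(nA):
        A_eq[s, s * nA + a] += 1.0
b_eq = mu0

res = linprog(c=-r.flatten(),             # linprog minimizes, so negate reward
              A_ub=c.flatten()[None, :],  # expected discounted cost <= d
              b_ub=[d],
              A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * (nS * nA))
x = res.x.reshape(nS, nA)
# Optimal (possibly stochastic) policy: pi(a|s) proportional to x(s, a).
pi = x / np.maximum(x.sum(axis=1, keepdims=True), 1e-12)
```

The LP is bounded because the flow constraints force the occupancy to sum to 1/(1-gamma); this direct enumeration over (s, a) is exactly what stops scaling to high-dimensional control.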
- <Constrained Episodic Reinforcement Learning in Concave-convex and Knapsack Settings> by Kianté Brantley, Miroslav Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins and Wen Sun, 2020.
- <Safe Exploration in Continuous Action Spaces> by Gal Dalal, Krishnamurthy Dvijotham, Matej Vecerik, Todd Hester, Cosmin Paduraru and Yuval Tassa, 2018.
- <Chance Constrained Policy Optimization for Process Control and Optimization> by Panagiotis Petsagkourakis, Ilya Orson Sandoval, Eric Bradford, Federico Galvanin, Dongda Zhang and Ehecatl Antonio del Rio-Chanona, 2020.
- <Risk-Constrained Reinforcement Learning with Percentile Risk Criteria> by Yinlam Chow, Mohammad Ghavamzadeh, Lucas Janson and Marco Pavone, 2015.
- <Inverse Constrained Reinforcement Learning> by Usman Anwar, Shehryar Malik, Alireza Aghasi and Ali Ahmed, 2021.
- <Learning Parametric Constraints in High Dimensions from Demonstrations> by Glen Chou, Necmiye Ozay and Dmitry Berenson, 2020.
- <Approaches to Safety in Inverse Reinforcement Learning> by Dexter R.R. Scobee, 2020.
- <Counter-example Guided Learning of Bounds on Environment Behavior> by Yuxiao Chen, Sumanth Dathathri, Tung Phan-Minh and Richard M. Murray, 2020.
- <Inferring Task Goals and Constraints using Bayesian Nonparametric Inverse Reinforcement Learning> by Daehyung Park, Michael Noseworthy, Rohan Paul, Subhro Roy and Nicholas Roy, 2020.
- <Maximum Likelihood Constraint Inference for Inverse Reinforcement Learning> by Dexter R.R. Scobee and S. Shankar Sastry, 2019.
- <Learning Constraints from Demonstrations> by Glen Chou, Dmitry Berenson and Necmiye Ozay, 2018.
- <Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration> by Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, Kush Varshney, Murray Campbell, Moninder Singh and Francesca Rossi, 2018.
- <Modeling Supervisor Safe Sets for Improving Collaboration in Human-robot Teams> by David L. McPherson, Dexter R.R. Scobee, Joseph Menke, Allen Y. Yang and S. Shankar Sastry, 2018.
- <Learning Safe Policies with Expert Guidance> by Jessie Huang, Fa Wu, Doina Precup and Yang Cai, 2018.
- <Inferring Geometric Constraints in Human Demonstrations> by Guru Subramani, Michael Zinn and Michael Gleicher, 2018.
- <Learning Task Specifications from Demonstrations> by Marcell Vazquez-Chanlatte, Susmit Jha, Ashish Tiwari, Mark K. Ho and Sanjit A. Seshia, 2018.
- <C-LEARN: Learning Geometric Constraints from Demonstrations for Multi-step Manipulation in Shared Autonomy> by Claudia Pérez-D'Arpino and Julie A. Shah, 2017.
- <Deep Inverse Q-learning with Constraints> by Gabriel Kalweit, Maria Huegle, Moritz Werling and Joschka Boedecker, 2020.
- <Simulating Emergent Properties of Human Driving Behavior Using Multi-agent Reward Augmented Imitation Learning> by Raunak P. Bhattacharyya, Derek J. Phillips, Changliu Liu, Jayesh K. Gupta, Katherine Driggs-Campbell and Mykel J. Kochenderfer, 2019.
- <InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations> by Yunzhu Li, Jiaming Song and Stefano Ermon, 2017.
- <Imitation Learning with Demonstrations and Shaping Rewards> by Kshitij Judah, Alan Fern, Prasad Tadepalli and Robby Goetschalckx, 2014.
- <Imitation Learning with a Value-based Prior> by Umar Syed and Robert E. Schapire, 2012.
- <Reinforcement Learning from Simultaneous Human and MDP Reward> by W. Bradley Knox and Peter Stone, 2012.