Robust Deep RL: a Soft Actor-Critic approach with adversarial perturbations on state observations
I designed a new robust deep RL algorithm: a Soft Actor-Critic approach with adversarial perturbations on state observations. My work is based on the SA-MDP proposed by Zhang et al. (2020). For a more detailed explanation, please see the attached PDF file. **2022 Spring Semester, Personal Project Research, Kyungphil Park**
SA-MDP treats a fixed adversarial attack as the worst case, i.e., the perturbation that minimizes the Q value, as in the equations below; Zhang et al. (2020) formalize this setting as the state-adversarial MDP (SA-MDP). **Zhang et al. (2020)**
In our work, we need to solve a minimax problem: minimizing the policy loss under the worst-case perturbation.
- Objective function (sketched below)
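The following LaTeX sketch restates these two quantities in the SA-MDP style of Zhang et al. (2020); the perturbation set $B(s)$ (for example, an $\ell_\infty$ ball of radius $\epsilon$ around $s$) and the policy-loss symbol $\mathcal{L}_\pi$ are notational assumptions here, and the exact objective used in this project is derived in the attached PDF.

$$
\nu^*(s) \in \operatorname*{arg\,min}_{\hat{s} \in B(s)} Q^{\pi}\big(s, \pi(\hat{s})\big) \qquad \text{(worst-case fixed adversary)}
$$

$$
\min_{\theta}\; \mathbb{E}_{s}\Big[\max_{\hat{s} \in B(s)} \mathcal{L}_{\pi}(\theta; s, \hat{s})\Big] \qquad \text{(minimax policy objective)}
$$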
I designed robust deep RL with a Soft Actor-Critic approach in a discrete action space and tested SA-SAC in several Atari Gym environments. The SAC code is based on **bernomone's** GitHub code.
First, create three new directories: saved_models, videos, and Logs.
- Before you start training, set n_steps, memory_size, train_start, reg_train_start, … in the config01.json file (a sketch of a possible config is shown below).
    - n_steps: total number of steps you want to train for.
    - memory_size: replay buffer size.
    - train_start: number of steps after which training begins.
    - reg_train_start: number of steps after which training with the SA-Regularizer begins.
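A minimal sketch of what config01.json might look like; the field names come from the list above, but every value here is an illustrative assumption, not the project's actual defaults:

```json
{
  "n_steps": 5000000,
  "memory_size": 100000,
  "train_start": 20000,
  "reg_train_start": 1000000
}
```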
- train models.

train.py
--config=config01.json (default)
--new=1 (default) # set 0 when you load pretrained models
--game=BeamRider (default) # set any Atari game environment
- example: python train.py, python train.py --game=Assault
- train models with the SA-Regularizer (see the sketch after this block).

robust_train.py
--config=config01.json (default)
--new=1 (default) # set 0 when you load pretrained models
--game=BeamRider (default) # set any Atari game environment
- example: python robust_train.py, python robust_train.py --game=Assault
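A minimal sketch of what an SA-Regularizer for discrete-action SAC could look like, following Zhang et al. (2020): maximize the divergence between the policy at the clean and the perturbed state over an $\ell_\infty$ ball, then penalize it. The names (policy_net, sa_regularizer), the choice of KL divergence, the PGD inner loop, and all hyperparameter values are illustrative assumptions, not necessarily what robust_train.py implements.

```python
import torch
import torch.nn.functional as F

def sa_regularizer(policy_net, states, eps=0.004, pgd_steps=5):
    """Worst-case KL(pi(.|s) || pi(.|s+delta)) over an l-inf ball of
    radius eps, approximated with a few PGD steps on delta (a sketch)."""
    with torch.no_grad():
        # Clean action log-probabilities, treated as a constant target.
        clean_logp = F.log_softmax(policy_net(states), dim=-1)

    # Start from a random point inside the l-inf ball.
    delta = (torch.rand_like(states) * 2.0 - 1.0) * eps
    delta.requires_grad_(True)

    for _ in range(pgd_steps):
        pert_logp = F.log_softmax(policy_net(states + delta), dim=-1)
        # KL between clean and perturbed action distributions,
        # maximized with respect to the perturbation delta.
        kl = F.kl_div(pert_logp, clean_logp,
                      reduction="batchmean", log_target=True)
        grad, = torch.autograd.grad(kl, delta)
        with torch.no_grad():
            delta = (delta + eps * grad.sign()).clamp_(-eps, eps)
        delta.requires_grad_(True)

    # Final penalty: gradient flows into policy_net's parameters only.
    pert_logp = F.log_softmax(policy_net(states + delta.detach()), dim=-1)
    return F.kl_div(pert_logp, clean_logp,
                    reduction="batchmean", log_target=True)

# Usage sketch inside the policy update:
#   policy_loss = sac_policy_loss + reg_coef * sa_regularizer(policy_net, states)
```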
- render Atari game videos with your trained models.

generate_match_video.py
--config=config01.json (default)
--seed=0 (default)
--game=BeamRider (default) # set any Atari game environment
--random=False (default) # set 1 when you want to test random actions.
- example: python generate_match_video.py, python generate_match_video.py --game=Assault --random=1
(+ PGD attack: adversarial perturbation on state observations)
- render Atari game videos under a PGD attack with your trained models (a sketch of the attack follows the flag list below).
PGD_generate_video.py
--config=config01.json (default)
--seed=0 (default)
--game=BeamRider (default) # set any Atari game environment
--steps=10 (default) # set the number of PGD attack steps.
- example: python PGD_generate_video.py, python PGD_generate_video.py --game=Assault
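A minimal sketch of a PGD attack on state observations for a discrete policy, assuming policy_net returns action logits and observations are pixels scaled to [0, 1]; the step-size rule, the eps value, and the untargeted cross-entropy objective are all illustrative assumptions, not necessarily the attack PGD_generate_video.py implements.

```python
import torch
import torch.nn.functional as F

def pgd_attack(policy_net, state, eps=0.004, steps=10):
    """Untargeted PGD on the observation: find s' within an l-inf ball
    around s that pushes the policy away from its clean greedy action."""
    state = state.detach()
    with torch.no_grad():
        clean_action = policy_net(state).argmax(dim=-1)   # (B,)

    alpha = 2.5 * eps / steps    # common PGD step-size heuristic
    adv = state.clone()

    for _ in range(steps):
        adv.requires_grad_(True)
        # Ascend the cross-entropy against the clean action.
        loss = F.cross_entropy(policy_net(adv), clean_action)
        grad, = torch.autograd.grad(loss, adv)
        with torch.no_grad():
            adv = adv + alpha * grad.sign()
            adv = state + (adv - state).clamp(-eps, eps)  # project onto the ball
            adv = adv.clamp(0.0, 1.0)                     # keep valid pixel range
    return adv.detach()
```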
- test trained models for several episodes.
evalulation.py
--config=config01.json (default)
--seed=0 (default)
--game=BeamRider (default) # set any Atari game environment
--iter=10 (default) # set the number of iterations (total number of episodes).
- example: python evalulation.py, python evalulation.py --game=Assault --iter=30
(+ PGD attack: adversarial perturbation on state observations)
- test trained models under a PGD attack for several episodes.
pgd_evalulation.py
--config=config01.json (default)
--seed=0 (default)
--game=BeamRider (default) # set any Atari game environment
--iter=10 (default) # set the number of iterations (total number of episodes).
- example: python pgd_evalulation.py, python pgd_evalulation.py --game=Assault --iter=30




