This is my final project for cse573: Artificial Intelligence. In this project, I reimplement five state-of-the-art reinforcement learning algorithms (A2C, DDPG, PPO, TD3 and SAC) and run experiments to study how different factors affect model performance. This repo is for learning purposes only and still differs in many ways from the published baselines. I borrowed some ideas from sweetice's repo during implementation.
For example, to train TD3 on the Hopper-v2 environment for 2000 episodes, run:
python --model TD3 --env_name Hopper-v2 --max_episode 2000
To evaluate the training result:
python --model TD3 --env_name Hopper-v2 --last_episode 2000 --mode eval
Many other options are specified in the
file. For example, to change the random seed to 10 and the replay buffer capacity to 10000:
python --model TD3 --env_name Hopper-v2 --max_episode 2000 --seed 10 --capacity 10000
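As a rough illustration of how flags like these could be parsed, here is a minimal argparse sketch. The flag names are taken from the commands above; the function name `build_parser`, the types, the defaults, and the `train` mode value are assumptions for illustration and may not match the actual options file in this repo.

```python
import argparse


def build_parser():
    # Hypothetical sketch: flag names mirror the commands in this README,
    # but every default, type, and choice below is an assumption.
    parser = argparse.ArgumentParser(description="Train or evaluate an RL agent")
    parser.add_argument("--model", default="TD3",
                        choices=["A2C", "DDPG", "PPO", "TD3", "SAC"])
    parser.add_argument("--env_name", default="Hopper-v2")
    # "eval" appears in the README; "train" as the default is assumed.
    parser.add_argument("--mode", default="train", choices=["train", "eval"])
    parser.add_argument("--max_episode", type=int, default=2000)
    parser.add_argument("--last_episode", type=int, default=0)
    parser.add_argument("--seed", type=int, default=0)
    parser.add_argument("--capacity", type=int, default=1000000)
    return parser


# Parse the same flags as the seed/capacity example above.
args = build_parser().parse_args([
    "--model", "TD3", "--env_name", "Hopper-v2",
    "--max_episode", "2000", "--seed", "10", "--capacity", "10000",
])
print(args.model, args.env_name, args.seed, args.capacity)
```

Passing an explicit list to `parse_args` keeps the sketch runnable without a real command line; a script would normally call `parse_args()` with no arguments.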
To visualize the training log:
python --dir log/Hopper-v2/TD3