Deep Reinforcement Learning Reimplementation

This is my final project for cse573: Artificial Intelligence. In this project, I reimplement 5 state-of-the-art algorithms (A2C, DDPG, PPO, TD3 and SAC) and carry out some experiments to study the effects of different aspects on the performance of models. This repo only serves for learning purpose and still has many difference from the published baseline. I borrow some ideas from sweetice's repo during implementation.

Basic Usage

For example, to train TD3 on Hopper-v2 environment for 2000 episode, simply use

python main.py --model TD3 --env_name Hopper-v2 --max_episode 2000

To evaluate the training result

python main.py --model TD3 --env_name Hopper-v2 --last_episode 2000 --mode eval

There are also many other options sepcified in the main.py file. For example, change the random seed to 10 and the capacity of replay buffer to 10000

python main.py --model TD3 --env_name Hopper-v2 --max_episode 2000 --seed 10 --capacity 10000

To visualize the training log

python plot_result.py --dir log/Hopper-v2/TD3

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
doc		doc
utils		utils
.gitignore		.gitignore
A2C.py		A2C.py
DDPG.py		DDPG.py
PPO.py		PPO.py
README.md		README.md
SAC.py		SAC.py
TD3.py		TD3.py
algorithms.py		algorithms.py
main.py		main.py
plot_result.py		plot_result.py
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Deep Reinforcement Learning Reimplementation

Basic Usage

About

Releases

Packages

Languages

caipeide/Deep-Reinforcement-Learning-Reimplementation

Folders and files

Latest commit

History

Repository files navigation

Deep Reinforcement Learning Reimplementation

Basic Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages