Deep Reinforcement Learning: Zero to Hero!

Welcome to drlzh.ai: the most hands-on reinforcement learning experience!

This course is a deep dive into the vast and evolving world of Deep Reinforcement Learning, split into two parts. First, you'll learn the foundations of reinforcement learning and master the classics by building algorithms like DQN, SAC, and PPO from the ground up. Then, you'll venture into advanced topics such as curiosity-driven exploration, AlphaZero, and RLHF.

You'll learn by doing. This includes everything from playing Atari games, training robots, and landing on the Moon, to fine-tuning Language Models, implementing self-play with MCTS, and tackling cutting-edge challenges.

How it works

You'll progress through a series of hands-on Jupyter notebooks, implementing each algorithm from scratch in guided TODO: ... sections. If you get stuck, don't worry! The solution folder has all the completed notebooks for you to reference.

The entire experience is designed as a one-stop shop within VS Code, with an opinionated setup so you can focus on learning, not on boilerplate.

Quick start

The easiest way to get started is with our Dockerized environment: a full-fledged, reproducible and ready-to-go development environment!

Install Docker and Git. Clone this repository and cd into it.
On Linux/macOS, run printf "UID=$(id -u)\nGID=$(id -g)\n" > .env to set user permissions.
Run docker compose up --build -d and wait for the container to start up. If the Poetry installation fails at any point, rerun the command (it will reuse cached content).
Open your browser to http://localhost:8080.
Inside VS Code, select the drl-env 3.12.11 Python environment.
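
If the browser shows nothing right after docker compose returns, the container may still be starting. A minimal sketch for checking that something answers on the port from the steps above, using only the Python standard library; the helper name server_is_up and the 3-second timeout are illustrative choices, not part of the repository:

```python
import http.client
import urllib.error
import urllib.request


def server_is_up(url: str = "http://localhost:8080", timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status; it is still "up".
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or a malformed response.
        return False


if __name__ == "__main__":
    print("reachable" if server_is_up() else "not reachable yet")
```

A False result only means nothing is answering yet; docker compose logs is the place to look for the actual startup error.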

Note on GPU: To enable GPU support (if you have a compatible NVIDIA card and drivers), use this command instead: docker compose -f docker-compose.yml -f docker-compose.gpu.yml up --build -d.
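
To confirm the GPU is actually visible from inside the container, a check along these lines can be run in a notebook cell. It assumes PyTorch is among the installed dependencies, which this README does not state, so the sketch falls back gracefully when the import fails; describe_accelerator is a hypothetical helper name:

```python
def describe_accelerator() -> str:
    """Report whether a CUDA GPU is visible to PyTorch, if PyTorch is present."""
    try:
        import torch  # assumption: PyTorch is part of the project's dependencies
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # Name of the first visible CUDA device.
        return f"CUDA GPU detected: {torch.cuda.get_device_name(0)}"
    return "No CUDA GPU detected; running on CPU"


if __name__ == "__main__":
    print(describe_accelerator())
```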

Chapters

Here is an overview of the course content:

Introduction: An overview of the course and Reinforcement Learning (RL), its core concepts, and real-world applications.
Markov Decision Processes: The mathematical framework for modeling decision-making in uncertain environments.
Reinforcement Learning Foundations: Exploring the key components like agents, environments, states, actions, and rewards.
Deep Q Learning: A value-based algorithm that uses deep neural networks to learn optimal actions, famous for mastering Atari games.
Policy Gradient: Methods that directly optimize an agent's policy by learning a mapping from states to actions.
Actor Critic Methods: A hybrid approach combining policy-based (Actor) and value-based (Critic) methods for more stable learning.
Proximal Policy Optimization: An advanced and very popular policy gradient algorithm that improves training stability with clipped updates.
Bridge to Advanced Topics: A summary of fundamental concepts and a transition to more specialized areas of RL.
Exploration and Curiosity: Designing agents that can explore their environment effectively.
Multi-Agent Reinforcement Learning: Scenarios where multiple agents interact and learn.
Imitation Learning: Training agents by having them mimic expert behavior.
Monte Carlo Tree Search & AlphaZero: The powerful search algorithms behind game-playing champions.
Productionizing RL: Best practices for deploying RL systems in the real world.
Model Based Reinforcement Learning: Agents that learn a model of the world to plan ahead.
Reinforcement Learning from Human Feedback (RLHF): The technique used to align and fine-tune modern LLMs.
Conclusion: A final summary of the course and a look at the future of Reinforcement Learning.
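
As a taste of what "clipped updates" means in the Proximal Policy Optimization chapter, here is a minimal sketch of the standard PPO clipped surrogate objective for a single sample, in plain Python. The function name and the default clip_eps of 0.2 are illustrative and not taken from the course notebooks, which may structure this differently:

```python
def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO clipped surrogate objective for one (state, action) sample.

    ratio     -- new_policy_prob / old_policy_prob for the action taken
    advantage -- estimated advantage of that action
    clip_eps  -- how far the ratio may move from 1 before it stops helping
    """
    # Restrict the probability ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped objectives.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # A large ratio with a positive advantage is capped by the clip.
    print(clipped_surrogate(ratio=1.5, advantage=1.0))
    # A large ratio with a negative advantage is not capped: the penalty stays.
    print(clipped_surrogate(ratio=1.5, advantage=-1.0))
```

Taking the minimum removes the incentive to push the policy far from the old one in a single update, while still penalizing moves that make a bad action more likely.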

Let's go!

Once your environment is up and running, open the 00_Intro.ipynb notebook to get started with some background, prerequisites, and more! Also feel free to take a peek at the notebooks directly on GitHub to get a sense of the experience.

Appendix

Manual Python Environment Setup

For advanced users who prefer a manual setup:

Install Miniconda (with Python 3.12).

Create and activate the environment:

conda create --name drlzh python=3.12
conda activate drlzh

Install Poetry (pip install poetry) and project dependencies (poetry install).
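
After a manual setup, a quick sanity check can confirm that the active interpreter is Python 3.12 and that Poetry is on the PATH. This is a sketch using only the standard library; the helper names are hypothetical and not part of the repository:

```python
import shutil
import sys


def is_expected_python(version_info=sys.version_info, expected=(3, 12)) -> bool:
    """Return True if the major.minor version matches `expected`."""
    return tuple(version_info[:2]) == tuple(expected)


def poetry_on_path() -> bool:
    """Return True if a `poetry` executable can be found on the PATH."""
    return shutil.which("poetry") is not None


if __name__ == "__main__":
    print("Python 3.12 active:", is_expected_python())
    print("Poetry on PATH:", poetry_on_path())
```

Run it with the drlzh environment activated; if the first line prints False, the notebook kernel is probably using a different interpreter.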

alessiodm/drl-zh
