A comprehensive repository for learning Reinforcement Learning through theory, algorithms, and hands-on practice.
## Reinforcement Learning
Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment. Unlike supervised learning, RL doesn't rely on labeled data. Instead, the agent learns through trial and error, receiving rewards or penalties for its actions.
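The agent-environment interaction loop can be sketched in a few lines of Python. The `CorridorEnv` below is a made-up toy environment (not part of this repository): an agent with a purely random policy wanders a five-cell corridor and receives a reward only when it stumbles into the goal cell — learning by trial and error, with no labels in sight.

```python
import random

class CorridorEnv:
    """Toy environment: a 1-D corridor of 5 cells; reaching cell 4 ends
    the episode with reward +1, every other step gives reward 0."""
    N_STATES = 5

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = move left, 1 = move right (walls clamp the position)
        delta = 1 if action == 1 else -1
        self.state = max(0, min(self.N_STATES - 1, self.state + delta))
        done = self.state == self.N_STATES - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done

env = CorridorEnv()
state = env.reset()
done = False
while not done:
    action = random.choice([0, 1])          # random policy: pure trial and error
    state, reward, done = env.step(action)  # the environment returns feedback
print("episode finished with reward", reward)
```

A learning agent would use the observed `(state, action, reward, next_state)` transitions to improve its policy instead of acting randomly.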
- Robotics: Robot navigation, manipulation, and control
- Gaming: AlphaGo, OpenAI Five, StarCraft II agents
- Autonomous Vehicles: Path planning and decision-making
- Finance: Algorithmic trading and portfolio management
- Recommendation Systems: Personalized content delivery
- Energy: Smart grid optimization and resource allocation
| Component | Description |
|---|---|
| Agent | The decision-maker that learns and takes actions |
| Environment | The world the agent interacts with (MDPs, Gym environments) |
| Reward Function | Feedback signal that guides learning (positive/negative) |
| Policy | The agent's strategy for choosing actions (deterministic/stochastic) |
| Value Function | Estimates expected future rewards from states/actions |
| Exploration vs. Exploitation | Balance between trying new actions and using known good ones |
| Training Algorithms | Methods to improve the policy (Q-learning, policy gradients, etc.) |
| Aspect | Reinforcement Learning | Supervised Learning |
|---|---|---|
| Feedback Type | Delayed rewards/penalties | Immediate labels |
| Data Requirements | Sequential interaction data | Static labeled datasets |
| Training Objective | Maximize cumulative reward | Minimize prediction error |
| Output | Policies (action strategies) | Predictions/classifications |
| Learning Style | Trial and error | Pattern recognition |
| Temporal Aspect | Sequential decision-making | Independent predictions |
- Multi-Armed Bandits
- Markov Decision Processes (MDPs)
- Dynamic Programming (Value & Policy Iteration)
- Monte Carlo Methods
- Temporal Difference (TD) Learning
- Q-Learning & SARSA
- Expected SARSA & Double Q-Learning
- Function Approximation
- Deep Q-Networks (DQN)
- DQN Variants (Double DQN, Dueling DQN)
- Policy Gradient Methods (REINFORCE)
- Actor-Critic Methods (A2C, A3C)
- Proximal Policy Optimization (PPO)
- Deep Deterministic Policy Gradient (DDPG)
- Soft Actor-Critic (SAC)
- Multi-Agent Reinforcement Learning
- Hierarchical Reinforcement Learning
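To make the tabular methods in this list concrete, here is a minimal Q-learning sketch — epsilon-greedy exploration plus the one-step Bellman update — on a hypothetical five-cell corridor (reward +1 only at the rightmost cell). It is an illustration, not one of the repository's implementations.

```python
import random

# Tabular Q-learning with epsilon-greedy exploration on a toy 5-cell corridor.
N_STATES, ACTIONS = 5, (0, 1)            # action 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # step size, discount, exploration rate

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = next_state == N_STATES - 1    # rightmost cell is terminal
    return next_state, (1.0 if done else 0.0), done

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current estimate, sometimes explore
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state
        target = reward + GAMMA * max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (target - Q[(state, action)])
        state = next_state

# Greedy policy after training: "move right" from every non-terminal state
print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)])
```

SARSA differs only in the target: it bootstraps from the action actually taken next rather than the greedy maximum, which makes it on-policy.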
- Python 3.13+
- Git installed
- UV package manager (recommended)
- (Optional) CUDA-compatible GPU for deep RL training
```bash
# Clone the repository
git clone https://github.com/mohd-faizy/Reinforcement_learning.git
cd Reinforcement_learning

# Create and activate a virtual environment using UV (recommended)
uv venv rl_env
source rl_env/bin/activate   # macOS/Linux
.\rl_env\Scripts\activate    # Windows

# Install dependencies with UV
uv pip install -r requirements.txt

# Or using pip
pip install -r requirements.txt

# Launch JupyterLab to explore the notebooks
jupyter lab
```

```
Reinforcement_learning/
├── _img/                                   # Images and visualizations
│   ├── rl-map.png                          # Learning roadmap
│   ├── frozen-lake.png                     # Environment diagrams
│   └── ...
├── 00_RL/                                  # Basic RL implementations
│   ├── 02_frozen_lake.py                   # FrozenLake environment
│   ├── 03_mountain_car.py                  # MountainCar environment
│   ├── 04_taxi-parking.py                  # Taxi environment
│   └── 05_cliffwalking.py                  # CliffWalking environment
├── 01_Q-Learning/                          # Q-Learning implementations
│   ├── 00_Q-Learning.ipynb                 # Q-Learning tutorial
│   ├── 02_cartpole_Q.py                    # CartPole with Q-Learning
│   └── 03_frozen_lake_Q.py                 # FrozenLake with Q-Learning
├── 02_DQN/                                 # Deep Q-Network implementations
│   ├── 00_cartpole_DQN.py                  # CartPole with DQN
│   └── 01_mountain_car_DQN.py              # MountainCar with DQN
├── 00_RL_intro.ipynb                       # Introduction to RL
├── 01_Markov_Decision_Processes.ipynb
├── 02_State_&_Action_value.ipynb
├── 03_Policy_&_Value_Iteration.ipynb
├── 05_Monte_Carlo_Methods.ipynb
├── 06_Temporal_Difference_Learning.ipynb
├── _Q_vs_DQN.ipynb                         # Comparison of Q-Learning vs DQN
├── requirements.txt                        # Python dependencies
├── pyproject.toml                          # Project configuration
└── README.md                               # This file
```
| Algorithm | Implementation | Environment | Status |
|---|---|---|---|
| Multi-Armed Bandit | 00_RL/ | Custom Bandits | ✅ |
| Q-Learning | 01_Q-Learning/ | FrozenLake, CartPole | ✅ |
| Deep Q-Network (DQN) | 02_DQN/ | CartPole, MountainCar | ✅ |
| Value Iteration | 03_Policy_&_Value_Iteration.ipynb | GridWorld | ✅ |
| Monte Carlo | 05_Monte_Carlo_Methods.ipynb | Blackjack | ✅ |
| TD Learning | 06_Temporal_Difference_Learning.ipynb | Various | ✅ |
Start your RL journey with these comprehensive notebooks:
- 00_RL_intro.ipynb - Fundamentals of RL
- 01_Markov_Decision_Processes.ipynb - MDPs and Bellman equations
- 02_State_&_Action_value.ipynb - Value functions
- 03_Policy_&_Value_Iteration.ipynb - Dynamic programming
- 05_Monte_Carlo_Methods.ipynb - MC learning
- 06_Temporal_Difference_Learning.ipynb - TD methods
- _Q_vs_DQN.ipynb - Tabular vs Deep RL comparison
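As a taste of the dynamic-programming material, the snippet below runs value iteration on a made-up five-cell corridor MDP (deterministic moves, reward +1 on reaching the last cell). State values fall off geometrically with distance from the goal, exactly as the Bellman optimality equation predicts.

```python
# Value iteration sketch on a toy 5-cell corridor MDP (not a repo example).
N_STATES, GAMMA, THETA = 5, 0.9, 1e-8   # discount factor and convergence tolerance

def step(state, action):
    # Deterministic transition: action 0 = left, 1 = right, walls clamp position
    next_state = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return next_state, (1.0 if next_state == N_STATES - 1 else 0.0)

V = [0.0] * N_STATES
while True:
    delta = 0.0
    for s in range(N_STATES - 1):       # the last state is terminal
        # Bellman optimality backup: best one-step lookahead over both actions
        best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in (0, 1)))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < THETA:
        break

print([round(v, 3) for v in V])  # → [0.729, 0.81, 0.9, 1.0, 0.0]
```

Policy iteration reaches the same fixed point by alternating full policy evaluation with greedy policy improvement instead of backing up values directly.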
```bash
# Start with the introduction notebook
jupyter lab 00_RL_intro.ipynb
```

This project is licensed under the MIT License - see the LICENSE file for details.
If this repository helped you learn RL, please consider:
- Starring this repository
- Forking it for your own experiments
- Sharing it with fellow ML enthusiasts
- Contributing improvements and bug fixes
This repository builds upon the incredible work of the RL community:
- Sutton & Barto: *Reinforcement Learning: An Introduction* (the RL bible)
- DeepMind: Pioneering DQN, AlphaGo, and agent architectures
- OpenAI: Advancing RL research and democratizing AI
- Gymnasium: Standard RL environment interface
- PyTorch: Deep learning framework
- NumPy & Matplotlib: Scientific computing and visualization
- Jupyter: Interactive development environment
- David Silver's RL Course (DeepMind/UCL)
- Stanford CS234: Reinforcement Learning
- Berkeley CS 285: Deep Reinforcement Learning
Star this repository if you found it helpful!

Happy Learning!
