Exploring the fundamentals of reinforcement learning (RL) to build agents capable of navigating complex real-world environments and enhancing the training of large language models (LLMs)

mohd-faizy/Reinforcement_learning

Reinforcement Learning

A comprehensive repository for learning Reinforcement Learning through theory, algorithms, and hands-on practice.

Learning Roadmap

[Figure: Reinforcement Learning Roadmap]

What is Reinforcement Learning?

Reinforcement Learning (RL) is a branch of machine learning where an agent learns to make decisions by interacting with an environment. Unlike supervised learning, RL doesn't rely on labeled data. Instead, the agent learns through trial and error, receiving rewards or penalties for its actions.
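
The interaction loop described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from this repository: `CoinFlipEnv`, `run_episode`, and the reward values are hypothetical names and numbers chosen for the example, and the `reset`/`step` shape only loosely mirrors the Gymnasium interface.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: the agent guesses a biased coin flip."""

    def __init__(self, p_heads=0.8, episode_length=10, seed=0):
        self.p_heads = p_heads
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self):
        """Start a new episode and return the (single, dummy) state."""
        self.t = 0
        return 0

    def step(self, action):
        """Apply an action (1 = guess heads, 0 = guess tails)."""
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else -1.0  # reward or penalty
        self.t += 1
        done = self.t >= self.episode_length
        return 0, reward, done


def run_episode(env, policy):
    """Run one episode: observe, act, receive a reward, repeat until done."""
    state = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    env = CoinFlipEnv()
    print("always guess heads:", run_episode(env, lambda s: 1))
    print("always guess tails:", run_episode(env, lambda s: 0))
```

No labels are involved anywhere: the only learning signal available to an agent in this loop is the reward returned by `step`.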

Real-World Applications

  • Robotics: Robot navigation, manipulation, and control
  • Gaming: AlphaGo, OpenAI Five, StarCraft II agents
  • Autonomous Vehicles: Path planning and decision making
  • Finance: Algorithmic trading and portfolio management
  • Recommendation Systems: Personalized content delivery
  • Energy: Smart grid optimization and resource allocation

Core Components

| Component | Description |
|---|---|
| Agent | The decision-maker that learns and takes actions |
| Environment | The world the agent interacts with (MDPs, Gym environments) |
| Reward Function | Feedback signal that guides learning (positive/negative) |
| Policy | The agent's strategy for choosing actions (deterministic/stochastic) |
| Value Function | Estimates expected future rewards from states/actions |
| Exploration vs Exploitation | Balance between trying new actions and using known good ones |
| Training Algorithms | Methods to improve the policy (Q-learning, policy gradients, etc.) |
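
To make the exploration vs exploitation trade-off concrete, here is a minimal epsilon-greedy sketch on a multi-armed bandit. It is an illustration under stated assumptions, not the repository's implementation: the function name `epsilon_greedy_bandit`, the Gaussian reward noise, and the default parameters are all hypothetical.

```python
import random


def epsilon_greedy_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Estimate arm values with epsilon-greedy action selection.

    true_means: assumed mean reward of each arm (unknown to the agent).
    Returns (estimates, counts) after `steps` pulls.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was pulled
    estimates = [0.0] * n_arms   # running average reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit

        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward signal
        counts[arm] += 1
        # Incremental mean: new = old + (sample - old) / n
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = epsilon_greedy_bandit([0.0, 1.0, 5.0], steps=5000)
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

With `epsilon=0` the agent can lock onto the first arm that looks good; with `epsilon=1` it never uses what it has learned. Values in between trade the two off.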

RL vs Supervised Learning

| Aspect | Reinforcement Learning | Supervised Learning |
|---|---|---|
| Feedback Type | Delayed rewards/penalties | Immediate labels |
| Data Requirements | Sequential interaction data | Static labeled datasets |
| Training Objective | Maximize cumulative reward | Minimize prediction error |
| Output | Policies (action strategies) | Predictions/classifications |
| Learning Style | Trial and error | Pattern recognition |
| Temporal Aspect | Sequential decision making | Independent predictions |

Learning Path

Beginner Level

  • Multi-Armed Bandits
  • Markov Decision Processes (MDPs)
  • Dynamic Programming (Value & Policy Iteration)
  • Monte Carlo Methods
  • Temporal Difference Learning (TD)

Intermediate Level

  • Q-Learning & SARSA
  • Expected SARSA & Double Q-Learning
  • Function Approximation
  • Deep Q-Networks (DQN)
  • DQN Variants (Double DQN, Dueling DQN)

Advanced Level

  • Policy Gradient Methods (REINFORCE)
  • Actor-Critic Methods (A2C, A3C)
  • Proximal Policy Optimization (PPO)
  • Deep Deterministic Policy Gradient (DDPG)
  • Soft Actor-Critic (SAC)
  • Multi-Agent Reinforcement Learning
  • Hierarchical Reinforcement Learning

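
Policy gradient methods adjust action probabilities directly instead of learning values first. A minimal sketch of the idea, assuming a one-step (bandit) problem, a softmax policy, and a running-average baseline: this is the gradient-bandit form of REINFORCE, and `reinforce_bandit`, the reward noise, and the learning rate are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(true_means, steps=3000, lr=0.1, seed=0):
    """REINFORCE with a baseline on a bandit; returns final action probabilities."""
    rng = random.Random(seed)
    n = len(true_means)
    prefs = [0.0] * n   # policy parameters (one preference per action)
    baseline = 0.0      # running average of observed rewards

    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(n), weights=probs)[0]
        reward = rng.gauss(true_means[action], 1.0)

        advantage = reward - baseline
        baseline += (reward - baseline) / t

        # Gradient of log pi(action) w.r.t. prefs[a] is 1[a == action] - probs[a]
        for a in range(n):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += lr * advantage * grad_log

    return softmax(prefs)


if __name__ == "__main__":
    print([round(p, 3) for p in reinforce_bandit([0.0, 0.5, 2.0])])
```

Actor-critic methods such as A2C replace the running-average baseline with a learned value function; PPO additionally limits how far each update can move the policy.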
Installation

Prerequisites

  • Python 3.13+
  • Git installed
  • UV package manager (recommended)
  • (Optional) CUDA-compatible GPU for deep RL training

Quick Setup

# Clone the repository
git clone https://github.com/mohd-faizy/Reinforcement_learning.git
cd Reinforcement_learning

# Using UV (recommended)
uv venv rl_env
source rl_env/bin/activate       # macOS/Linux
# .\rl_env\Scripts\activate      # Windows (run this instead of the line above)
uv pip install -r requirements.txt

# Or using pip
pip install -r requirements.txt

# Launch Jupyter Lab to explore notebooks
jupyter lab
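
After installing, a quick sanity check can confirm that the interpreter and key packages are visible from the active environment. The sketch below uses only the standard library; the package list is an assumption based on the libraries credited in this README, so adjust it to match `requirements.txt`.

```python
import importlib.util
import sys


def check_environment(packages=("gymnasium", "torch", "numpy", "matplotlib")):
    """Report whether Python is 3.13+ and whether each package can be found.

    Uses find_spec, so nothing is actually imported.
    """
    report = {"python": sys.version_info >= (3, 13)}
    for name in packages:
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

Run it with the virtual environment activated; a `MISSING` entry usually means the install step ran against a different interpreter.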

Repository Structure

Reinforcement_learning/
├── _img/                             # Images and visualizations
│   ├── rl-map.png                    # Learning roadmap
│   ├── frozen-lake.png               # Environment diagrams
│   └── ...
├── 00_RL/                            # Basic RL implementations
│   ├── 02_frozen_lake.py             # FrozenLake environment
│   ├── 03_mountain_car.py            # MountainCar environment
│   ├── 04_taxi-parking.py            # Taxi environment
│   └── 05_cliffwalking.py            # CliffWalking environment
├── 01_Q-Learning/                    # Q-Learning implementations
│   ├── 00_Q-Learning.ipynb           # Q-Learning tutorial
│   ├── 02_cartpole_Q.py              # CartPole with Q-Learning
│   └── 03_frozen_lake_Q.py           # FrozenLake with Q-Learning
├── 02_DQN/                           # Deep Q-Network implementations
│   ├── 00_cartpole_DQN.py            # CartPole with DQN
│   └── 01_mountain_car_DQN.py        # MountainCar with DQN
├── 00_RL_intro.ipynb                 # Introduction to RL
├── 01_Markov_Decision_Processes.ipynb
├── 02_State_&_Action_value.ipynb
├── 03_Policy_&_Value_Iteration.ipynb
├── 05_Monte_Carlo_Methods.ipynb
├── 06_Temporal_Difference_Learning.ipynb
├── _Q_vs_DQN.ipynb                   # Comparison of Q-Learning vs DQN
├── requirements.txt                  # Python dependencies
├── pyproject.toml                    # Project configuration
└── README.md                         # This file

Getting Started

Algorithm Implementations

| Algorithm | Implementation | Environment | Status |
|---|---|---|---|
| Multi-Armed Bandit | 00_RL/ | Custom Bandits | ✅ |
| Q-Learning | 01_Q-Learning/ | FrozenLake, CartPole | ✅ |
| Deep Q-Network (DQN) | 02_DQN/ | CartPole, MountainCar | ✅ |
| Value Iteration | 03_Policy_&_Value_Iteration.ipynb | GridWorld | ✅ |
| Monte Carlo | 05_Monte_Carlo_Methods.ipynb | Blackjack | ✅ |
| TD Learning | 06_Temporal_Difference_Learning.ipynb | Various | ✅ |

Interactive Notebooks

Start your RL journey with these comprehensive notebooks:

  1. 00_RL_intro.ipynb - Fundamentals of RL
  2. 01_Markov_Decision_Processes.ipynb - MDPs and Bellman equations
  3. 02_State_&_Action_value.ipynb - Value functions
  4. 03_Policy_&_Value_Iteration.ipynb - Dynamic programming
  5. 05_Monte_Carlo_Methods.ipynb - MC learning
  6. 06_Temporal_Difference_Learning.ipynb - TD methods
  7. _Q_vs_DQN.ipynb - Tabular vs Deep RL comparison

# Start with the introduction notebook
jupyter lab 00_RL_intro.ipynb

License

This project is licensed under the MIT License - see the LICENSE file for details.

โค๏ธ Support

If this repository helped you learn RL, please consider:

  • โญ Starring this repository
  • ๐Ÿด Forking for your own experiments
  • ๐Ÿ“ข Sharing with fellow ML enthusiasts
  • ๐Ÿ› Contributing improvements and bug fixes

๐Ÿช™ Credits & Inspiration

This repository builds upon the incredible work of the RL community:

Foundational Resources

  • Sutton & Barto: Reinforcement Learning: An Introduction (The RL Bible)
  • DeepMind: Pioneering DQN, AlphaGo, and agent architectures
  • OpenAI: Advancing RL research and democratizing AI

Open Source Libraries

  • Gymnasium: Standard RL environment interface
  • PyTorch: Deep learning framework
  • NumPy & Matplotlib: Scientific computing and visualization
  • Jupyter: Interactive development environment

Educational Inspiration

  • David Silver's RL Course (DeepMind/UCL)
  • Stanford CS234: Reinforcement Learning
  • Berkeley CS 285: Deep Reinforcement Learning

Connect with me

Twitter LinkedIn Stack Exchange GitHub


Star this repository if you found it helpful!

Happy Learning!
