This repository provides an implementation of the Proximal Policy Optimization (PPO) algorithm, built on Stable-Baselines3, for training an agent to master the CarRacing-v3 environment from Gymnasium (the maintained successor to OpenAI Gym). It aims to offer a straightforward yet powerful setup for both training and evaluating reinforcement learning agents in continuous control tasks.
- Conda (or any virtual environment tool)
- Python 3.11
- NVIDIA GPU (optional; the reference model was trained on an RTX 4060 Laptop GPU)
Set up your environment and install the dependencies by running:

```bash
conda create -n CRP python=3.11
conda activate CRP
pip install -r requirements.txt
```
To begin training the PPO agent on the CarRacing environment, execute:

```bash
python3 src/train.py
```
This script will initialize the training process, log performance metrics, and save model checkpoints periodically. You can customize hyperparameters (like learning rate, discount factor, and clip range) directly within the script.
After training, or if you have a pre-trained model, you can evaluate your agent’s performance with:

```bash
python3 src/eval.py
```
This evaluation script loads the saved model and runs it in the environment, providing insights into its racing capabilities.
View a demonstration of the trained agent in action:
Screencast.from.01-11-2025.09.41.24.PM.webm
```
├── src
│   ├── train.py         # Script for training the PPO agent
│   ├── eval.py          # Script for evaluating the trained model
│   └── ...              # Additional modules and utilities
├── requirements.txt     # List of Python dependencies
└── README.md            # This file
```
Modify the training settings in src/train.py to experiment with different configurations. Parameters such as learning rate, batch size, discount factor, and clip range can be adjusted to suit your needs.
- Gymnasium (successor to OpenAI Gym) for providing the simulation environment.
- Stable-Baselines3 for the PPO implementation.