This project implements a reinforcement learning agent trained to walk using Proximal Policy Optimization (PPO) in the BipedalWalker-v3 environment from OpenAI Gym. The agent learns bipedal locomotion from scratch through interaction with a continuous control environment, using reward signals to guide policy updates.
Goal: train a simulated bipedal robot to walk autonomously using deep reinforcement learning, without imitation learning, predefined motion policies, or manual control logic.
- Environment: OpenAI Gym BipedalWalker-v3
- Algorithm: PPO (Proximal Policy Optimization)
- Framework: Stable-Baselines3 (PyTorch)
- Observation Space: 24 continuous inputs
  - Includes hull angle and velocity, joint angles, ground contacts, and lidar-based terrain sensing
- Action Space: 4-dimensional continuous vector (torques to hips and knees)
- Training: 1M+ timesteps using a reward function encouraging forward motion and penalizing inefficient torque usage
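
The setup listed above can be sketched roughly as follows. This is a minimal illustration, not the contents of `train.py`: it assumes a recent Stable-Baselines3 release (which uses the `gymnasium` package with the Box2D extra installed), relies on the library's default PPO hyperparameters, and uses a hypothetical save path `ppo_bipedalwalker`. The reward comes from the environment itself, so no custom reward code appears here.

```python
ENV_ID = "BipedalWalker-v3"
TOTAL_TIMESTEPS = 1_000_000          # the README reports 1M+ timesteps
MODEL_PATH = "ppo_bipedalwalker"     # hypothetical file name, not from the project


def train(total_timesteps=TOTAL_TIMESTEPS, model_path=MODEL_PATH):
    """Train a PPO agent on BipedalWalker-v3 and save it to disk."""
    # Imported here so the constants above can be inspected without
    # the heavy dependencies being installed.
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make(ENV_ID)

    # Sanity-check the spaces described in the list above.
    assert env.observation_space.shape == (24,)  # 24 continuous inputs
    assert env.action_space.shape == (4,)        # torques for hips and knees

    # "MlpPolicy" is SB3's standard fully connected actor-critic policy.
    # Default hyperparameters are an assumption; the project may tune them.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save(model_path)

    env.close()
    return model
```

Calling `train(total_timesteps=10_000)` gives a quick smoke test before committing to a full-length run.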
To install dependencies and train the agent:
pip install -r requirements.txt
python train.py
To run a trained agent and visualize walking behavior:
python test.py
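
For readers curious what such an evaluation loop might look like, here is a hedged sketch. It is not the project's `test.py`: the model path `ppo_bipedalwalker` and the helper names `run_episode` and `watch` are hypothetical, and the loop assumes the `gymnasium` step API (five return values) used by recent Stable-Baselines3 releases.

```python
def run_episode(model, env, max_steps=2000):
    """Run one episode with the policy acting deterministically.

    Returns the sum of rewards collected before the episode ends
    or max_steps is reached.
    """
    obs, _info = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # deterministic=True uses the policy's mean action (no sampling).
        action, _state = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, _info = env.step(action)
        total_reward += float(reward)
        if terminated or truncated:
            break
    return total_reward


def watch(model_path="ppo_bipedalwalker"):  # hypothetical path
    """Load a saved PPO model and render one episode in a window."""
    import gymnasium as gym
    from stable_baselines3 import PPO

    env = gym.make("BipedalWalker-v3", render_mode="human")
    model = PPO.load(model_path)
    try:
        return run_episode(model, env)
    finally:
        env.close()
```

Keeping `run_episode` separate from the loading code means the same loop can be reused with `render_mode=None` to average returns over many episodes without opening a window.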