Reinforcement Learning backend for autonomous driving using Proximal Policy Optimization (PPO). Trains an agent to navigate in a Unity 3D environment via ZeroMQ communication.
Unity game world repo: `PPO_AutoDRW_Unity3d_GameWorld`
```bash
# Clone and setup
git clone <repo-url>
cd PPO_RL_AutoDRV_Compute_Backend
pip install -r requirements.txt

# Verify the install (and CUDA availability)
python -c "import torch; print('CUDA:', torch.cuda.is_available())"
```
1. Edit `app.py`:
   ```python
   MODE = "train"
   CONFIG_FILE = "config.json"
   ```
2. Start the backend:
   ```bash
   python app.py
   ```
3. Launch the Unity client from the game world repo.
Outputs:
- Logs: `logs/train_<timestamp>.log`
- Checkpoints: `models/checkpoints/ppo_episode_<N>.pth` (every 50 episodes)
- Best model: `ppo_best.pth`
- Final model: `models/ppo_autodrive.pth`
Edit `config.json`:
```json
"training": {
  "resume_from_checkpoint": "models/checkpoints/ppo_episode_1000.pth"
}
```
1. Edit `app.py`:
   ```python
   MODE = "inference"
   ```
2. Edit `config.json`:
   ```json
   "inference": { "model_path": "ppo_best.pth" }
   ```
3. Run `python app.py` and launch the Unity client.
Main settings in `config.json`:

Server:
- `host`: `127.0.0.1` (localhost)
- `port`: `65432` (ZeroMQ connection)
- `tickrate`: Updates per second
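The `host`/`port` pair is where the backend listens and the Unity client connects. The sketch below shows the request/reply pattern with pyzmq, using a made-up one-message exchange and a hypothetical JSON payload; the actual message format is specified in CommunicationDesign.md.

```python
import threading

import zmq

HOST, PORT = "127.0.0.1", 65432  # matches the config.json defaults


def serve_one():
    """Reply to a single observation message with a discrete action."""
    ctx = zmq.Context.instance()
    sock = ctx.socket(zmq.REP)
    sock.bind(f"tcp://{HOST}:{PORT}")
    sock.recv_json()                # observation from the client
    sock.send_json({"action": 1})   # hypothetical reply; 1 = Straight
    sock.close()


server = threading.Thread(target=serve_one)
server.start()

# Client side (the Unity game world plays this role in the real setup).
req = zmq.Context.instance().socket(zmq.REQ)
req.connect(f"tcp://{HOST}:{PORT}")
req.send_json({"rays": [1.0] * 5, "hits": [0] * 5, "speed": 0.3})
reply = req.recv_json()
req.close()
server.join()
```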
Environment:
- `max_ray_distances`: Ray sensor max distances
- `max_speed`: Vehicle max speed
- `reward_collected_value`: Reward for collectibles
- `collision_penalty`: Collision penalty
- `survival_reward`: Per-step reward
- `straight_driving_reward`: Bonus for straight driving
Training:
- `total_episodes`: Training episode count
- `update_frequency`: Policy update interval
- `save_frequency`: Checkpoint save interval
- `resume_from_checkpoint`: Path to resume from (or `null`)
PPO:
- `lr_actor`, `lr_critic`: Learning rates
- `gamma`: Discount factor (0.99)
- `epsilon`: PPO clip parameter (0.2)
- `entropy_coef`: Exploration bonus
- `batch_size`: Training batch size
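For reference, `epsilon` enters through PPO's clipped surrogate objective. A minimal sketch of that loss term (the project's actual implementation lives in `src/ppo_model.py` and may differ in detail):

```python
import torch


def ppo_clip_loss(new_logp, old_logp, advantages, epsilon=0.2):
    # Probability ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantages
    # Pessimistic bound, negated so a standard optimizer can minimize it.
    return -torch.min(unclipped, clipped).mean()


adv = torch.tensor([1.0, -0.5])
logp = torch.log(torch.tensor([0.6, 0.3]))
loss = ppo_clip_loss(logp, logp, adv)  # identical policies: ratio == 1
```

With identical old and new policies the ratio is 1, so the loss reduces to the negated mean advantage; the clip only bites once the updated policy drifts from the one that collected the data.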
Observation Space (11D):
- 5 ray distances (normalized)
- 5 ray hit indicators (binary)
- 1 speed value
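Assembling that 11-D vector can be sketched as below, assuming illustrative `max_ray`/`max_speed` normalization constants (the real values come from `config.json`, and the real code lives in `src/environment.py`):

```python
import numpy as np


def build_observation(ray_dists, ray_hits, speed, max_ray=20.0, max_speed=10.0):
    """5 normalized ray distances + 5 binary hit flags + 1 normalized speed."""
    rays = np.clip(np.asarray(ray_dists, dtype=np.float32) / max_ray, 0.0, 1.0)
    hits = np.asarray(ray_hits, dtype=np.float32)
    return np.concatenate([rays, hits, [speed / max_speed]]).astype(np.float32)


obs = build_observation([5.0, 10.0, 20.0, 20.0, 2.5], [0, 0, 1, 0, 1], speed=5.0)
```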
Action Space (3 discrete):
- 0: Turn Left
- 1: Straight
- 2: Turn Right
Rewards:
- Survival: +0.1/step
- Straight driving: +0.05
- Collection: +15.0
- Collision: -10.0
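Put together, the per-step reward could be computed as in this sketch (a hypothetical function mirroring the values listed above; the actual logic lives in `src/environment.py` with values taken from `config.json`):

```python
def step_reward(collided, collected, went_straight,
                survival=0.1, straight=0.05, collect=15.0, crash=-10.0):
    """Combine the reward terms for one environment step."""
    if collided:
        return crash          # collision ends the accumulation for this step
    r = survival              # base reward for staying alive
    if went_straight:
        r += straight         # small bonus for driving straight
    if collected:
        r += collect          # large bonus for picking up a collectible
    return r
```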
Actor (Policy): Input(11) → FC(256) → ReLU → FC(256) → ReLU → FC(3) → Softmax
Critic (Value): Input(11) → FC(256) → ReLU → FC(256) → ReLU → FC(1)
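The two heads above translate directly to PyTorch. A sketch (the project's actual modules are defined in `src/ppo_model.py`):

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Policy head: 11-D observation -> softmax over 3 discrete actions."""

    def __init__(self, obs_dim=11, hidden=256, n_actions=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions), nn.Softmax(dim=-1),
        )

    def forward(self, x):
        return self.net(x)


class Critic(nn.Module):
    """Value head: 11-D observation -> scalar state-value estimate."""

    def __init__(self, obs_dim=11, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)


x = torch.zeros(1, 11)
probs = Actor()(x)    # action probabilities, shape (1, 3)
value = Critic()(x)   # state value, shape (1, 1)
```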
```
PPO_RL_AutoDRV_Compute_Backend/
├── app.py                    # Main entry point
├── config.json               # Configuration
├── requirements.txt          # Dependencies
├── src/
│   ├── server.py             # ZeroMQ server
│   ├── environment.py        # Gym environment
│   ├── ppo_model.py          # PPO algorithm
│   ├── ppo_controller.py     # Agent controller
│   ├── connection_manager.py # Connection handling
│   └── helpers.py            # Utilities
├── models/
│   ├── ppo_autodrive.pth     # Final model
│   ├── ppo_best.pth          # Best model
│   └── checkpoints/          # Training checkpoints
└── logs/                     # Training logs
```
Connection Issues:
- Verify Unity and Python use the same `host`:`port`
- Check firewall settings

Training Issues:
- Lower the learning rates if training is unstable
- Adjust the reward structure
- Increase `collision_penalty` if driving is too aggressive
GPU Not Working:
```bash
python -c "import torch; print(torch.cuda.is_available())"
```

Communication Protocol: See CommunicationDesign.md for ZeroMQ protocol details.