😄 This work has been accepted at the 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS).
🚀 This work implements a novel Reinforcement Learning (RL) approach for autonomous driving with monotonic evolution capability. The algorithm ensures continuous policy improvement with a high-confidence guarantee.
- Monotonic performance enhancement via high-confidence policy improvement
- Safe and robust online training
- Integrated decision-making and motion planning
- `main.py`: Main training script for the reinforcement learning algorithm
- `monotonic_evolution_RL.py`: Implementation of the PPO (Proximal Policy Optimization) algorithm
- `normalization.py`: State and reward normalization utilities
- `replaybuffer.py`: Experience replay buffer for storing transitions
- `VissimEnvironment.py`: Interface between VISSIM traffic simulation and the RL algorithm
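As one illustration of the utilities above, state normalization for PPO is commonly implemented with a Welford-style running mean/variance. The sketch below shows that pattern; it is a hedged example under that assumption, not the actual code in `normalization.py` (class and method names are illustrative):

```python
import numpy as np

class RunningNormalizer:
    """Illustrative running state normalizer (Welford's online algorithm)."""

    def __init__(self, shape):
        self.n = 0                       # number of samples seen
        self.mean = np.zeros(shape)      # running mean per dimension
        self.m2 = np.zeros(shape)        # running sum of squared deviations

    def update(self, x):
        """Incorporate one observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted and scaled by the running mean and std."""
        std = np.sqrt(self.m2 / max(self.n, 1)) + 1e-8  # epsilon avoids /0
        return (x - self.mean) / std
```

Normalizing states this way keeps network inputs in a consistent range as the traffic-state distribution drifts during online training.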
- Windows operating system (required for VISSIM integration)
- VISSIM traffic simulation software (version 22)
- Python 3.8
- Conda package manager
1. Install VISSIM 22 on your Windows system.
2. Create and activate the conda environment:
   ```
   conda env create -f environment.yml
   conda activate monotonic_evolution_rl
   ```
3. Verify VISSIM is properly installed and accessible via the COM interface.
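A quick way to check the COM interface is to try instantiating the VISSIM COM server from Python. This is a hedged sketch: it assumes `pywin32` is installed and that VISSIM registers the ProgID `"Vissim.Vissim"` (your installation may expose a versioned ProgID instead); it simply reports availability rather than failing hard:

```python
def vissim_com_available():
    """Return True if the VISSIM COM server can be instantiated on this machine."""
    try:
        import win32com.client  # pywin32; Windows-only
        # "Vissim.Vissim" is the assumed ProgID; check your VISSIM install
        vissim = win32com.client.Dispatch("Vissim.Vissim")
        vissim = None  # release the COM object
        return True
    except Exception:
        # ImportError off Windows, or COM error if VISSIM is not registered
        return False

if __name__ == "__main__":
    print("VISSIM COM accessible:", vissim_com_available())
```

If this prints `False` on a machine with VISSIM installed, re-registering the COM server from an administrator shell usually resolves it.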
Run the main training script:
python main.py
You can modify hyperparameters using command line arguments, for example:
python main.py --max_train_steps 500000 --gamma 0.98
The default hyperparameters can be found in main.py. You can customize:
- `--max_train_steps`: Maximum number of training steps
- `--gamma`: Discount factor for future rewards
- `--hidden_width`: Width of hidden layers in the networks
- And many other PPO-specific parameters
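The flags above can be defined with Python's standard `argparse` module. The sketch below mirrors the documented flag names; the default values and parser structure are illustrative, not taken from `main.py`:

```python
import argparse

def build_parser():
    """Illustrative argument parser; defaults here are placeholders."""
    parser = argparse.ArgumentParser(
        description="Monotonic-evolution RL training (sketch)")
    parser.add_argument("--max_train_steps", type=int, default=500_000,
                        help="Maximum number of training steps")
    parser.add_argument("--gamma", type=float, default=0.99,
                        help="Discount factor for future rewards")
    parser.add_argument("--hidden_width", type=int, default=64,
                        help="Width of hidden layers in the networks")
    return parser

# Parsing an explicit argv list, equivalent to:
#   python main.py --max_train_steps 500000 --gamma 0.98
args = build_parser().parse_args(["--max_train_steps", "500000",
                                  "--gamma", "0.98"])
```

Unspecified flags fall back to their defaults, so only the hyperparameters being tuned need to appear on the command line.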