This folder contains the code for the paper:
Heiko Hoppe, Léo Baty, Louis Bouvier, Axel Parmentier, Maximilian Schiffer (2025). Structured Reinforcement Learning for Combinatorial Decision-Making. arXiv preprint on arXiv: tba.
The code is written for Julia 1.11.5. It implements COaML-pipelines trained using Structured Reinforcement Learning (SRL), Structured Imitation Learning (SIL), and Proximal Policy Optimization (PPO) for six industrial problem settings.
The folder scripts contains all source code for the paper. It contains a sub-folder for each of the environments:
- DAP: Dynamic Assortment Problem
- DVSP: Dynamic Vehicle Scheduling Problem
- GSPP: Gridworld Shortest Paths Problem
- SMSP: Single Machine Scheduling Problem
- SVSP: Stochastic Vehicle Scheduling Problem
- WSPP: Warcraft Shortest Paths Problem
The folder of each environment contains an implementation of SIL, PPO, and SRL, as well as a greedy and an expert benchmark for the specific environment. Each environment folder is structured as follows:
- utils: Folder containing environment functions; these files should not be run directly
- 00_setup.jl: Dataset setup and baseline (expert and greedy) solutions
- 01_SIL.jl: Structured Imitation Learning training function and executable code
- 02_PPO.jl: Proximal Policy Optimization training function and executable code
- 03_SRL.jl: Structured Reinforcement Learning training function and executable code
- 04_plots.jl: Code to create a cumulative lineplot of training performance and a boxplot of testing performance
To set up a working environment for the code, please follow these steps:
- Install the Julia programming language, version 1.11.5 (see https://julialang.org/install/)
- Open this folder in your favorite IDE and start a Julia REPL
- Instantiate the Julia environment of this folder:
    using Pkg
    Pkg.activate(".")
    Pkg.instantiate()
- Make sure to have an active internet connection and about 150 MB of free disk space for downloading and storing instance and log files when running the code for the first time
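Since the code targets a specific Julia release, it can help to compare the running version against it before instantiating the environment. The following is a minimal sketch and not part of the repository; the constant `EXPECTED_JULIA` and the helper `same_minor` are hypothetical names introduced here for illustration.

```julia
# Hypothetical helper, not part of the repository: compare the running Julia
# version against the version named in this README.
const EXPECTED_JULIA = v"1.11.5"

"""Return true if `v` has the same major and minor version as `expected`."""
same_minor(v::VersionNumber, expected::VersionNumber = EXPECTED_JULIA) =
    v.major == expected.major && v.minor == expected.minor

# Only warn; a different patch release may still work, but this is untested here.
if VERSION != EXPECTED_JULIA
    @warn "Running Julia $VERSION; this code was written for $EXPECTED_JULIA" compatible = same_minor(VERSION)
end
```

The check only prints a warning and never stops execution, so it can be pasted into the REPL before the `Pkg` commands without side effects.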
To train and test the algorithms for an environment, please follow these steps:
- Find the corresponding environment folder
- Run 00_setup.jl:
    julia --project=. folder/00_setup.jl
- Run the algorithm scripts 01_SIL.jl, 02_PPO.jl, and 03_SRL.jl in the same way as 00_setup.jl
- Run 04_plots.jl in the same way
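The steps above can also be chained from a Julia session. The sketch below is a hypothetical driver and not part of the repository: `SCRIPT_ORDER`, `pipeline_commands`, and `run_pipeline` are names introduced here, and `env_dir` stands for the path to one environment folder, which you need to adapt to your checkout. It assumes it is started from the folder that holds the Julia project, because `--project=.` refers to the current directory.

```julia
# Hypothetical driver, not part of the repository. `env_dir` is the path to one
# environment folder; adjust it to where that folder lives in your checkout.
const SCRIPT_ORDER = ["00_setup.jl", "01_SIL.jl", "02_PPO.jl", "03_SRL.jl", "04_plots.jl"]

"""Build one `julia --project=. <script>` command per script, in run order."""
function pipeline_commands(env_dir::AbstractString; julia::AbstractString = "julia")
    return [`$julia --project=. $(joinpath(env_dir, script))` for script in SCRIPT_ORDER]
end

"""Run the scripts one after another; `run` throws if a script fails."""
function run_pipeline(env_dir::AbstractString)
    for cmd in pipeline_commands(env_dir)
        @info "Running" cmd
        run(cmd)  # a failing script raises an error, so later steps are skipped
    end
end
```

Calling `pipeline_commands(env_dir)` alone only builds the commands, which is a cheap way to inspect them before launching the full training runs with `run_pipeline(env_dir)`.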
To reproduce the results from the paper, please run the algorithms using ten random seeds and average the rewards across these seeds. The seeds used in the paper are stated in the respective setup script.
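Averaging across seeds can be sketched as follows. This is an illustration under assumptions, not the repository's evaluation code: the function `average_across_seeds` is hypothetical, the data layout (one equally long reward vector per seed) is assumed, and the numbers in `demo` are made up. The log format written by the scripts may differ, and the seeds themselves should be taken from the setup script.

```julia
using Statistics

# Hypothetical data layout: one reward vector per seed, all of equal length
# (for example one entry per test episode). The actual log format may differ.
"""Return the element-wise mean and standard deviation across seeds."""
function average_across_seeds(rewards_by_seed::Dict{Int, Vector{Float64}})
    seeds = sort(collect(keys(rewards_by_seed)))
    runs = reduce(hcat, [rewards_by_seed[s] for s in seeds])  # one column per seed
    return vec(mean(runs; dims = 2)), vec(std(runs; dims = 2))
end

# Made-up numbers for three seeds, for illustration only.
demo = Dict(1 => [1.0, 2.0], 2 => [3.0, 4.0], 3 => [5.0, 6.0])
avg, spread = average_across_seeds(demo)
```

Reward vectors of unequal length make `hcat` throw a `DimensionMismatch`, which serves as a simple guard against mixing runs of different lengths.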