Understanding the Effects of Second-Order Approximations in Natural Policy Gradient Reinforcement Learning [ArXiv]
- Python 3.8.0
pip install -r req.txt
- MuJoCo 200 license
main.py
: main run file for model training

models.py
: neural networks for the policy and critic models

optim.py
: second-order approximations for realizing the natural gradient

utils.py
: helper functions

scripts/
: bash training scripts formatted for Compute Canada/SLURM jobs

visualize/json
: training hyperparameters for each experiment

visualize/csv
: training results in .csv format

visualize/performance.py
: (after training) view results & create .csv results - best run with VSCode ipython cells
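As a rough illustration of what a diagonal second-order approximation to the natural gradient looks like (a hypothetical sketch, not the actual optim.py implementation): the Fisher information matrix is approximated by its diagonal, estimated from squared per-sample score-function gradients, and the policy gradient is preconditioned by the inverse of that diagonal.

```python
import numpy as np

def diagonal_natural_gradient(grad, grad_samples, damping=1e-3):
    """Precondition `grad` with the inverse of a diagonal Fisher estimate.

    grad: policy gradient, shape (d,)
    grad_samples: per-sample score-function gradients, shape (n, d)
    damping: small constant added to the diagonal for numerical stability
    """
    # Diagonal of the empirical Fisher E[g g^T], i.e. E[g^2] per parameter
    fisher_diag = np.mean(grad_samples ** 2, axis=0)
    # Natural gradient direction: F^{-1} g with a damped diagonal F
    return grad / (fisher_diag + damping)

# Toy example with two parameters and two sampled gradients
g = np.array([1.0, 2.0])
samples = np.array([[1.0, 0.0],
                    [1.0, 2.0]])
nat_g = diagonal_natural_gradient(g, samples)
```

The damping term plays the same role as the regularizer commonly added before inverting a Fisher estimate; without it, parameters with near-zero gradient variance would blow up the update.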
To run the baseline experiments:
- Tune hparams:
bash scripts/hparams/baseline.sh
- runs will be saved in
runs/hparams_baseline/...
- Extract best hparams from runs:
python baseline_hparams.py
- the best hparams will be saved in
visualize/json/baseline.json
- Run training with hparams:
bash scripts/baseline/diagonal.sh
- runs will be saved in
runs/5e6_baseline/...
- Run speed tests:
bash scripts/speed/baseline.sh
- runs will be saved in
runs/baseline_speed/...
- View results: run the ipython cells interactively in
visualize/performance.py
# %%
import pathlib

# analyze() and mean_df() are helpers defined in visualize/performance.py
runs_path = pathlib.Path("../runs/5e6_baseline/")
speed_runs_path = pathlib.Path("../runs/baseline_speed/")
name = "baseline"
baseline_data = analyze(runs_path, speed_runs_path)
baseline_df = mean_df(*baseline_data, name, save=True)
- Code formatted with Black
- Experiment runs format:
runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}/...
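As an illustration of this layout, the tensorboard folders for one experiment/environment/approximation combination can be collected with a glob (a minimal sketch; the function name and the example values below are hypothetical, not part of the repo):

```python
import pathlib

def list_run_dirs(root, experiment, env, approximation):
    """Collect tensorboard run folders laid out as
    runs/{experiment_name}/{env_name}/{approximation}_runs/{tensorboard folder}."""
    base = pathlib.Path(root) / experiment / env / f"{approximation}_runs"
    # Each immediate subdirectory is one tensorboard run folder
    return sorted(p for p in base.glob("*") if p.is_dir())

# Hypothetical example values:
dirs = list_run_dirs("runs", "5e6_baseline", "HalfCheetah-v2", "diagonal")
```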