WIP: Add MPO in zoo #604

Merged
merged 212 commits into from
Dec 22, 2022

Conversation

HenriDeh
Member

@HenriDeh HenriDeh commented Mar 17, 2022

I'm opening this as a draft so that discussion can start early.
This implements the MPO algorithm from this paper and its improved version.
PR Checklist

  • Add docstrings
  • Handle the case of a discrete actor. For this, I was wondering whether a DiscreteNetwork, akin to GaussianNetwork, would be a better approach than assuming that any actor that does not use a GaussianNetwork must be discrete.
  • Add some tests
  • Does this handle distributed environments?
  • Handle legal action masks
  • Decide default HPs
  • Make experiments with each network
  • Remove normalizer from networks?
  • Make a dedicated doc page
  • Fix GPU (changes in RLTrajectories needed)
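For background on what the checklist items above are building toward: the heart of MPO's policy improvement is a non-parametric E-step that reweights sampled actions by `exp(Q/η)`, with the temperature `η` chosen by minimizing a dual function under a KL bound `ε`. The PR's actual implementation is in Julia inside RLZoo; the following is only an illustrative NumPy sketch, and all names and shapes here are assumptions:

```python
import numpy as np

def mpo_e_step_weights(q_values, eta):
    """Non-parametric E-step of MPO: weight each sampled action by
    exp(Q(s, a) / eta), normalized per state (a per-state softmax).

    q_values: array of shape (n_states, n_sampled_actions)."""
    logits = q_values / eta
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)

def temperature_dual(eta, q_values, kl_eps):
    """Dual objective g(eta) = eta*eps + eta * mean_s log mean_a exp(Q/eta).
    Minimizing this over eta > 0 enforces a KL constraint of size kl_eps."""
    max_q = q_values.max(axis=1, keepdims=True)
    log_mean_exp = (np.log(np.mean(np.exp((q_values - max_q) / eta), axis=1))
                    + max_q[:, 0] / eta)
    return eta * kl_eps + eta * np.mean(log_mean_exp)
```

In the full algorithm, the resulting weights serve as targets for a weighted maximum-likelihood M-step on the actor, with additional KL regularization toward the previous policy.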

@HenriDeh
Member Author

There we go, it's finally done. This PR adds MPO; you can find details on the dedicated doc page. It supports Categorical, Gaussian, and Full Covariance Gaussian policies. Compared to the MPO algorithm described in the related paper (see above), it differs in two main ways:

  • It uses 1-step TD learning to update the critic network, whereas the paper uses retrace. Implementing retrace is a WIP.
  • It does not support distributed learners with gradient pooling. This is for later.
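For clarity, the 1-step TD target mentioned in the first point is the standard bootstrapped target. A minimal NumPy sketch (the PR's actual code is Julia and uses the library's trajectory types; the names here are illustrative):

```python
import numpy as np

def td1_target(rewards, next_q, dones, gamma=0.99):
    """1-step TD target for the critic: y = r + gamma * (1 - done) * Q'(s', a').

    Retrace would instead mix multi-step off-policy returns with truncated
    importance weights; here we simply bootstrap after a single step."""
    return rewards + gamma * (1.0 - dones) * next_q
```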

I implemented three experiments in the test suite, one for each type of policy. They all learn a perfect CartPole policy in less than a minute using only a CPU, at least on my computer.

@HenriDeh HenriDeh marked this pull request as ready for review December 22, 2022 11:00
@HenriDeh HenriDeh merged commit b9d0ee0 into JuliaReinforcementLearning:master Dec 22, 2022
@HenriDeh HenriDeh deleted the mpo branch December 22, 2022 11:02
Labels
enhancement New feature or request RLZoo WIP
Successfully merging this pull request may close these issues.

tanh normalization destabilizes learning with GaussianNetwork