
Fixing and generalizing GaussianNetwork #592


Merged
merged 15 commits into JuliaReinforcementLearning:master on Mar 4, 2022

Conversation

@HenriDeh (Member) commented Mar 4, 2022

As I was working on a covariance version of the GaussianNetwork learner, I noticed a few issues with it that this PR fixes.
In roughly the order of the commits:

  • RL is not always used for actions in the [-1, 1] action space, so imposing the tanh normalization limits the applicability of RL.jl to those applications. This PR adds a field to the GaussianNetwork struct holding an elementwise normalizer function (see the sketch at the end of this comment). It defaults to tanh to avoid breaking changes (if breaking is acceptable, we could default to identity instead).
  • Some algorithms (e.g. MPO) require sampling multiple actions per state. This PR adds a new GaussianNetwork caller that takes a batch of states and an integer N and samples N actions for each state in the batch. As explained in the docstring, we now work with 3D tensors for both the state-batch input and the action output. I chose this over a Vector{Matrix} input/output mainly so the network outputs for the entire batch are computed in one pass (see the sketch at the end of this comment).
  • While I was at it, Julia asked me to upgrade Manifest.toml to the new format. That is done in this PR; if desired, I can split it into a separate one.
  • I added a default GaussianNetwork constructor for the convenience of not needing keywords for the networks (pre, mu, logsigma). Totally optional; I'd understand if you'd rather not have it.
  • The most important change: I need someone to confirm that this was indeed broken and not a misunderstanding on my end. The logpdf returned by the GaussianNetwork was broken because it was computed on the unnormalized sampled actions. That is, this test was failing before:
pre = Dense(20,15)
μ = Dense(15,10)
logσ = Dense(15,10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20,3) #batch of 3 states
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

Now this is fixed.
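For reference, the correction term in that test is the numerically stable form of the tanh Jacobian, log(1 - tanh(x)^2) = 2(log 2 - x - softplus(-2x)), familiar from SAC. A self-contained sketch of the formula (the helper name squashed_logpdf is mine; normlogpdf is written out so the snippet runs on its own):

using Flux: softplus

# Elementwise Gaussian log-density, matching RL.jl's normlogpdf.
normlogpdf(μ, σ, x) = -((x .- μ) ./ σ) .^ 2 ./ 2 .- log.(σ) .- log(2f0 * π) / 2

# log(1 - tanh(x)^2) == 2 * (log(2) - x - softplus(-2x)), the stable
# form of the tanh Jacobian term appearing in the test above.
squashed_logpdf(μ, σ, x) =
    sum(normlogpdf(μ, σ, x) .- 2.0f0 .* (log(2.0f0) .- x .- softplus.(-2.0f0 .* x)), dims = 1)

# The fix makes both code paths evaluate this at the same actions, so the
# sampled logp and gn(state, a) now agree.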

  • I was writing tests for my covariance implementation, so I added some for this one too. That's how I found the normalization problem above. I think unit tests and documentation should be required for a PR to be merged; otherwise they simply never get added later.

That's it. I hope you'll like it, and I'm of course receptive to suggestions and feedback.
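To make the two additions above concrete, a minimal sketch (the keyword name normalizer and the (rng, state, N) method signature are inferred from the description above, so check the docstring before relying on them):

using Flux, Random

pre  = Dense(20, 15)
μ    = Dense(15, 10)
logσ = Dense(15, 10)

# New positional constructor; the normalizer field defaults to tanh,
# so existing code keeps its behavior.
gn = GaussianNetwork(pre, μ, logσ)

# Unbounded action spaces: swap the normalizer for identity.
gn_id = GaussianNetwork(pre = pre, μ = μ, logσ = logσ, normalizer = identity)

# Multi-sample caller: a batch of 3 states as a 3D tensor, N = 5 actions each.
state = rand(Float32, 20, 1, 3)            # (state_dim, 1, batch)
a, logp = gn(Random.GLOBAL_RNG, state, 5)  # a expected as (10, 5, 3)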

@findmyway (Member)

Nice work!

> As I was working on a covariance version of the GaussianNetwork learner, I noticed a few issues with it that this PR fixes. In roughly the order of the commits:

> • RL is not always used for actions in the [-1, 1] action space, so imposing the tanh normalization limits the applicability of RL.jl to those applications. This PR adds a field to the GaussianNetwork struct holding an elementwise normalizer function. It defaults to tanh to avoid breaking changes (if breaking is acceptable, we could default to identity instead).

Thanks! Better to avoid breaking changes 😃. And thanks for adding the docstring.

> • Some algorithms (e.g. MPO) require sampling multiple actions per state. This PR adds a new GaussianNetwork caller that takes a batch of states and an integer N and samples N actions for each state in the batch. As explained in the docstring, we now work with 3D tensors for both the state-batch input and the action output. I chose this over a Vector{Matrix} input/output mainly so the network outputs for the entire batch are computed in one pass.

👍

> • While I was at it, Julia asked me to upgrade Manifest.toml to the new format. That is done in this PR; if desired, I can split it into a separate one.

It's OK to add it in this PR.

> • I added a default GaussianNetwork constructor for the convenience of not needing keywords for the networks (pre, mu, logsigma). Totally optional; I'd understand if you'd rather not have it.
> • The most important change: I need someone to confirm that this was indeed broken and not a misunderstanding on my end. The logpdf returned by the GaussianNetwork was broken because it was computed on the unnormalized sampled actions. That is, this test was failing before:
pre = Dense(20,15)
μ = Dense(15,10)
logσ = Dense(15,10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20,3) #batch of 3 states
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

What are m and s here? I'd use isapprox for these comparisons.

> Now this is fixed.

> • I was writing tests for my covariance implementation, so I added some for this one too. That's how I found the normalization problem above. I think unit tests and documentation should be required for a PR to be merged; otherwise they simply never get added later.

> That's it. I hope you'll like it, and I'm of course receptive to suggestions and feedback.

Thanks for adding those tests. See the inline comment to avoid breaking the existing CI pipeline.

@HenriDeh (Member, Author) commented Mar 4, 2022

Sorry, I forgot to copy a line.

pre = Dense(20, 15)
μ = Dense(15, 10)
logσ = Dense(15, 10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20, 3) # batch of 3 states
m, s = gn(state)    # mean and log-σ heads: the missing line
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

I'll change it to isapprox, yes. Oops.
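With the missing line restored, the isapprox (≈) version of the two assertions reads:

# ≈ tolerates floating-point rounding between the two code paths
@test logp ≈ sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp ≈ gn(state, a)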

@HenriDeh (Member, Author) commented Mar 4, 2022

I then changed the rand calls of the other NN approximators to be consistent. I think this will break CI though; we'll have to pass an rng explicitly when testing on the GPU.

@findmyway (Member)

Yes, you can add a device rng like this:
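(The inline snippet did not survive extraction; below is my guess at its shape, using CUDA.jl's task-local device RNG.)

using CUDA, Random

# Use the GPU's native RNG when a device is available, the CPU default otherwise.
rng = CUDA.functional() ? CUDA.default_rng() : Random.default_rng()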

@HenriDeh (Member, Author) commented Mar 4, 2022

> Yes, you can add a device rng like this:

Since the test is already in an if CUDA.functional() block, I directly added the CUDA rng as an argument.
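A sketch of what that looks like in the test; moving the network with gpu/cu and the exact keyword set are assumptions on my side:

using CUDA, Flux

if CUDA.functional()
    gn_gpu = gn |> gpu                   # move the layers to the device
    state_gpu = cu(rand(Float32, 20, 3))
    # pass the device RNG explicitly so sampling happens on the GPU
    a, logp = gn_gpu(CUDA.default_rng(), state_gpu;
                     is_sampling = true, is_return_log_prob = true)
end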

I'm a bit worried: the tests were successful before I committed that. Does that mean they don't run the CUDA.functional() block?

@HenriDeh (Member, Author) commented Mar 4, 2022

Commit f7348da actually makes the PR breaking. I don't see a way to remove the device-transfer overhead without breaking existing implementations, though. :/

@findmyway enabled auto-merge (squash) Mar 4, 2022 14:42
@findmyway merged commit a90c485 into JuliaReinforcementLearning:master Mar 4, 2022