
Fixing and generalizing GaussianNetwork #592


Merged
merged 15 commits into JuliaReinforcementLearning:master on Mar 4, 2022

Conversation

@HenriDeh (Member) commented Mar 4, 2022

As I was working on a covariance version of the GaussianNetwork learner, I noticed a few issues with it that this PR fixes.
In roughly the order of the commits:

  • RL is not always used for actions in the [-1, 1] action space, so imposing the tanh normalization limits the applicability of RL.jl to those applications. This PR adds a field to the GaussianNetwork struct holding an elementwise normalizer function (see the sketch at the end of this comment). It defaults to tanh to avoid breaking changes (if breaking is acceptable, we could default to identity instead).
  • Some algorithms (e.g. MPO) require sampling multiple actions per state. This PR adds a new GaussianNetwork caller that takes a batch of states and an integer N and samples N actions for each state in the batch. As explained in the docstring, we now work with 3D tensors for both the state-batch input and the action output. I chose this over a Vector{Matrix} input/output mainly so the network outputs for the entire batch are computed in one pass (see the sketch at the end of this comment).
  • While I was at it, Julia asked me to upgrade Manifest.toml to the new format. That is done in this PR; if desired, I can split it into a separate one.
  • I added a default GaussianNetwork constructor for the convenience of not needing keywords for the networks (pre, mu, logsigma). Totally optional; I'd understand if you'd rather not have it.
  • The most important change: I need someone to confirm that this was indeed broken and not a misunderstanding on my end. The logpdf returned by the GaussianNetwork was broken because it was computed on the unnormalized sampled actions. That is, this test was failing before:
pre = Dense(20,15)
μ = Dense(15,10)
logσ = Dense(15,10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20,3) #batch of 3 states
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

Now this is fixed.
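For reference, the correction term in that test is the numerically stable form of the tanh Jacobian, log(1 - tanh(x)^2) = 2(log 2 - x - softplus(-2x)), familiar from SAC. A self-contained sketch of the formula (the helper name squashed_logpdf is mine; normlogpdf is written out so the snippet runs on its own):

using Flux: softplus

# Elementwise Gaussian log-density, matching RL.jl's normlogpdf.
normlogpdf(μ, σ, x) = -((x .- μ) ./ σ) .^ 2 ./ 2 .- log.(σ) .- log(2f0 * π) / 2

# log(1 - tanh(x)^2) == 2 * (log(2) - x - softplus(-2x)), the stable
# form of the tanh Jacobian term appearing in the test above.
squashed_logpdf(μ, σ, x) =
    sum(normlogpdf(μ, σ, x) .- 2.0f0 .* (log(2.0f0) .- x .- softplus.(-2.0f0 .* x)), dims = 1)

# The fix makes both code paths evaluate this at the same actions, so the
# sampled logp and gn(state, a) now agree.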

  • I was writing tests for my covariance implementation, so I added some for this one too. That's how I found the normalization problem above. I think unit tests and documentation should be required for a PR to be merged; otherwise they simply never get added later.

That's it. I hope you'll like it, and I'm of course receptive to suggestions and feedback.
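To make the two additions above concrete, a minimal sketch (the keyword name normalizer and the (rng, state, N) method signature are inferred from the description above, so check the docstring before relying on them):

using Flux, Random

pre  = Dense(20, 15)
μ    = Dense(15, 10)
logσ = Dense(15, 10)

# New positional constructor; the normalizer field defaults to tanh,
# so existing code keeps its behavior.
gn = GaussianNetwork(pre, μ, logσ)

# Unbounded action spaces: swap the normalizer for identity.
gn_id = GaussianNetwork(pre = pre, μ = μ, logσ = logσ, normalizer = identity)

# Multi-sample caller: a batch of 3 states as a 3D tensor, N = 5 actions each.
state = rand(Float32, 20, 1, 3)            # (state_dim, 1, batch)
a, logp = gn(Random.GLOBAL_RNG, state, 5)  # a expected as (10, 5, 3)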

@findmyway (Member)

Nice work!

> As I was working on a covariance version of the GaussianNetwork learner, I noticed a few issues with it that this PR fixes. In roughly the order of the commits:

> • RL is not always used for actions in the [-1, 1] action space, so imposing the tanh normalization limits the applicability of RL.jl to those applications. This PR adds a field to the GaussianNetwork struct holding an elementwise normalizer function. It defaults to tanh to avoid breaking changes (if breaking is acceptable, we could default to identity instead).

Thanks! Better to avoid breaking changes 😃. And thanks for adding the docstring.

> • Some algorithms (e.g. MPO) require sampling multiple actions per state. This PR adds a new GaussianNetwork caller that takes a batch of states and an integer N and samples N actions for each state in the batch. As explained in the docstring, we now work with 3D tensors for both the state-batch input and the action output. I chose this over a Vector{Matrix} input/output mainly so the network outputs for the entire batch are computed in one pass.

👍

> • While I was at it, Julia asked me to upgrade Manifest.toml to the new format. That is done in this PR; if desired, I can split it into a separate one.

It's OK to add it in this PR.

> • I added a default GaussianNetwork constructor for the convenience of not needing keywords for the networks (pre, mu, logsigma). Totally optional; I'd understand if you'd rather not have it.
> • The most important change: I need someone to confirm that this was indeed broken and not a misunderstanding on my end. The logpdf returned by the GaussianNetwork was broken because it was computed on the unnormalized sampled actions. That is, this test was failing before:
pre = Dense(20,15)
μ = Dense(15,10)
logσ = Dense(15,10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20,3) #batch of 3 states
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

What are m and s here? I'd use isapprox for these comparisons.

> Now this is fixed.

> • I was writing tests for my covariance implementation, so I added some for this one too. That's how I found the normalization problem above. I think unit tests and documentation should be required for a PR to be merged; otherwise they simply never get added later.

> That's it. I hope you'll like it, and I'm of course receptive to suggestions and feedback.

Thanks for adding those tests. See the inline comment to avoid breaking the existing CI pipeline.

@HenriDeh (Member, Author) commented Mar 4, 2022

Sorry, I forgot to copy a line.

pre = Dense(20, 15)
μ = Dense(15, 10)
logσ = Dense(15, 10)
gn = GaussianNetwork(pre, μ, logσ)
state = rand(20, 3) # batch of 3 states
m, s = gn(state)    # mean and log-σ heads: the missing line
a, logp = gn(state, is_sampling = true, is_return_log_prob = true)
@test logp == sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp == gn(state, a)

I'll change it to isapprox, yes. Oops.
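With the missing line restored, the isapprox (≈) version of the two assertions reads:

# ≈ tolerates floating-point rounding between the two code paths
@test logp ≈ sum(normlogpdf(m, exp.(s), a) .- (2.0f0 .* (log(2.0f0) .- a .- softplus.(-2.0f0 .* a))), dims = 1)
@test logp ≈ gn(state, a)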

@HenriDeh (Member, Author) commented Mar 4, 2022

I then changed the rand calls of the other NN approximators to be consistent. I think this will break CI though; we'll have to pass an rng explicitly when testing on the GPU.

@findmyway (Member)

Yes, you can add a device rng like this:
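(The inline snippet did not survive extraction; below is my guess at its shape, using CUDA.jl's task-local device RNG.)

using CUDA, Random

# Use the GPU's native RNG when a device is available, the CPU default otherwise.
rng = CUDA.functional() ? CUDA.default_rng() : Random.default_rng()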

@HenriDeh (Member, Author) commented Mar 4, 2022

> Yes, you can add a device rng like this:

Since the test is already in an if CUDA.functional() block, I directly added the CUDA rng as an argument.
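A sketch of what that looks like in the test; moving the network with gpu/cu and the exact keyword set are assumptions on my side:

using CUDA, Flux

if CUDA.functional()
    gn_gpu = gn |> gpu                   # move the layers to the device
    state_gpu = cu(rand(Float32, 20, 3))
    # pass the device RNG explicitly so sampling happens on the GPU
    a, logp = gn_gpu(CUDA.default_rng(), state_gpu;
                     is_sampling = true, is_return_log_prob = true)
end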

I'm a bit worried: the tests were successful before I committed that. Does that mean they don't run the CUDA.functional() block?

@HenriDeh (Member, Author) commented Mar 4, 2022

Commit f7348da actually makes the PR breaking. I don't see a way to remove the device-transfer overhead without breaking existing implementations, though. :/

@findmyway enabled auto-merge (squash) Mar 4, 2022 14:42
@findmyway merged commit a90c485 into JuliaReinforcementLearning:master Mar 4, 2022