Description
As I was testing MPO on the cartpole environment, I noticed the algorithm was pretty unstable and had trouble stabilizing at the 200-return policy. I eventually thought about the tanh `normalizer` that comes by default with `GaussianNetwork` (remember, it used to be "mandatory" before #592). To be honest, I did break things in #592 by moving the normalization, but it was incorrect before anyway.
However, doing that makes learning unstable because the logpdf computed during training is no longer correct: the action actually played is $a = \tanh(z)$ with $z \sim \mathcal{N}(\mu, \sigma)$, which is not itself normally distributed, so its density needs the change-of-variables correction

$$\log \pi(a) = \log \mathcal{N}(z; \mu, \sigma) - \log\left(1 - \tanh^2(z)\right),$$

where $\mathcal{N}$ denotes the normal distribution.
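Concretely (a minimal sketch with Distributions.jl, not the actual `GaussianNetwork` code; the variable names and the ϵ are mine), evaluating the normal logpdf at the squashed action is not the same as the logpdf of the squashed action's distribution:

```julia
using Distributions

μ, σ = 0.3f0, 0.5f0
d = Normal(μ, σ)

z = rand(d)   # pre-squash sample, z ~ N(μ, σ)
a = tanh(z)   # squashed action in (-1, 1)

# What effectively gets evaluated when the normal logpdf is applied
# to the already-squashed action:
logp_wrong = logpdf(d, a)

# Density of the squashed action, with the Jacobian correction
# (the small ϵ is only there for numerical stability):
logp_right = logpdf(d, z) - log(1 - a^2 + 1f-6)
```

The second form is the usual squashed-Gaussian correction (the same one SAC-style policies use).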
Here's a comparison of two runs on cartpole with MPO (sorry the REPL messed up the plots):
The top one is when tanh is applied to actions at sample time; the bottom one is when tanh is applied to the actions at interaction time (by wrapping the environment in an `ActionTransformedEnv`). While the difference is not staggering, it does improve the stability of convergence, and the effect might be more pronounced for tasks more complex than cartpole.
So I'd like to make a case to simply remove `normalizer` from `GaussianNetwork`:
- It improves stability.
- You can always recover the old behavior by adding the tanh activation to your neural net's output layer.
- It's mathematically more correct.
- You can normalize using `ActionTransformedEnv` (see the sketch after this list).
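For reference, here's a rough sketch of the two alternatives, assuming the usual Flux / ReinforcementLearning.jl constructors (the layer sizes, the continuous cartpole constructor, and the exact keyword names are illustrative and may differ from the current release):

```julia
using Flux, ReinforcementLearning

ns, na = 4, 1  # cartpole-like dimensions, for illustration only

# Option 1: recover the old squashing inside the network by putting
# tanh on the μ head's output layer.
policy_net = GaussianNetwork(
    pre  = Dense(ns => 64, relu),
    μ    = Dense(64 => na, tanh),   # tanh as the output-layer activation
    logσ = Dense(64 => na),
)

# Option 2: keep the network unbounded and squash at interaction time
# by wrapping the environment.
env = CartPoleEnv(continuous = true)  # assuming the continuous variant
wrapped_env = ActionTransformedEnv(
    env;
    action_mapping = a -> tanh.(a),
)
```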
This is technically a "breaking" change, though I'd call it a "repairing" change, and I think it's one that should be made.
If you agree with me, I can incorporate that in #604.