Fixing and generalizing GaussianNetwork #592
Conversation
...ningCore/src/policies/q_based_policies/learners/approximators/neural_network_approximator.jl
Nice work!
Thanks! Better to avoid breaking changes 😃. And thanks for adding the docstring.
👍
It's OK to add it in this PR.
What are
Thanks for adding those tests. See the inline comment to avoid breaking the existing CI pipeline.
Sorry I forgot to copy a line.
I'll change it to isapprox, yes, oops.
I then changed the rand of the other NN approximators to be consistent. I think this will break CI though; we'll have to pass an rng explicitly when testing on the GPU.
Yes, you can add a device rng like this (line 25 in f4cf555):
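Presumably something along these lines (a sketch; the variable name device_rng is illustrative, CUDA.CURAND.RNG is CUDA.jl's GPU random number generator):

```julia
using CUDA, Random

# Use a GPU RNG when CUDA is available so that sampling happens on the
# same device as the model's parameters; otherwise fall back to the CPU RNG.
device_rng = CUDA.functional() ? CUDA.CURAND.RNG() : Random.GLOBAL_RNG
```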
Since the test is already in a CUDA.functional() block, I'm a bit worried: the tests were successful before I committed that. Does that mean they do not run the CUDA.functional() block?
Commit f7348da actually makes the PR breaking. I don't see a way to remove the device transfer overhead without breaking existing implementations though :/
As I was working on a covariance version of the GaussianNetwork learner, I noticed a few issues with the latter that this PR fixes. In more or less the order of the commits:
The systematic tanh normalization restrains the applicability of RL.jl to applications whose actions are not constrained to (-1, 1). This PR adds a field to the GaussianNetwork struct that contains the elementwise normalizer function. It is defaulted to tanh to avoid breaking changes (if breaking is acceptable, we could default it to identity instead).
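A minimal sketch of what the generalized struct could look like (field names are illustrative and follow the keywords mentioned below; the exact definition in the PR may differ):

```julia
using Flux, Random

# Sketch: `normalizer` is applied elementwise to sampled actions.
# Defaulting to `tanh` preserves the previous behaviour; `identity`
# would leave actions unbounded.
Base.@kwdef struct GaussianNetwork{P,U,S,F}
    pre::P = identity
    μ::U
    logσ::S
    normalizer::F = tanh
end

Flux.@functor GaussianNetwork

# Single-state forward pass: reparameterized sample, then normalization.
function (model::GaussianNetwork)(rng::Random.AbstractRNG, state)
    x = model.pre(state)
    μ, logσ = model.μ(x), model.logσ(x)
    z = μ .+ exp.(logσ) .* randn(rng, Float32, size(μ))
    model.normalizer.(z), μ, logσ
end
```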
A new GaussianNetwork caller that takes a batch of states as input along with an integer N, and samples N actions for each state in the batch. As explained in the docstring, we now have to work with 3D tensors for both the state-batch input and the actions output. I made this choice instead of working with a Vector{Matrix} input/output mainly for the efficiency of computing the outputs of the NN for the entire batch in one pass.
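Continuing the sketch above, the batched caller might look like this (a hypothetical method; the shapes are the point):

```julia
# Hypothetical batched caller: for a Ds × B state batch and an integer N,
# return a Da × N × B tensor with N sampled actions per state.  The batch
# is lifted to Ds × 1 × B so the networks run on it in a single pass, and
# μ/logσ broadcast against the N noise samples.
function (model::GaussianNetwork)(rng::Random.AbstractRNG, state::AbstractMatrix, N::Int)
    x = model.pre(reshape(state, size(state, 1), 1, :))
    μ, logσ = model.μ(x), model.logσ(x)                  # Da × 1 × B each
    z = μ .+ exp.(logσ) .* randn(rng, Float32, size(μ, 1), N, size(μ, 3))
    model.normalizer.(z)                                  # Da × N × B
end

model  = GaussianNetwork(μ = Dense(4, 2), logσ = Dense(4, 2))
states = rand(Float32, 4, 32)                      # Ds = 4, B = 32
acts   = model(Random.default_rng(), states, 10)   # size: 2 × 10 × 32
```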
A GaussianNetwork initializer, for the convenience of not needing to use keywords for the NNs (pre, mu, logsigma). Totally optional, I'd understand if you'd rather not have these.
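For illustration, such a convenience constructor could be as simple as (hypothetical, building on the sketch above):

```julia
# Hypothetical positional initializer: pass the three networks without keywords.
GaussianNetwork(pre, μ, logσ) = GaussianNetwork(pre = pre, μ = μ, logσ = logσ)

model = GaussianNetwork(identity, Dense(4, 2), Dense(4, 2))
```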
The log-probability returned by GaussianNetwork was broken: it was computed on the unnormalized sampled actions. That is, a test comparing it against the density of the normalized actions was failing before. Now this is fixed.
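The gist of the fix, as a standalone sketch (not the PR's actual test): with a tanh normalizer, the reported log-probability must include the change-of-variables correction so that it matches the normalized action.

```julia
using Distributions

μ, σ = 0.3f0, 0.8f0
z = μ + σ * randn(Float32)   # raw Gaussian sample
a = tanh(z)                  # normalized action returned to the caller
logp_raw = logpdf(Normal(μ, σ), z)   # density of the raw sample (the old, wrong value)
logp     = logp_raw - log(1 - a^2)   # density of the tanh-squashed action
```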
That's it. I hope you'll like it, and I'm of course receptive to suggestions and feedback.