Hi
Thanks so much for sharing this, what a great repo.
I've noticed that the final actor layer is not really activated, rather a distribution object (say categorical) is used.
Later the log probabilities are taken to compute the actor's loss.
Don't we lose the desired mesh that the softmax function gives us in this case?
I.e., we encourage good actions and discourage bad actions less than if we'd used softmax, right?
Just wanted to ask: is this on purpose, or did I misunderstand the code?
Thanks!
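To make the question concrete, here is a minimal sketch of what I would expect a categorical distribution built from raw logits to do when asked for a log probability. It is plain Python, not the repo's code; the helper names `log_softmax` and `categorical_log_prob` are hypothetical, and I am assuming the actor's unactivated outputs are passed to the distribution as logits rather than as probabilities. Under that assumption the softmax is not lost, it is just applied inside the distribution (as far as I know, PyTorch's `torch.distributions.Categorical(logits=...)` normalizes its logits in this way).

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax: subtract the log-sum-exp of the logits.
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def categorical_log_prob(logits, action):
    # Hypothetical stand-in for a distribution object's log_prob(action):
    # the log probability of `action` is its log-softmax entry.
    return log_softmax(logits)[action]

# Raw, unactivated outputs of a final actor layer (made-up values).
logits = [2.0, 0.5, -1.0]

log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

# The implied probabilities are exactly softmax(logits), so they sum to 1
# and preserve the ordering of the logits.
print(sum(probs))
print(categorical_log_prob(logits, 0))
```

If the outputs are instead passed as probabilities without any normalization, this sketch does not apply and my original concern would stand.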