This repository was archived by the owner on May 6, 2021. It is now read-only.

Improve the PPOLearner to support continuous action space #92

@findmyway

Description


Necessary changes:

  1. Add a dist field to the PPOLearner (just like what @norci did in VPG); a sketch follows this list.

  2. The following method needs to be extended to recognize environments with a continuous action space. Currently the PPOLearner is assumed to return a (batch of) logits. I'd suggest renaming the PPOLearner to PPOPolicy and returning an action directly (see the sketch after this list).

    function (learner::PPOLearner)(env::MultiThreadEnv)

  3. A GaussianNetwork is also needed (a sketch follows this list).

  4. The entropy loss calculation in update! is hard-coded. It would be better to split it out into a separate function so that continuous distributions can also be supported (or reuse the one in StatsBase or Distributions, but use them with caution! I had some problems using them with Zygote before). A sketch follows this list.
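
For point 1, something along these lines could work. This is only an illustrative sketch, not the final struct definition; the field names other than dist stand in for the existing PPOLearner fields:

    using Distributions: Categorical, Normal

    # Illustrative sketch only: a dist field on the learner records which
    # distribution family the actor's output parameterizes (mirroring VPG).
    Base.@kwdef struct PPOLearner{A,D}
        approximator::A
        dist::D = Categorical   # e.g. Categorical for discrete, Normal for continuous
        # ... the remaining existing fields (γ, λ, clip_range, ...) stay unchanged
    end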
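
For point 3, a hypothetical GaussianNetwork could be a shared feature extractor with two heads producing the mean and the log standard deviation of a diagonal Gaussian policy. All names here are illustrative:

    using Flux

    struct GaussianNetwork{P,U,S}
        pre::P     # shared layers
        μ::U       # mean head
        logσ::S    # log standard deviation head
    end

    Flux.@functor GaussianNetwork

    # Return (μ, logσ), each with one column per state in the batch.
    function (m::GaussianNetwork)(state)
        h = m.pre(state)
        m.μ(h), m.logσ(h)
    end

A typical instantiation might be GaussianNetwork(Chain(Dense(ns, 64, relu)), Dense(64, na), Dense(64, na)).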
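
For point 2, returning an action directly for a continuous action space could look roughly like the sketch below, assuming the approximator is a GaussianNetwork as above. Treat this as pseudocode for the dispatch; get_state and the broadcasting shapes follow the package conventions at the time and need to be double-checked:

    function (learner::PPOLearner)(env::MultiThreadEnv)
        μ, logσ = learner.approximator(get_state(env))   # one column per sub-environment
        μ .+ exp.(logσ) .* randn(Float32, size(μ))       # sample from N(μ, σ) and return actions
    end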
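
For point 4, closed-form entropy helpers would keep the gradient computation free of Distributions.jl (relevant to the Zygote concerns above). The function names are illustrative:

    # Differential entropy of a diagonal Gaussian, summed over action
    # dimensions and batch: H(N(μ, σ)) = Σ (logσ + (1 + log(2π)) / 2).
    gaussian_entropy(logσ) = sum(logσ .+ (1 + log(2π)) / 2)

    # Entropy of a batch of categorical distributions (probabilities in
    # columns), averaged over the batch: H = -Σ p log p.
    categorical_entropy(p) = -sum(p .* log.(p .+ eps(eltype(p)))) / size(p, 2)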
