This repository was archived by the owner on May 6, 2021. It is now read-only.

Improve the PPOLearner to support continuous action space #92

@findmyway

Description


Necessary changes:

  1. Add a dist field to the PPOLearner (just like what @norci did in VPG); a sketch follows this list.

  2. The following method needs to be extended to recognize environments with a continuous action space. Currently the PPOLearner is assumed to return a (batch of) logits. I'd suggest renaming the PPOLearner to PPOPolicy and returning an action directly (see the sketch after this list).

    function (learner::PPOLearner)(env::MultiThreadEnv)

  3. A GaussianNetwork is also needed (a sketch follows this list).

  4. The entropy loss calculation in update! is hard-coded. It would be better to split it out into a separate function so that continuous distributions can also be supported (or reuse the one in StatsBase or Distributions, but use them with caution! I had some problems using them with Zygote before). A sketch follows this list.
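
For point 1, something along these lines could work. This is only an illustrative sketch, not the final struct definition; the field names other than dist stand in for the existing PPOLearner fields:

    using Distributions: Categorical, Normal

    # Illustrative sketch only: a dist field on the learner records which
    # distribution family the actor's output parameterizes (mirroring VPG).
    Base.@kwdef struct PPOLearner{A,D}
        approximator::A
        dist::D = Categorical   # e.g. Categorical for discrete, Normal for continuous
        # ... the remaining existing fields (γ, λ, clip_range, ...) stay unchanged
    end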
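
For point 3, a hypothetical GaussianNetwork could be a shared feature extractor with two heads producing the mean and the log standard deviation of a diagonal Gaussian policy. All names here are illustrative:

    using Flux

    struct GaussianNetwork{P,U,S}
        pre::P     # shared layers
        μ::U       # mean head
        logσ::S    # log standard deviation head
    end

    Flux.@functor GaussianNetwork

    # Return (μ, logσ), each with one column per state in the batch.
    function (m::GaussianNetwork)(state)
        h = m.pre(state)
        m.μ(h), m.logσ(h)
    end

A typical instantiation might be GaussianNetwork(Chain(Dense(ns, 64, relu)), Dense(64, na), Dense(64, na)).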
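
For point 2, returning an action directly for a continuous action space could look roughly like the sketch below, assuming the approximator is a GaussianNetwork as above. Treat this as pseudocode for the dispatch; get_state and the broadcasting shapes follow the package conventions at the time and need to be double-checked:

    function (learner::PPOLearner)(env::MultiThreadEnv)
        μ, logσ = learner.approximator(get_state(env))   # one column per sub-environment
        μ .+ exp.(logσ) .* randn(Float32, size(μ))       # sample from N(μ, σ) and return actions
    end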
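
For point 4, closed-form entropy helpers would keep the gradient computation free of Distributions.jl (relevant to the Zygote concerns above). The function names are illustrative:

    # Differential entropy of a diagonal Gaussian, summed over action
    # dimensions and batch: H(N(μ, σ)) = Σ (logσ + (1 + log(2π)) / 2).
    gaussian_entropy(logσ) = sum(logσ .+ (1 + log(2π)) / 2)

    # Entropy of a batch of categorical distributions (probabilities in
    # columns), averaged over the batch: H = -Σ p log p.
    categorical_entropy(p) = -sum(p .* log.(p .+ eps(eltype(p)))) / size(p, 2)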
