
RewardForwardFilter to compute intrinsic returns for normalizing intrinsic reward #16

Open
@boscotsang

Description

In ppo_agent.py, the running estimate of intrinsic returns is computed with rff_int:
rffs_int = np.array([self.I.rff_int.update(rew) for rew in self.I.buf_rews_int.T])
In reinforcement learning, returns are computed as sum{\gamma^t * r_t}. However, rff_int appears to compute the returns as sum{\gamma^(T-t) * r_t}, which discounts the rewards forward.
What is the reason for computing the intrinsic returns forward?
Thanks!
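
For reference, a minimal sketch contrasting the two computations. The RewardForwardFilter class below is a reconstruction of the filter the question refers to (the gamma/rewems fields and the update method are assumed to match the repo's version); backward_returns and the demo reward sequence are made up purely for illustration:

```python
import numpy as np

class RewardForwardFilter:
    """Forward-discounted running sum: after updates with r_0..r_T,
    rewems = sum_t gamma^(T-t) * r_t, i.e. *past* rewards are discounted."""
    def __init__(self, gamma):
        self.gamma = gamma
        self.rewems = None  # running exponentially-discounted sum

    def update(self, rews):
        if self.rewems is None:
            self.rewems = rews
        else:
            self.rewems = self.rewems * self.gamma + rews
        return self.rewems

def backward_returns(rews, gamma):
    """Standard RL returns: ret[t] = sum_{k>=t} gamma^(k-t) * r_k,
    i.e. *future* rewards are discounted."""
    rets = np.zeros_like(rews, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rews))):
        running = rews[t] + gamma * running
        rets[t] = running
    return rets

rews = np.array([1.0, 0.0, 0.0, 1.0])
gamma = 0.99
rff = RewardForwardFilter(gamma)
forward = np.array([rff.update(r) for r in rews])
print(forward)                        # [1.0, 0.99, 0.9801, 1.970299]
print(backward_returns(rews, gamma))  # [1.970299, 0.99..., 0.99, 1.0]
```

Running this shows the asymmetry in the question: the forward filter's value at step t depends on past rewards, while the standard return at step t depends on future rewards.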
