@qrh1 commented Oct 29, 2018

Rewards should not be discounted across different episodes. Maybe episodes and steps are confused here?

@titu1994 (Owner) commented

for t in reversed(range(0, rewards.size)):
    # a nonzero reward marks an episode boundary: reset the running sum
    # so returns from a later episode do not leak backwards into this one
    if rewards[t] != 0:
        running_add = 0
    running_add = running_add * self.discount_factor + rewards[t]
    discounted_rewards[t] = running_add
return discounted_rewards[-1]

This is the discounted reward computation, which already returns the values you describe.
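
To make the behavior concrete, here is a minimal self-contained sketch of the same logic (the function name, the 0.99 discount factor, and the example rewards are illustrative, not from the repo). It assumes, as the `rewards[t] != 0` check implies, that a nonzero reward only occurs on the terminal step of an episode:

```python
import numpy as np

def discount_rewards(rewards, discount_factor=0.99):
    """Discount rewards backwards in time, resetting at episode boundaries."""
    discounted_rewards = np.zeros_like(rewards, dtype=np.float64)
    running_add = 0.0
    for t in reversed(range(rewards.size)):
        if rewards[t] != 0:
            # a nonzero reward is assumed to end an episode, so the
            # running sum is cleared and never crosses the boundary
            running_add = 0.0
        running_add = running_add * discount_factor + rewards[t]
        discounted_rewards[t] = running_add
    return discounted_rewards

# Two episodes concatenated: [0, 0, +1] then [0, 0, -1].
# The +1 from episode 1 does not bleed into episode 2's returns:
print(discount_rewards(np.array([0., 0., 1., 0., 0., -1.])))
# [ 0.9801  0.99    1.     -0.9801 -0.99   -1.    ]
```

So even though all episodes' rewards sit in one flat array, the reset makes the discounting effectively per-episode.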

@qrh1 (Author) commented Oct 30, 2018

Hi Somshubra, I still don't get it, could you please explain more?
In my understanding, each action is a step, and the 8 actions together form one episode. In RL, we usually discount rewards over the past steps within an episode, but rewards for each independent episode should be calculated independently.
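
As a minimal sketch of that per-episode view (all names and values here are illustrative): if each episode's rewards are passed in separately, nothing can leak across episodes:

```python
import numpy as np

def episode_returns(rewards, gamma=0.99):
    # Standard discounted return within ONE episode:
    # G_t = r_t + gamma * G_{t+1}, with G after the last step = 0.
    returns = np.zeros(len(rewards))
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns

# One 8-step episode (one reward per action); a second episode would be
# passed in as its own array, so no discounting happens across episodes.
print(episode_returns([0, 0, 0, 0, 0, 0, 0, 1.0]))
```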
