N-step returns #282

ghost · 2020-07-29T12:01:54Z

Theory

This change replaces the Bellman operator with an N-step variant. N-step returns are widely used in many policy gradient algorithms as well as in Q-learning variants. Using N-step returns often leads to faster learning.

I tested it on BipedalWalker-v3 with 1, 5 and 10 steps.

To disable N-step returns, set the --n_steps parameter to 1.
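For reviewers, here is a minimal sketch of the N-step target that replaces the one-step Bellman target. It is not the code from this PR: the function name `n_step_target` and its arguments are hypothetical, and `bootstrap_value` stands in for whatever value estimate the agent uses at the state reached after N steps. The window is truncated at a terminal transition, in which case no bootstrap term is added.

```python
from typing import Sequence


def n_step_target(
    rewards: Sequence[float],
    dones: Sequence[bool],
    bootstrap_value: float,
    gamma: float = 0.99,
) -> float:
    """Illustrative N-step target (hypothetical helper, not the PR's code).

    Computes sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * bootstrap_value,
    where n = len(rewards). If a terminal transition occurs inside the
    window, the sum stops there and the bootstrap term is dropped.
    """
    target = 0.0
    discount = 1.0
    for reward, done in zip(rewards, dones):
        target += discount * reward
        discount *= gamma
        if done:
            # Episode ended inside the window: nothing to bootstrap from.
            return target
    return target + discount * bootstrap_value


# With a single step this reduces to the usual one-step target r + gamma * V(s').
one_step = n_step_target([1.0], [False], bootstrap_value=10.0, gamma=0.5)

# Three steps: 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0
three_step = n_step_target([1.0, 1.0, 1.0], [False, False, False], 10.0, gamma=0.5)
```

With a window of length 1 the function returns the standard one-step target, which matches the behaviour expected when --n_steps is set to 1.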

Charts

Math

Links

Rainbow: Combining Improvements in Deep Reinforcement Learning (Multi-step learning)

N-step returns

51c74f1
