bilelsgh commented Aug 5, 2025

Feature overview

Implementation of Prioritized Experience Replay (PER) with Prioritized Approximation Loss (PAL) (linked to #1622).
A NeurIPS 2020 paper shows that PER is equivalent, in expected gradient, to uniform experience replay with a suitably adapted loss function.

According to the paper, the expected gradient of the loss (1/τ) * |δ(i)|^τ, with τ > 0, under PER sampling is equal to the expected gradient of a modified loss under uniform sampling. The modified loss is given in the paper:
https://papers.neurips.cc/paper_files/paper/2020/file/a3bf6e4db673b6449c2f7d13ee6ec9c0-Paper.pdf
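
For reference, the uniform-sampling loss mentioned above can be written as below. This is transcribed from the linked paper and can be rederived from PER's sampling probabilities p(i) ∝ |δ(i)|^α and its max-normalized importance weights. Here N is the number of transitions in the buffer, α and β are the usual PER exponents, and η is treated as a constant when differentiating. Please check it against the paper before relying on it.

$$
L^{\tau}_{\text{PER}}(\delta(i)) = \frac{\eta N}{\tau + \alpha - \alpha\beta}\,|\delta(i)|^{\tau + \alpha - \alpha\beta},
\qquad
\eta = \frac{\min_j |\delta(j)|^{\alpha\beta}}{\sum_j |\delta(j)|^{\alpha}}
$$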

This means we can avoid maintaining a prioritized data structure (e.g. a sum tree or sorted buffer) and the associated complexity, while still obtaining the same expected gradient.
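
The equivalence can be checked numerically on a toy buffer. The sketch below is not the code from this PR: the function names are hypothetical, it uses plain Python floats instead of tensors, and it assumes proportional PER with p(i) ∝ |δ(i)|^α and importance weights normalized by their maximum. It compares each transition's contribution to the expected gradient under PER with its contribution under uniform sampling of the adapted loss (η N / (τ + α − αβ)) * |δ(i)|^(τ + α − αβ), where η = min_j |δ(j)|^(αβ) / Σ_j |δ(j)|^α is held constant.

```python
import math


def per_grad_terms(deltas, alpha, beta, tau):
    """Per-transition contribution to the expected gradient (w.r.t. each TD error)
    when sampling with PER and applying max-normalized importance weights."""
    n = len(deltas)
    priorities = [abs(d) ** alpha for d in deltas]
    total = sum(priorities)
    probs = [p / total for p in priorities]
    weights = [(1.0 / (n * p)) ** beta for p in probs]
    w_max = max(weights)
    # d/d(delta) of (1/tau) * |delta|^tau is sign(delta) * |delta|^(tau - 1)
    return [
        p * (w / w_max) * math.copysign(abs(d) ** (tau - 1), d)
        for p, w, d in zip(probs, weights, deltas)
    ]


def uniform_grad_terms(deltas, alpha, beta, tau):
    """Same quantity under uniform sampling of the adapted loss (eta held constant)."""
    n = len(deltas)
    eta = min(abs(d) ** (alpha * beta) for d in deltas) / sum(abs(d) ** alpha for d in deltas)
    exponent = tau + alpha - alpha * beta
    # Uniform probability 1/n times d/d(delta) of (eta * n / exponent) * |delta|^exponent
    return [
        (1.0 / n) * eta * n * math.copysign(abs(d) ** (exponent - 1), d)
        for d in deltas
    ]


if __name__ == "__main__":
    # Illustrative TD errors (all non-zero) and hyperparameters
    td_errors = [0.5, -1.2, 2.0, -0.3, 0.9]
    per = per_grad_terms(td_errors, alpha=0.6, beta=0.4, tau=2.0)
    uni = uniform_grad_terms(td_errors, alpha=0.6, beta=0.4, tau=2.0)
    print(max(abs(a - b) for a, b in zip(per, uni)))
```

The TD errors must be non-zero here, since a zero priority would give a zero sampling probability and an undefined importance weight.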

Description

I've added a new loss function, which adapts the Huber Loss by incorporating priority as described in the referenced paper. The buffer itself performs uniform sampling (ReplayBuffer). Additionally, I implemented a PrioritizedReplayBuffer to initialize the parameters alpha and beta (following the PAL or PER papers) and to properly handle the case where the PAL Loss is applied within the training method.
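
As a rough illustration of the kind of loss described above, here is a minimal sketch of PAL as I understand it from the referenced paper: quadratic for |δ| ≤ 1, a power of 1 + α beyond that, and scaled by a normalizer λ = (1/N) Σ_j max(|δ(j)|, 1)^α. This is not the code from this PR: the function name `pal_loss` and the default `alpha` are assumptions, and it uses plain Python floats so that it runs without dependencies. A tensor version would compute the same terms element-wise and treat λ as a constant (no gradient through it).

```python
def pal_loss(td_errors, alpha=0.4):
    """Mean PAL over a batch of TD errors (hypothetical helper, plain floats).

    Per-item term: 0.5 * d^2 if |d| <= 1, else |d|^(1 + alpha) / (1 + alpha).
    The batch mean is divided by lam = mean(max(|d|, 1)^alpha).
    """
    n = len(td_errors)
    # Normalizer; in an autograd framework this would be detached from the graph
    lam = sum(max(abs(d), 1.0) ** alpha for d in td_errors) / n

    def per_item(d):
        a = abs(d)
        if a <= 1.0:
            return 0.5 * d * d  # Huber-like quadratic region
        return a ** (1.0 + alpha) / (1.0 + alpha)  # prioritized region

    return sum(per_item(d) for d in td_errors) / (n * lam)


if __name__ == "__main__":
    # Small errors only: lam is 1 and the loss reduces to 0.5 * mean(d^2)
    print(pal_loss([0.5, -0.5]))
    # A mix of small and large errors
    print(pal_loss([0.2, -3.0, 1.5]))
```

With `alpha=0` the large-error branch becomes |δ| and λ becomes 1, so the gradient matches that of the Huber loss with threshold 1.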

Motivation and Context

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)

Checklist

  • I've read the CONTRIBUTION guide (required)
  • I have updated the changelog accordingly (required).
  • My change requires a change to the documentation.
  • I have updated the tests accordingly (required for a bug fix or a new feature).
  • I have updated the documentation accordingly.
  • I have opened an associated PR on the SB3-Contrib repository (if necessary)
  • I have opened an associated PR on the RL-Zoo3 repository (if necessary)
  • I have reformatted the code using make format (required)
  • I have checked the codestyle using make check-codestyle and make lint (required)
  • I have ensured make pytest and make type both pass. (required)
  • I have checked that the documentation builds using make doc (required)

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

bilelsgh commented Aug 5, 2025

I tested it on CartPole:

[two images, not preserved]

The code runs properly, but there is no significant improvement in either the loss or the reward. As discussed in the original PER paper, PER does not always lead to better performance, particularly in environments with low variance in TD errors and few rare or informative transitions.

Feel free to evaluate the PR directly, or refer to the experiments presented in the paper used as the basis for this implementation.


nf2bi89 commented Oct 3, 2025

Could you provide the working example you used for testing? When I pass the buffer via the buffer class argument, it appears to be ignored, so the newly implemented code is not used.
