Add Prioritized Approximation Loss feature #2166

bilelsgh · 2025-08-05T14:00:23Z

Feature overview

Implementation of Prioritized Experience Replay (PER) with Prioritized Approximation Loss (PAL) (linked to #1622).
A NeurIPS 2020 paper shows that, in terms of expected gradient, using PER is equivalent to using a modified loss function with uniform experience replay.

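The expected gradient of the loss (1/τ) * |δ(i)|^τ, where τ > 0, when used with PER, is equal to the expected gradient of the following loss under uniform sampling:

L_PER(δ(i)) = (η * N / (τ + α − αβ)) * |δ(i)|^(τ + α − αβ), where η = min_j |δ(j)|^(αβ) / Σ_j |δ(j)|^α

Here N is the number of transitions in the buffer, α and β are the PER priority and importance-sampling exponents, and η is treated as a constant (no gradient flows through it).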
https://papers.neurips.cc/paper_files/paper/2020/file/a3bf6e4db673b6449c2f7d13ee6ec9c0-Paper.pdf
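To make the equivalence concrete, here is a minimal numerical check in plain Python. It assumes a hypothetical one-parameter linear model δ(i) = θ * x(i) − y(i) on a small random buffer, proportional PER priorities |δ(i)|^α without the usual ε offset, and importance weights (N * p(i))^(−β) normalised by their maximum. The function names are illustrative only and are not part of this PR. It compares the expected gradient of (1/τ) * |δ(i)|^τ under PER with the expected gradient of L_PER under uniform sampling, holding η constant.

```python
import math
import random


def sign(x: float) -> float:
    """Sign of x, with sign(0) = 0."""
    return math.copysign(1.0, x) if x != 0 else 0.0


def per_expected_grad(theta, xs, ys, tau, alpha, beta):
    """Expected d/dtheta of (1/tau)|delta|^tau under proportional PER with IS weights."""
    deltas = [theta * x - y for x, y in zip(xs, ys)]  # hypothetical linear model
    n = len(deltas)
    prios = [abs(d) ** alpha for d in deltas]  # no epsilon offset, for simplicity
    total = sum(prios)
    probs = [p / total for p in prios]
    # Importance-sampling weights (N * p(i))^-beta, normalised by their maximum
    raw = [(n * p) ** (-beta) for p in probs]
    w_max = max(raw)
    weights = [w / w_max for w in raw]
    # d/dtheta (1/tau)|delta|^tau = |delta|^(tau - 1) * sign(delta) * x
    return sum(
        p * w * abs(d) ** (tau - 1) * sign(d) * x
        for p, w, d, x in zip(probs, weights, deltas, xs)
    )


def uniform_expected_grad(theta, xs, ys, tau, alpha, beta):
    """Expected d/dtheta of L_PER under uniform sampling, with eta held constant.

    L_PER(delta) = eta * N / k * |delta|^k, k = tau + alpha - alpha * beta,
    eta = min_j |delta_j|^(alpha * beta) / sum_j |delta_j|^alpha
    """
    deltas = [theta * x - y for x, y in zip(xs, ys)]
    n = len(deltas)
    eta = min(abs(d) ** (alpha * beta) for d in deltas) / sum(abs(d) ** alpha for d in deltas)
    k = tau + alpha - alpha * beta
    # Uniform expectation = (1/N) * sum of per-sample gradients eta * N * |delta|^(k - 1) * sign(delta) * x
    return sum(eta * n * abs(d) ** (k - 1) * sign(d) * x for d, x in zip(deltas, xs)) / n


random.seed(0)
xs = [random.uniform(-2.0, 2.0) for _ in range(50)]
ys = [random.uniform(-2.0, 2.0) for _ in range(50)]
g_per = per_expected_grad(0.7, xs, ys, tau=2.0, alpha=0.6, beta=0.4)
g_uniform = uniform_expected_grad(0.7, xs, ys, tau=2.0, alpha=0.6, beta=0.4)
print(g_per, g_uniform)  # equal up to floating-point rounding
```

The two expected gradients agree for any choice of τ > 0, α and β, which is why sampling can stay uniform as long as the loss is reshaped accordingly.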

This means we can avoid maintaining a prioritized (e.g. sorted or sum-tree) buffer and its associated complexity, while still obtaining the same expected gradient.

Description

I've added a new loss function that adapts the Huber loss to incorporate priorities, as described in the referenced paper. Sampling itself stays uniform (it is done by the standard ReplayBuffer). I also implemented a PrioritizedReplayBuffer that initializes the alpha and beta parameters (following the PAL or PER papers) and properly handles the case where the PAL loss is applied in the training method.
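The arithmetic of a PAL-style loss can be sketched as follows in plain Python. This is only an illustration, not the code in this PR (which works with PyTorch tensors), and it assumes the paper's form with a minimum priority of 1: each TD error contributes 0.5 * δ² if |δ| ≤ 1 and |δ|^(1+α)/(1+α) otherwise, and the batch mean is divided by λ, the batch mean of max(|δ|^α, 1), treated as a constant. The names `pal_loss`, `pal_grad` and `lap_expected_grad` are hypothetical. The last two check that, on a given batch, the PAL gradient under uniform sampling matches the expected Huber gradient under the loss-adjusted prioritized (LAP) sampling scheme p(i) ∝ max(|δ(i)|^α, 1) from the same paper.

```python
import math


def _pal_lambda(td_errors, alpha):
    """lambda = mean of max(|delta|^alpha, 1) over the batch (treated as a constant)."""
    return sum(max(abs(d) ** alpha, 1.0) for d in td_errors) / len(td_errors)


def pal_loss(td_errors, alpha):
    """Hypothetical PAL loss for one minibatch, with the minimum priority fixed to 1."""
    per_sample = [
        0.5 * d * d if abs(d) <= 1.0 else abs(d) ** (1.0 + alpha) / (1.0 + alpha)
        for d in td_errors
    ]
    return sum(per_sample) / len(td_errors) / _pal_lambda(td_errors, alpha)


def pal_grad(td_errors, alpha):
    """d pal_loss / d delta_i for each sample, with lambda held constant."""
    scale = len(td_errors) * _pal_lambda(td_errors, alpha)
    return [
        (d if abs(d) <= 1.0 else math.copysign(abs(d) ** alpha, d)) / scale
        for d in td_errors
    ]


def lap_expected_grad(td_errors, alpha):
    """Per-sample terms of the expected Huber gradient when p(i) is proportional to max(|delta_i|^alpha, 1)."""
    prios = [max(abs(d) ** alpha, 1.0) for d in td_errors]
    total = sum(prios)
    huber_grad = [d if abs(d) <= 1.0 else math.copysign(1.0, d) for d in td_errors]
    return [p / total * g for p, g in zip(prios, huber_grad)]


batch = [0.3, -2.5, 1.7, -0.2, 4.0]
print(pal_loss(batch, alpha=0.6))
print(pal_grad(batch, alpha=0.6))
print(lap_expected_grad(batch, alpha=0.6))  # matches the line above up to floating-point rounding
```

When every |δ| ≤ 1, λ = 1 and the loss reduces to the quadratic part of the Huber loss. In a PyTorch training loop, λ would be computed with `.detach()` so that no gradient flows through it.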

Motivation and Context

I have raised an issue to propose this change (required for new features and bug fixes)
Follows up on @AlexPasqua's PR "Prioritized experience replay" #1622 (and the corresponding issue "Prioritized Experience Replay for DQN" #1242) (👋 @araffin)

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)

Checklist

Note: You can run most of the checks using make commit-checks.

Note: we are using a maximum length of 127 characters per line

bilelsgh · 2025-08-05T14:11:09Z

It was tested on CartPole.

The code runs correctly, but there is no significant improvement in either the loss or the reward. As detailed in the original PER paper, PER does not always improve performance, particularly in environments with low variance in TD errors and few rare or informative transitions.
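One mechanism behind this observation can be illustrated with proportional PER priorities (|δ| + ε)^α: when TD errors are all of similar magnitude, the sampling probabilities are close to uniform, so prioritization changes little. The sketch below uses α = 0.6, ε = 1e-6 and two toy lists of TD errors; these are illustrative assumptions, not values from the CartPole runs, and `per_probabilities` is a hypothetical helper.

```python
def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional PER sampling probabilities p(i) = (|delta_i| + eps)^alpha / sum_j (|delta_j| + eps)^alpha."""
    prios = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(prios)
    return [p / total for p in prios]


toy_errors = {
    "low variance": [1.0, 1.1, 0.9, 1.05, 0.95],
    "high variance": [0.01, 0.02, 0.05, 0.01, 5.0],
}
ratios = {}
for name, errors in toy_errors.items():
    probs = per_probabilities(errors)
    # How much more often the most likely transition is drawn than the least likely one
    ratios[name] = max(probs) / min(probs)
    print(f"{name}: max/min sampling ratio = {ratios[name]:.2f}")
```

With the low-variance errors the ratio stays close to 1, whereas a single large TD error among small ones is sampled many times more often.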

Feel free to evaluate the PR directly, or refer to the experiments presented in the paper used as the basis for this implementation.


nf2bi89 · 2025-10-03T00:55:36Z

Could you share the working example you used for testing? When I pass the buffer via the buffer class argument, it appears to be ignored, and the newly implemented code does not seem to be used.

bilelsgh added 2 commits August 5, 2025 15:24

Add Prioritized Approximation loss feature

502d7e0

Add changelog

53a7b2d

bilelsgh added 4 commits August 5, 2025 16:53

Update buffers.py

2fa2d87

Doctring imrovement

Add alpha and beta parameters

dc99748

Merge branch 'master' into per_loss

447d01f

alpha and beta attributes in PrioritizedReplayBuffer

d7c1bfa

bilelsgh mentioned this pull request Aug 5, 2025

Prioritized experience replay #1622 (Open, 16 tasks)
