Refactor dqn word choice #257

vwxyzjn · 2022-08-12T21:42:26Z

Description

This PR fixes a confusing word choice in our DQN implementation. Previously we had:

logits = q_network.apply(q_state.params, obs)

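However, the word logits usually refers to "unnormalized probabilities", that is, raw scores that a softmax turns into a probability distribution (see What is the meaning of the word logits in TensorFlow?). The DQN network instead outputs Q-values, which are estimates of the expected return for each action. A more accurate name here is q_values:

q_values = q_network.apply(q_state.params, obs)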
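To make the distinction concrete, here is a minimal sketch in plain Python rather than the JAX code from the PR. The helper names `softmax` and `greedy_action` and the sample numbers are hypothetical and are not part of CleanRL. It shows that Q-values are compared directly to pick an action, while logits only become meaningful after normalization.

```python
import math


def softmax(logits):
    """Normalize raw scores (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values):
    """Return the index of the action with the largest estimated return."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical network output for one observation with three actions.
q_values = [1.5, -0.2, 3.0]

# Q-values are used as-is: the greedy policy takes the argmax.
action = greedy_action(q_values)

# A softmax would only be the natural next step if the outputs were logits,
# for example in a policy-gradient method with a categorical policy.
probs = softmax(q_values)
```

Calling the network output `q_values` signals to the reader that the first usage applies, not the second.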

Because this refactor only renames a variable and does not affect performance, re-running the benchmark is not required.

CC @santiontanon

Types of changes

Bug fix
New feature
New algorithm
Documentation

vercel · 2022-08-12T21:42:30Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated
cleanrl	✅ Ready (Inspect)	Visit Preview	Aug 12, 2022 at 9:42PM (UTC)

vwxyzjn · 2022-08-24T21:14:03Z

@yooceii could you approve this please?

vwxyzjn added 2 commits August 12, 2022 17:38

Refactor dqn word choice

9314057

refactor

dfb1430

vwxyzjn requested review from dipamc and yooceii August 12, 2022 21:42

yooceii approved these changes Aug 25, 2022

vwxyzjn merged commit ede2012 into master Aug 25, 2022

vwxyzjn mentioned this pull request Oct 19, 2022

RLops Guide #296

Closed
