Refactor dqn word choice #257

Merged
merged 2 commits into from
Aug 25, 2022

vwxyzjn (Owner) commented Aug 12, 2022

Description

This PR fixes a confusing word choice in our DQN implementation. Previously we had

logits = q_network.apply(q_state.params, obs)

However, the word logits usually refers to unnormalized log-probabilities, i.e. the raw scores passed to a softmax (see What is the meaning of the word logits in TensorFlow?). The network's outputs here are action-value estimates, so a more accurate name is q_values, as in q_values = q_network.apply(q_state.params, obs).
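To make the distinction concrete, here is a minimal sketch in plain Python (standard library only, not the CleanRL code itself). The helper names `softmax`, `greedy_action`, and `td_target`, the sample `outputs` list, and the discount `gamma=0.99` are all assumptions made for illustration. It shows the same vector of numbers read two ways: as logits, which only become meaningful after normalization, and as Q-values, which are used directly for action selection and for the one-step target.

```python
import math


def softmax(logits):
    """Turn unnormalized log-probabilities (logits) into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_action(q_values):
    """Pick the action with the highest estimated return; no normalization involved."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, gamma=0.99, done=False):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), cut off at episode end."""
    return reward + gamma * max(next_q_values) * (0.0 if done else 1.0)


# Hypothetical network outputs for one observation with three actions.
outputs = [1.0, 2.0, 0.5]

# Read as logits: the values describe a distribution over actions after softmax.
probs = softmax(outputs)

# Read as Q-values: each entry is itself an estimate of discounted return.
action = greedy_action(outputs)
target = td_target(reward=1.0, next_q_values=outputs)
```

In a DQN the second reading is the one that applies: the outputs feed an argmax and a bootstrapped target, and no softmax is taken, which is why q_values describes them better than logits.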

Since this refactor only changes naming and does not affect performance, re-running the benchmark is not required.

CC @santiontanon

Types of changes

  • Bug fix
  • New feature
  • New algorithm
  • Documentation

vercel bot commented Aug 12, 2022

The latest updates on your projects.

Name     Status    Updated
cleanrl  ✅ Ready  Aug 12, 2022 at 9:42PM (UTC)

@vwxyzjn vwxyzjn requested review from dipamc and yooceii August 12, 2022 21:42
vwxyzjn (Owner, Author) commented Aug 24, 2022

@yooceii could you approve this please?

@vwxyzjn vwxyzjn merged commit ede2012 into master Aug 25, 2022
@vwxyzjn vwxyzjn mentioned this pull request Oct 19, 2022