[rllib] Flip sign of A2C, IMPALA entropy coefficient; raise DeprecationWarning if negative #4374
Upon further thought, we can safely flip the sign as long as we raise a DeprecationWarning when the coefficient is negative. This is probably the better long-term solution, but it should be reviewed carefully.
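For reviewers, here is a minimal sketch of the behavior being proposed. It is not the actual RLlib code: `validate_entropy_coeff`, `total_loss`, and the default coefficient values are hypothetical names and numbers chosen for illustration. It assumes the flipped convention, where a positive `entropy_coeff` is subtracted from the loss (an entropy bonus) and a negative value is rejected.

```python
def validate_entropy_coeff(entropy_coeff):
    """Reject negative coefficients (hypothetical helper, not RLlib's API)."""
    if entropy_coeff < 0:
        # DeprecationWarning is an Exception subclass, so it can be raised
        # directly to stop configs that still use the old sign convention.
        raise DeprecationWarning(
            "entropy_coeff must be >= 0; the sign convention has been flipped")
    return entropy_coeff


def total_loss(policy_loss, value_loss, entropy,
               vf_loss_coeff=0.5, entropy_coeff=0.01):
    """Combine loss terms under the assumed post-flip convention."""
    coeff = validate_entropy_coeff(entropy_coeff)
    # Subtracting the entropy term rewards higher-entropy policies.
    return policy_loss + vf_loss_coeff * value_loss - coeff * entropy
```

Raising instead of silently negating means an old config with a negative value fails loudly rather than quietly penalizing entropy.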
Test PASSed.
Test PASSed.
Test FAILed.
Test PASSed.
Test FAILed.
jenkins retest this please
Test PASSed.
Test FAILed.
jenkins retest this please
Test PASSed.
Test FAILed.
…ow all are positive. For reasoning, see ray-project/ray#4374
What do these changes do?
Throw an error if we end up penalizing entropy, which is probably unintended. Other options here:
Related issue number
Closes #4369