
Use RandomizeAction wrapper instead of Explorer in evaluation #328

Merged
muupan merged 10 commits into chainer:master on Oct 10, 2018

Conversation

@muupan (Member) commented Oct 9, 2018

We currently use Explorer to inject randomness into action selection during evaluation, which is common in Atari benchmarks. This use is actually unrelated to exploration during training, so the name is a misnomer in my opinion.

I think it is better to use an env wrapper for this purpose, so that the training and evaluation code can be simpler. This is also important for #326, where I need to implement another set of training and evaluation code.
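For reference, a minimal sketch of what an action-randomizing env wrapper can look like, assuming a Gym-style env with a discrete action space. This is only an illustration of the idea, not necessarily the exact chainerrl.wrappers.RandomizeAction implementation:

import gym
import numpy as np

class RandomizeAction(gym.ActionWrapper):
    # With probability random_fraction, replace the agent's chosen action
    # with a uniformly sampled one, mimicking epsilon-greedy evaluation
    # without touching the agent or its Explorer.
    def __init__(self, env, random_fraction):
        super().__init__(env)
        self.random_fraction = random_fraction

    def action(self, action):
        if np.random.rand() < self.random_fraction:
            return self.env.action_space.sample()
        return action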

@muupan muupan mentioned this pull request Oct 9, 2018
@muupan muupan changed the title Use RandomAction wrapper instead of Explorer in evaluation Use RandomizeAction wrapper instead of Explorer in evaluation Oct 9, 2018
@prabhatnagarajan prabhatnagarajan self-assigned this Oct 10, 2018
@toslunar (Member) commented:

Resolved the conflict.

@@ -87,6 +88,9 @@ def make_env(process_idx, test):
             episode_life=not test,
             clip_rewards=not test)
         env.seed(int(env_seed))
+        if test:
+            # Randomize actions like epsilon-greedy in evaluation as well
+            env = chainerrl.wrappers.RandomizeAction(env, 0.05)
Contributor commented on the added lines:

Is there a reason that you use 0.05 instead of args.eval_epsilon as in the other algorithms? I'm aware that the DQN paper uses 0.05, but then why not use a raw value for the other domains?

Member Author (@muupan) replied:

I did so just because the previous examples/ale/train_nsq_ale.py did so. I agree that it would be better to make it configurable, but since that is not the focus of this PR, I kept it unchanged.

@prabhatnagarajan (Contributor) left a comment:

LGTM; Perhaps we should create an issue or a new PR to make the 0.05 in NSQ parameterizable.
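As a rough illustration of that follow-up (not part of this PR), the hard-coded value could be exposed the same way the other examples expose args.eval_epsilon; the flag name below is hypothetical:

import argparse

parser = argparse.ArgumentParser()
# Hypothetical flag mirroring args.eval_epsilon in the other examples;
# the default matches the 0.05 hard-coded in this PR.
parser.add_argument('--eval-epsilon', type=float, default=0.05)
args = parser.parse_args()

# Then, inside make_env(process_idx, test):
#     if test:
#         env = chainerrl.wrappers.RandomizeAction(env, args.eval_epsilon)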

@muupan muupan merged commit f1ac3b6 into chainer:master Oct 10, 2018
@muupan muupan modified the milestone: v0.5 Nov 13, 2018