Use RandomizeAction wrapper instead of Explorer in evaluation #328
Conversation
Resolved the conflict.
@@ -87,6 +88,9 @@ def make_env(process_idx, test):
         episode_life=not test,
         clip_rewards=not test)
     env.seed(int(env_seed))
+    if test:
+        # Randomize actions like epsilon-greedy in evaluation as well
+        env = chainerrl.wrappers.RandomizeAction(env, 0.05)
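For context, here is a minimal sketch of what an epsilon-randomizing action wrapper like this could look like, assuming a Gym-style `ActionWrapper`; the actual `chainerrl.wrappers.RandomizeAction` implementation may differ in its details:

```python
import gym
import numpy as np


class RandomizeAction(gym.ActionWrapper):
    """Replace the agent's action with a uniformly random one with
    probability `random_fraction`, mimicking epsilon-greedy evaluation."""

    def __init__(self, env, random_fraction):
        super().__init__(env)
        assert 0 <= random_fraction <= 1
        self.random_fraction = random_fraction

    def action(self, action):
        # With small probability (e.g. 0.05), ignore the agent's choice
        # and sample uniformly from the action space.
        if np.random.uniform() < self.random_fraction:
            return self.env.action_space.sample()
        return action
```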
Is there a reason that you use 0.05 instead of args.eval_epsilon as in the other algorithms? I'm aware that the DQN paper uses 0.05, but then why not use a raw value for the other domains?
I used 0.05 just because the previous examples/ale/train_nsq_ale.py
did so. I agree it would be better to make it configurable, but since that change is not relevant to this PR, I kept it unchanged.
LGTM. Perhaps we should open an issue or a follow-up PR to make the 0.05 in NSQ configurable.
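If that follow-up happens, a sketch of making the value configurable could look something like this (the `--eval-epsilon` flag name and its wiring are assumptions here, mirroring the `args.eval_epsilon` used in the other examples):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--eval-epsilon', type=float, default=0.05,
                    help='Probability of taking a random action in evaluation')
args = parser.parse_args()

# Later, when building the evaluation env (hypothetical wiring):
# if test:
#     env = chainerrl.wrappers.RandomizeAction(env, args.eval_epsilon)
```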
We currently use Explorer to inject randomness into action selection during evaluation, which is common in Atari benchmarks. This use is actually unrelated to exploration in training, so the name is a misnomer in my opinion. I think we should use an env wrapper for this purpose instead, so that the training and evaluation code can be simpler. This is also important for #326, where I would otherwise need to implement another set of code for training and evaluation.
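As an illustration of why this simplifies things: with the wrapper, an evaluation loop needs no epsilon-greedy logic of its own, since the randomization lives entirely in the env. A rough sketch (the loop below is hypothetical, not code from this repository):

```python
def run_evaluation_episode(agent, env):
    """Run one evaluation episode; the env wrapper injects the random
    actions, so the loop just queries the agent greedily."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    while not done:
        action = agent.act(obs)  # greedy action; no explorer needed here
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```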