Hi @lespeholt, thank you for this great work.
I have some questions about the Atari evaluation protocol:
- Did you use separate evaluation episodes for testing? If so, how many evaluation episodes were used?
- How many random seeds were used and averaged over for evaluation?
- Did you use sticky actions for training or testing?
- Did you use the life-loss heuristic for training or testing?
- During evaluation, did you take the mode of the policy or sample from it?
- What is the standard evaluation protocol for Atari 57?
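
To make the terms in my questions concrete, here is a minimal sketch of what I mean by "mode vs. sample" and by "sticky actions". This is only my own illustration, not code from the paper or repository: the function names `select_action` and `sticky_action` are hypothetical, and the repeat probability of 0.25 in the usage lines is an assumed value, not something I know you used.

```python
import random


def select_action(probs, greedy, rng):
    """Pick an action index from a list of action probabilities."""
    actions = range(len(probs))
    if greedy:
        # "Mode" of the policy: always take the most probable action.
        return max(actions, key=lambda a: probs[a])
    # "Sample" of the policy: draw an action according to its probability.
    return rng.choices(actions, weights=probs, k=1)[0]


def sticky_action(prev_action, new_action, repeat_prob, rng):
    """With probability repeat_prob, repeat the previous action
    instead of executing the newly chosen one."""
    if prev_action is not None and rng.random() < repeat_prob:
        return prev_action
    return new_action


# Usage: one evaluation step with a toy three-action policy.
rng = random.Random(0)
policy_probs = [0.1, 0.7, 0.2]
chosen = select_action(policy_probs, greedy=True, rng=rng)
# repeat_prob=0.25 is an assumed value for illustration only.
executed = sticky_action(prev_action=2, new_action=chosen, repeat_prob=0.25, rng=rng)
```

With `greedy=True` the agent is deterministic given its inputs, so evaluation scores can differ noticeably from the sampled case, which is why I am asking which one was reported.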