I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using hyperparameter values similar to the ones presented in the original paper. After more than 20M observed frames (~100,000 episodes), the mean 100-episode reward still remains around 10, having peaked at 12.
Below are the neural network configuration and the hyperparameter values I'm using, in case I'm missing something important or getting it wrong:
import gym
from baselines import deepq
from baselines.common.atari_wrappers_deprecated import wrap_dqn, ScaledFloatFrame

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))  # standard DQN preprocessing, plus scaling pixels to [0, 1]

model = deepq.models.cnn_to_mlp(
    convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],  # (filters, kernel size, stride), as in the Nature DQN
    hiddens=[512],
    dueling=False
)

act = deepq.learn(
    env,
    q_func=model,
    lr=25e-5,
    max_timesteps=200000000,
    buffer_size=100000,  # cannot store 1M frames as the paper suggests (see the memory estimate below)
    exploration_fraction=1000000 / float(200000000),  # = 0.005, so as to finish annealing after 1M steps (sanity check below)
    exploration_final_eps=0.1,
    train_freq=4,
    batch_size=32,
    learning_starts=50000,
    target_network_update_freq=10000,
    gamma=0.99,
    prioritized_replay=False
)
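For context on the buffer_size comment above, here is a rough back-of-envelope estimate of the replay buffer footprint. buffer_gb is just a hypothetical helper; I'm assuming 84x84x4 stacked observations with both obs_t and obs_tp1 stored per transition, and ignoring any frame sharing the wrappers might do:

import numpy as np

# Approximate size in GB of a replay buffer that stores observations densely.
def buffer_gb(capacity, frame_shape=(84, 84, 4), dtype_bytes=1, obs_per_transition=2):
    return capacity * obs_per_transition * np.prod(frame_shape) * dtype_bytes / 1e9

print(buffer_gb(1000000, dtype_bytes=4))  # ~225.8 GB: 1M transitions, float32 frames
print(buffer_gb(1000000, dtype_bytes=1))  # ~56.4 GB: 1M transitions, uint8 frames
print(buffer_gb(100000, dtype_bytes=4))   # ~22.6 GB: my 100k buffer, float32 frames

If ScaledFloatFrame really does cast observations to float32 before they reach the buffer (which is my reading of the wrapper), that alone quadruples the memory needed compared to uint8 storage, which is why 1M frames doesn't fit for me.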
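And to sanity-check the exploration_fraction arithmetic, here is the epsilon schedule that I believe deepq.learn builds internally from those two arguments (LinearSchedule is from baselines.common.schedules; the exact wiring inside learn is my assumption):

from baselines.common.schedules import LinearSchedule

exploration = LinearSchedule(
    schedule_timesteps=int((1000000 / float(200000000)) * 200000000),  # 1M steps
    initial_p=1.0,
    final_p=0.1
)
print(exploration.value(0))        # 1.0 at the start of training
print(exploration.value(500000))   # 0.55 halfway through the annealing
print(exploration.value(1000000))  # 0.1 from 1M steps onward

So epsilon should be fully annealed to 0.1 after 1M steps, matching the training schedule in the paper.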
Does anyone have an idea of what is going wrong? The analogous results shown in a Jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.
Thanks in advance.