Skip to content

Cannot reproduce Breakout benchmark using Double DQN #176

Open
@gbg141

Description

@gbg141

I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using similar hyperparameter values than the ones presented in the original paper. After more than 20M observed frames (~100.000 episodes), the mean 100 episode reward still remains around 10, having achieved a maximum value of 12.

I present in the following list the neural network configuration as well as the hyperparameter values that I'm using in case I'm missing or getting something important wrong:

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))
model = deepq.models.cnn_to_mlp(
        convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],
        hiddens=[512],
        dueling=False
)
act = deepq.learn(
        env,
        q_func=model,
        lr=25e-5,
        max_timesteps=200000000,
        buffer_size=100000, #cannot store 1M frames as the paper suggests
        exploration_fraction=1000000/float(200000000), #so as to finish after !M steps
        exploration_final_eps=0.1,
        train_freq=4,
        batch_size=32,
        learning_starts=50000,
        target_network_update_freq=10000,
        gamma=0.99,
        prioritized_replay=False
)

Does anyone have some idea of what is going wrong? The analogous results exposed in a jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.

Thanks in advance.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions