I haven't been able to reproduce the results of the Breakout benchmark with Double DQN when using hyperparameter values similar to the ones presented in the original paper. After more than 20M observed frames (~100,000 episodes), the mean 100-episode reward still remains around 10, having peaked at 12.
Below are the neural network configuration and the hyperparameter values I'm using, in case I'm missing something important or getting it wrong:
import gym
from baselines import deepq
from baselines.common.atari_wrappers_deprecated import wrap_dqn, ScaledFloatFrame

env = gym.make("BreakoutNoFrameskip-v4")
env = ScaledFloatFrame(wrap_dqn(env))  # standard DQN preprocessing, plus scaling pixels to [0, 1]

model = deepq.models.cnn_to_mlp(
    convs=[(32, 8, 4), (64, 4, 2), (64, 3, 1)],  # (filters, kernel size, stride), as in the Nature DQN
    hiddens=[512],
    dueling=False
)

act = deepq.learn(
    env,
    q_func=model,
    lr=25e-5,
    max_timesteps=200000000,
    buffer_size=100000,  # cannot store 1M frames as the paper suggests (see the memory estimate below)
    exploration_fraction=1000000 / float(200000000),  # = 0.005, so as to finish annealing after 1M steps (sanity check below)
    exploration_final_eps=0.1,
    train_freq=4,
    batch_size=32,
    learning_starts=50000,
    target_network_update_freq=10000,
    gamma=0.99,
    prioritized_replay=False
)
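For context on the buffer_size comment above, here is a rough back-of-envelope estimate of the replay buffer footprint. buffer_gb is just a hypothetical helper; I'm assuming 84x84x4 stacked observations with both obs_t and obs_tp1 stored per transition, and ignoring any frame sharing the wrappers might do:

import numpy as np

# Approximate size in GB of a replay buffer that stores observations densely.
def buffer_gb(capacity, frame_shape=(84, 84, 4), dtype_bytes=1, obs_per_transition=2):
    return capacity * obs_per_transition * np.prod(frame_shape) * dtype_bytes / 1e9

print(buffer_gb(1000000, dtype_bytes=4))  # ~225.8 GB: 1M transitions, float32 frames
print(buffer_gb(1000000, dtype_bytes=1))  # ~56.4 GB: 1M transitions, uint8 frames
print(buffer_gb(100000, dtype_bytes=4))   # ~22.6 GB: my 100k buffer, float32 frames

If ScaledFloatFrame really does cast observations to float32 before they reach the buffer (which is my reading of the wrapper), that alone quadruples the memory needed compared to uint8 storage, which is why 1M frames doesn't fit for me.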
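And to sanity-check the exploration_fraction arithmetic, here is the epsilon schedule that I believe deepq.learn builds internally from those two arguments (LinearSchedule is from baselines.common.schedules; the exact wiring inside learn is my assumption):

from baselines.common.schedules import LinearSchedule

exploration = LinearSchedule(
    schedule_timesteps=int((1000000 / float(200000000)) * 200000000),  # 1M steps
    initial_p=1.0,
    final_p=0.1
)
print(exploration.value(0))        # 1.0 at the start of training
print(exploration.value(500000))   # 0.55 halfway through the annealing
print(exploration.value(1000000))  # 0.1 from 1M steps onward

So epsilon should be fully annealed to 0.1 after 1M steps, matching the training schedule in the paper.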
Does anyone have an idea of what is going wrong? The analogous results shown in a Jupyter notebook in openai/baselines-results indicate that I should be able to get much better scores.
Thanks in advance.