Description
Hello, I spotted what I believe might be a bug in the DQN implementation on line 291 here:
https://github.com/devsisters/DQN-tensorflow/blob/master/dqn/agent.py#L291
The code tries to clip `self.delta` with `tf.clip_by_value`, I assume with the intention of being robust when the discrepancy in Q is above a threshold:
```python
self.delta = self.target_q_t - q_acted
self.clipped_delta = tf.clip_by_value(self.delta, self.min_delta, self.max_delta, name='clipped_delta')
self.global_step = tf.Variable(0, trainable=False)
self.loss = tf.reduce_mean(tf.square(self.clipped_delta), name='loss')
```
However, the local gradient of `clip_by_value` outside of the `[min_delta, max_delta]` range is zero. Therefore, with the current code, whenever the discrepancy exceeds min/max delta, the gradient becomes exactly zero in backprop. This might not be what you intend, and it is certainly not standard, I believe.
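For illustration, here is a minimal repro of the zero-gradient behavior. Note this sketch uses TF 2.x eager execution and `tf.GradientTape` for brevity, not the TF 0.x graph API this repo targets; the gradient semantics of `clip_by_value` are the same in both.

```python
import tensorflow as tf

delta = tf.Variable([0.5, 3.0])  # first value inside [-1, 1], second outside
with tf.GradientTape() as tape:
    clipped = tf.clip_by_value(delta, -1.0, 1.0)
    loss = tf.reduce_mean(tf.square(clipped))

grad = tape.gradient(loss, delta)
print(grad.numpy())  # [0.5, 0.0] -- zero gradient for the out-of-range element
```

So any transition whose TD error falls outside the clip range contributes nothing to the parameter update.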
I think you probably want to clip the gradient here, not the raw Q. In that case you would have to use the Huber loss:
```python
def clipped_error(x):
    # Huber loss: quadratic for |x| < 1.0, linear beyond (condition, true, false)
    # (tf.select was renamed to tf.where in TF >= 1.0)
    return tf.select(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)
```
and use this on `self.delta` instead of `tf.square`. This would have the desired effect of increased robustness to outliers: outside the unit interval the Huber loss has a constant gradient of ±1, so large errors still contribute a bounded, non-zero learning signal, which is exactly the gradient clipping you are after.
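Concretely, the change in `agent.py` would amount to something like this (a sketch against the snippet quoted above, untested):

```python
self.delta = self.target_q_t - q_acted
# Huber loss clips the gradient to [-1, 1] instead of clipping the TD error itself
self.loss = tf.reduce_mean(clipped_error(self.delta), name='loss')
```

The `clipped_delta` / `min_delta` / `max_delta` clipping would then no longer be needed.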