Description
Hello, I spotted what I believe might be a bug in the DQN implementation on line 291 here:
https://github.com/devsisters/DQN-tensorflow/blob/master/dqn/agent.py#L291
The code tries to clip `self.delta` with `tf.clip_by_value`, I assume with the intention of being robust when the discrepancy in Q is above a threshold:
```python
self.delta = self.target_q_t - q_acted
self.clipped_delta = tf.clip_by_value(self.delta, self.min_delta, self.max_delta, name='clipped_delta')
self.global_step = tf.Variable(0, trainable=False)
self.loss = tf.reduce_mean(tf.square(self.clipped_delta), name='loss')
```
However, the local gradient of `clip_by_value` outside of the `[min_delta, max_delta]` range is zero. Therefore, with the current code, whenever the discrepancy exceeds min/max delta, the gradient becomes exactly zero in backprop. This might not be what you intend, and it is certainly not standard, I believe.
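For illustration, here is a minimal repro of the zero-gradient behavior. Note this sketch uses TF 2.x eager execution and `tf.GradientTape` for brevity, not the TF 0.x graph API this repo targets; the gradient semantics of `clip_by_value` are the same in both.

```python
import tensorflow as tf

delta = tf.Variable([0.5, 3.0])  # first value inside [-1, 1], second outside
with tf.GradientTape() as tape:
    clipped = tf.clip_by_value(delta, -1.0, 1.0)
    loss = tf.reduce_mean(tf.square(clipped))

grad = tape.gradient(loss, delta)
print(grad.numpy())  # [0.5, 0.0] -- zero gradient for the out-of-range element
```

So any transition whose TD error falls outside the clip range contributes nothing to the parameter update.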
I think you probably want to clip the gradient here, not the raw Q. In that case you would have to use the Huber loss:
```python
def clipped_error(x):
    # Huber loss: quadratic for |x| < 1.0, linear beyond (condition, true, false)
    # (tf.select was renamed to tf.where in TF >= 1.0)
    return tf.select(tf.abs(x) < 1.0, 0.5 * tf.square(x), tf.abs(x) - 0.5)
```
and use this on `self.delta` instead of `tf.square`. This would have the desired effect of increased robustness to outliers: outside the unit interval the Huber loss has a constant gradient of ±1, so large errors still contribute a bounded, non-zero learning signal, which is exactly the gradient clipping you are after.
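Concretely, the change in `agent.py` would amount to something like this (a sketch against the snippet quoted above, untested):

```python
self.delta = self.target_q_t - q_acted
# Huber loss clips the gradient to [-1, 1] instead of clipping the TD error itself
self.loss = tf.reduce_mean(clipped_error(self.delta), name='loss')
```

The `clipped_delta` / `min_delta` / `max_delta` clipping would then no longer be needed.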