Description
Hi there,
I'm looking at the Optimistic Asymmetric Clipping described in the IMPALA paper, which uses the clip function f(r) = 0.3 * min(tanh(r), 0) + 5.0 * max(tanh(r), 0). However, this differs from the code implementation (https://github.com/deepmind/scalable_agent/blob/master/experiment.py#L367). There, the clip function is f(r) = 5 * 0.3 * min(tanh(r / 5), 0) + 5 * max(tanh(r / 5), 0), which is not consistent with the paper.
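To make the comparison concrete, here is a minimal Python sketch of the two formulas exactly as quoted above. The names clip_paper and clip_code are my own hypothetical labels, and clip_code is transcribed from the formula above rather than copied from experiment.py.

```python
import math

def clip_paper(r: float) -> float:
    """Clip function as written in the IMPALA paper (formula quoted above)."""
    t = math.tanh(r)
    return 0.3 * min(t, 0.0) + 5.0 * max(t, 0.0)

def clip_code(r: float) -> float:
    """Clip function as read from experiment.py (formula quoted above)."""
    t = math.tanh(r / 5.0)
    return 5.0 * 0.3 * min(t, 0.0) + 5.0 * max(t, 0.0)

# Print both functions at a few reward values for a side-by-side comparison.
for r in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"r={r:6.1f}  paper={clip_paper(r):7.4f}  code={clip_code(r):7.4f}")
```

Already at r = 1 the two disagree noticeably: the paper's version gives about 3.81, the code's about 0.99.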
I plotted both clip functions. Unfortunately, neither of them seems to match Figure D.1 in the paper.
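When checking a plot against Figure D.1, the features that distinguish the two curves are their saturation levels and their one-sided slopes at r = 0. The sketch below estimates these numerically for the two formulas quoted above; curve_features is a hypothetical helper, and the step h and the "large" reward big are arbitrary choices.

```python
import math

def clip_paper(r: float) -> float:
    # Paper's formula: 0.3 * min(tanh(r), 0) + 5.0 * max(tanh(r), 0)
    t = math.tanh(r)
    return 0.3 * min(t, 0.0) + 5.0 * max(t, 0.0)

def clip_code(r: float) -> float:
    # Code's formula: 5 * 0.3 * min(tanh(r / 5), 0) + 5 * max(tanh(r / 5), 0)
    t = math.tanh(r / 5.0)
    return 5.0 * 0.3 * min(t, 0.0) + 5.0 * max(t, 0.0)

def curve_features(f, h=1e-6, big=1e3):
    """Return (lower limit, upper limit, left slope at 0, right slope at 0)."""
    lower = f(-big)                       # saturation level for very negative r
    upper = f(big)                        # saturation level for very positive r
    left_slope = (f(0.0) - f(-h)) / h     # backward difference at 0
    right_slope = (f(h) - f(0.0)) / h     # forward difference at 0
    return lower, upper, left_slope, right_slope

for name, f in (("paper", clip_paper), ("code", clip_code)):
    low, high, ls, rs = curve_features(f)
    print(f"{name}: range ({low:.2f}, {high:.2f}), "
          f"slope left of 0 = {ls:.2f}, right of 0 = {rs:.2f}")
```

Analytically, the paper's version saturates at -0.3 and 5.0 with slopes 0.3 and 5 on either side of zero, while the code's version saturates at -1.5 and 5.0 with slopes 0.3 and 1, so in the code small positive rewards pass through almost unchanged.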
Could you please tell me which clip function was used in the experiments? Any explanation of the discrepancy would also be very helpful!
Thanks for reading, and I look forward to hearing from you.