Hi,
I stumbled across an issue where my losses all went to NaN when using `binary_cross_entropy_with_logit`. I quickly found out that this happens for high logit values: when sigmoid is applied to them the result saturates to exactly 1.0, applying the affine transformation `-1 * y + 1` then yields 0.0, and taking the log of 0.0 gives -inf, which turns into NaN as soon as it is multiplied by the zero target coefficient in the loss term.
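To make the failure concrete, here's a minimal sketch in plain Rust (scalar floats rather than candle's tensor API, and the function names are mine) showing the naive formulation blowing up:

```rust
fn sigmoid(x: f64) -> f64 {
    1.0 / (1.0 + (-x).exp())
}

// Naive binary cross-entropy: -(y * ln(p) + (1 - y) * ln(1 - p))
fn bce_naive(logit: f64, target: f64) -> f64 {
    let p = sigmoid(logit);
    -(target * p.ln() + (1.0 - target) * (1.0 - p).ln())
}

fn main() {
    // For a large logit, sigmoid saturates to exactly 1.0 in f64,
    // so (1 - p) is 0.0, ln(0.0) is -inf, and 0.0 * -inf is NaN.
    println!("{}", bce_naive(40.0, 1.0)); // prints NaN
}
```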
(The affected code: line 66 in 3d1dc06.)
I looked up how PyTorch does it in its source and assembled a more or less identical approach in candle (see the PR).
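For reference, the numerically stable formulation PyTorch uses is `max(x, 0) - x * y + ln(1 + exp(-|x|))`, which never computes sigmoid and log separately. A plain-Rust sketch of that formula (again scalar floats and my own naming, not the actual PR code):

```rust
// Stable BCE-with-logits: max(x, 0) - x * y + ln(1 + exp(-|x|)).
// ln_1p(e^{-|x|}) computes ln(1 + e^{-|x|}) without losing precision.
fn bce_with_logits_stable(logit: f64, target: f64) -> f64 {
    logit.max(0.0) - logit * target + (-logit.abs()).exp().ln_1p()
}

fn main() {
    // The same large logit now yields a finite, near-zero loss.
    println!("{}", bce_with_logits_stable(40.0, 1.0)); // ~4.2e-18, not NaN
}
```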
I haven't actually looked up the mathematical definition of the operation or dug deep into examples to see if the proposed implementation is flawed as well, so take the PR with a grain of salt ;)