Closed
Description
I just lost a couple of hours debugging because I forgot that softmax_over_spatial, which is what nn.softmax maps to, behaves completely differently from the old "softmax" layer: by default it does not apply the softmax over "F" but over a different axis (defaulting to the time axis). This is really dangerous when you expect to be able to use nn.softmax as an activation function.
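To make the difference concrete, here is a minimal plain-Python sketch. It does not use RETURNN or the nn API; the `[time][feature]` list layout, the helper names `softmax` and `softmax_over_axis`, and the toy logits are all assumptions made for illustration. It only shows that normalizing over the time axis does not give a per-frame distribution over the features, which is what one would expect from an activation function.

```python
import math


def softmax(values):
    """Numerically stable softmax over a flat list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_over_axis(x, axis):
    """Softmax over one axis of a 2D list laid out as [time][feature].

    axis=1 normalizes each time frame over the features;
    axis=0 normalizes each feature over the time steps.
    (Hypothetical helper, not part of any library.)
    """
    if axis == 1:
        return [softmax(row) for row in x]
    if axis == 0:
        cols = [softmax(list(col)) for col in zip(*x)]
        return [list(row) for row in zip(*cols)]
    raise ValueError("axis must be 0 or 1")


# Toy logits: 3 time steps, 2 features.
logits = [[1.0, 2.0], [3.0, 0.5], [0.0, 0.0]]

over_feature = softmax_over_axis(logits, axis=1)
over_time = softmax_over_axis(logits, axis=0)

# Per-frame sums: all 1 for the feature axis, but not for the time axis.
frame_sums_feature = [sum(row) for row in over_feature]
frame_sums_time = [sum(row) for row in over_time]
print(frame_sums_feature)
print(frame_sums_time)
```

With the time-axis variant it is the columns (one per feature) that sum to 1, so a downstream loss that treats each frame as a probability distribution would silently get wrong inputs.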
I am not sure how best to solve this. I would say the behavior of "softmax_over_spatial" itself is okay (so no RETURNN changes are needed), but nn.softmax should definitely not default to that behavior.
Maybe this issue will already be resolved if nn.softmax requires an explicit dimension tag in the future, but if not, it needs to be fixed.
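As a rough sketch of what "requires an explicit dimension" could look like, the following plain-Python function makes the axis a keyword-only argument without a default, so forgetting it fails loudly instead of silently picking an axis. The name `softmax_explicit`, the `AXES` mapping, and the string axis names are hypothetical and are not the actual nn.softmax signature or RETURNN dimension tags.

```python
import math

# Hypothetical axis names for a 2D list laid out as [time][feature].
AXES = {"time": 0, "feature": 1}


def softmax_explicit(x, *, axis):
    """Softmax over a named axis; `axis` is keyword-only and has no default."""
    if axis not in AXES:
        raise ValueError(f"unknown axis {axis!r}, expected one of {sorted(AXES)}")
    over_rows = AXES[axis] == 1
    # Bring the axis to normalize over into the inner position.
    rows = x if over_rows else [list(col) for col in zip(*x)]
    out = []
    for row in rows:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    # Restore the original [time][feature] layout.
    return out if over_rows else [list(r) for r in zip(*out)]


logits = [[1.0, 2.0], [3.0, 0.5]]
probs = softmax_explicit(logits, axis="feature")

# Omitting the axis is a TypeError, not a silent default.
try:
    softmax_explicit(logits)
    missing_axis_rejected = False
except TypeError:
    missing_axis_rejected = True
```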