Losses to implement and losses naming conventions #38

@albertz

Some modules we should implement:

  • CrossEntropy or CE. Should this cover both dense and sparse targets, or do we want a separate module for the sparse case, like SparseCrossEntropy or so? Should this potentially also allow for logits? log-probs?
  • KL or KullbackLeiblerDivergence
  • BinaryCrossEntropy or BCE
  • L2Dist (absolute or mean?) Or MSE or MeanSquaredError? (The mean reduction is over the feature axis, not over time or batch.)
  • L1Dist (absolute or mean?) Or MeanL1Dist?
  • Ctc or CTC or CtcLogProb
  • CosineSimilarity
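To make the "mean over the feature axis only" remark concrete, here is a minimal sketch in plain Python on nested lists of shape [B,T,F]. The function name mean_squared_error and the nested-list representation are assumptions for illustration only, not a proposed API.

```python
from typing import List


def mean_squared_error(
    source: List[List[List[float]]],
    target: List[List[List[float]]],
) -> List[List[float]]:
    """Hypothetical helper: inputs have shape [B,T,F], output has shape [B,T].

    The mean is taken over the feature axis F only.
    Batch and time axes are kept, so the caller decides how to reduce them.
    """
    return [
        [
            sum((s - t) ** 2 for s, t in zip(src_frame, tgt_frame)) / len(src_frame)
            for src_frame, tgt_frame in zip(src_seq, tgt_seq)
        ]
        for src_seq, tgt_seq in zip(source, target)
    ]


# One sequence (B=1) with two frames (T=2) and two features (F=2).
source = [[[1.0, 2.0], [0.0, 0.0]]]
target = [[[0.0, 0.0], [0.0, 2.0]]]
frame_loss = mean_squared_error(source, target)  # shape [B,T]
```

An "absolute" L2Dist variant would differ only in dropping the division by the feature dimension.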

I don't like the naming of the PyTorch losses too much here.
They have the postfix Loss on all of them, although these modules are generic and not necessarily just for loss computation (although that's probably their most common usage).
Also, CrossEntropyLoss is actually log-softmax + CE together, so it is very much like tf.nn.sparse_softmax_cross_entropy_with_logits in TF.
And there is a separate NLLLoss, which is just like CrossEntropyLoss except that it takes log-probs instead of logits. I find this naming confusing.
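The relationship described above can be sketched without any framework. The helper names below (log_softmax, nll, sparse_ce_with_logits) are hypothetical and only illustrate the decomposition for a single frame. They are not the PyTorch or TF implementations.

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over one feature vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def nll(log_probs: List[float], target: int) -> float:
    """Negative log-likelihood: takes log-probs and a sparse target index."""
    return -log_probs[target]


def sparse_ce_with_logits(logits: List[float], target: int) -> float:
    """Takes logits and a sparse target index: log-softmax followed by NLL."""
    return nll(log_softmax(logits), target)


loss = sparse_ce_with_logits([0.0, 0.0], target=0)  # uniform over 2 classes
```

Seen this way, the only difference between the two is whether the normalization is part of the module, which is what the naming question (logits vs. log-probs vs. probs) comes down to.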

There is also the question of how we should handle things like label smoothing. On the RETURNN side (and also in TF), it is just an option to the CE loss. On the PyTorch side, it was not part of the official API for a long time and was only very recently added (pytorch/pytorch#7455, pytorch/pytorch#63122), as an option label_smoothing to CrossEntropyLoss. An alternative would be that the user makes this more explicit, like:

target_prob_smooth = smooth_one_hot(targets, label_prob=0.9)
loss = cross_entropy(target_prob_smooth, out_prob)

Although label smoothing has become very common, so maybe it makes sense to have this also just as an option.
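A minimal sketch of what the explicit variant could look like for a single frame, in plain Python. The signatures are assumptions: smooth_one_hot takes an extra num_classes argument here, and the remaining probability mass is spread uniformly over the non-target classes. Other conventions (e.g. spreading it over all classes including the target) exist, and nothing here is an existing API.

```python
import math
from typing import List


def smooth_one_hot(target: int, num_classes: int, label_prob: float) -> List[float]:
    """Hypothetical helper: dense target distribution for one sparse label.

    The target class gets label_prob. The rest (1 - label_prob) is split
    uniformly over the other num_classes - 1 classes (an assumed convention).
    """
    other = (1.0 - label_prob) / (num_classes - 1)
    return [label_prob if i == target else other for i in range(num_classes)]


def cross_entropy(target_probs: List[float], out_probs: List[float]) -> float:
    """Dense cross entropy -sum_i t_i * log(p_i); both inputs are probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, out_probs) if t > 0.0)


target_prob_smooth = smooth_one_hot(1, num_classes=3, label_prob=0.9)
loss = cross_entropy(target_prob_smooth, [1 / 3, 1 / 3, 1 / 3])
```

With the explicit form, the smoothing convention is visible at the call site. With an option on the loss module, it is fixed by the module.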

Note also that accumulating the loss over the dataset and calculating the correct average (mean) is handled by RETURNN. All such losses would therefore just yield a tensor of shape [B] or [B,T], without reducing over batch or time.
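To illustrate why unreduced [B,T] losses are enough, here is a sketch of how a framework could accumulate them into a correct per-frame mean. This is not RETURNN's actual code. The function accumulate_frame_mean and the (loss, seq_lens) batch layout are assumptions for illustration.

```python
from typing import List, Tuple

Batch = Tuple[List[List[float]], List[int]]  # (loss [B,T] with padding, seq_lens [B])


def accumulate_frame_mean(batches: List[Batch]) -> float:
    """Hypothetical accumulator: mean loss over all valid (non-padded) frames."""
    total, count = 0.0, 0
    for loss, seq_lens in batches:
        for seq_loss, n in zip(loss, seq_lens):
            total += sum(seq_loss[:n])  # ignore padded frames beyond length n
            count += n
    return total / count


batches = [
    ([[1.0, 3.0], [2.0, 0.0]], [2, 1]),  # second sequence has one padded frame
    ([[4.0]], [1]),
]
mean_loss = accumulate_frame_mean(batches)
```

If each loss module had already averaged over its batch, the per-batch means in this example would be 2.0 and 4.0, and averaging those would give 3.0 rather than the per-frame mean of 2.5.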
