Closed
Description
System Info
newest trl version
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder
- My own task or dataset (give details below)
Reproduction
mean_entropy = (-logprobs).sum(1).mean()
This line is wrong. The logprobs it sums have already been filled at padded positions by:
logprobs = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)
ref_logprobs = torch.masked_fill(ref_logprobs, padding_mask, INVALID_LOGPROB)
where INVALID_LOGPROB = 1.0. Each padded position therefore adds -1.0 to the per-sequence sum, so objective/entropy drops below 0 on sequences with long padding.
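The arithmetic can be reproduced with a small plain-Python sketch. The toy values, the list-based masked_fill helper, and the zero-fill workaround are assumptions made for illustration and are not taken from the trl source. In PyTorch the equivalent workaround would presumably be to fill padded positions with 0.0 (or to multiply by the inverted padding mask) before summing.

```python
# Plain-Python illustration of the arithmetic; toy values, not taken from trl.
INVALID_LOGPROB = 1.0  # fill value quoted in the report

# Per-token log-probabilities for a batch of 2 responses, 5 positions each.
logprobs = [
    [-0.5, -0.25, -0.25, -0.5, -0.5],   # no padding
    [-0.5, -0.25, -0.75, -0.5, -0.25],  # last 3 positions are padding
]
padding_mask = [
    [False, False, False, False, False],
    [False, False, True, True, True],
]


def masked_fill(rows, mask, value):
    """Replace entries where mask is True with value (hypothetical list-based
    stand-in for torch.masked_fill)."""
    return [
        [value if m else x for x, m in zip(row, mask_row)]
        for row, mask_row in zip(rows, mask)
    ]


def mean_neg_sum(rows):
    """List-based equivalent of (-logprobs).sum(1).mean()."""
    return sum(-sum(row) for row in rows) / len(rows)


# Behaviour described in the report: padded positions hold +1.0,
# so each one subtracts 1.0 from the per-sequence value.
filled = masked_fill(logprobs, padding_mask, INVALID_LOGPROB)
buggy_entropy = mean_neg_sum(filled)

# One possible workaround (an assumption, not the maintainers' fix):
# zero out padded positions so they contribute nothing to the sum.
zeroed = masked_fill(logprobs, padding_mask, 0.0)
fixed_entropy = mean_neg_sum(zeroed)

print(buggy_entropy, fixed_entropy)
```

With three padded positions out of five, the second sequence alone evaluates to a negative value under the original formula, which is enough to pull the batch mean below zero.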
Expected behavior
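The entropy calculation should be correct: padded positions should not contribute to the sum, so objective/entropy should not go negative because of padding.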