
Commit

Fixing the numerical instability when calculating the loss of the cri… (#501)

* Fixing the numerical instability when calculating the loss of the critic model

* formatting

* formatting

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
3 people authored May 9, 2023
1 parent 6779302 commit 8f8099a
Showing 1 changed file with 2 additions and 2 deletions.
@@ -99,8 +99,8 @@ def forward(self,
                 chosen_reward[c_ind - 1]) #use the end score for reference
             rejected_mean_scores.append(rejected_reward[r_ind - 1])

-            loss += -torch.log(
-                torch.sigmoid(c_truncated_reward - r_truncated_reward)).mean()
+            loss += -torch.nn.functional.logsigmoid(c_truncated_reward -
+                                                    r_truncated_reward).mean()

         loss = loss / bs
         chosen_mean_scores = torch.stack(chosen_mean_scores)
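Why the change matters: the old line computes `log(sigmoid(d))` in two steps, where `d` is the chosen-minus-rejected reward. When `d` is very negative, `sigmoid(d)` underflows to exactly 0, `torch.log` returns `-inf`, and the loss becomes `inf`. The fused `logsigmoid` avoids forming the tiny intermediate. The sketch below is a stdlib-only stand-in, not PyTorch's kernel: it reproduces both forms on Python floats using the common stable rewrite `log(sigmoid(d)) = min(d, 0) - log1p(exp(-|d|))`. The function names and reward values are hypothetical. Python floats are 64-bit, so underflow needs a gap of roughly 745; float32 and float16 tensors reach it at much smaller gaps.

```python
import math


def naive_log_sigmoid(x: float) -> float:
    """log(sigmoid(x)) in two steps, mimicking the pre-fix torch.log(torch.sigmoid(...))."""
    if x >= 0:
        s = 1.0 / (1.0 + math.exp(-x))  # rounds to exactly 1.0 for large x
    else:
        e = math.exp(x)  # silently underflows to 0.0 for very negative x
        s = e / (1.0 + e)
    # math.log(0.0) raises ValueError; torch.log returns -inf, so mimic torch here.
    return math.log(s) if s > 0.0 else -math.inf


def stable_log_sigmoid(x: float) -> float:
    """log(sigmoid(x)) = min(x, 0) - log1p(exp(-|x|)); exp only sees non-positive inputs."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def pairwise_loss(chosen, rejected, log_sigmoid) -> float:
    """Mean of -log(sigmoid(chosen - rejected)) over aligned reward pairs (hypothetical helper)."""
    diffs = [c - r for c, r in zip(chosen, rejected)]
    return -sum(log_sigmoid(d) for d in diffs) / len(diffs)


if __name__ == "__main__":
    chosen = [1.0, 0.5]
    rejected = [0.0, 900.0]  # hypothetical, deliberately extreme reward gap
    print(pairwise_loss(chosen, rejected, naive_log_sigmoid))   # inf
    print(pairwise_loss(chosen, rejected, stable_log_sigmoid))  # finite, roughly 449.9
```

The naive form also loses information for large positive gaps: at `d = 40`, `sigmoid(d)` rounds to exactly `1.0`, so the loss term is `0.0` rather than a tiny positive value. In PyTorch, the one-line fix in the diff (`-torch.nn.functional.logsigmoid(d).mean()`) removes both problems.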
