Fixing the numerical instability when calculating the loss of the critic model (#501)

* Fixing the numerical instability when calculating the loss of the critic model
* formatting
* formatting

---------

Co-authored-by: Michael Wyatt <michaelwyatt@microsoft.com>
Co-authored-by: Michael Wyatt <mrwyattii@gmail.com>
microsoft · May 9, 2023 · 8f8099a
1 parent 6779302
Showing 1 changed file with 2 additions and 2 deletions.
diff --git a/applications/DeepSpeed-Chat/training/utils/model/reward_model.py b/applications/DeepSpeed-Chat/training/utils/model/reward_model.py
@@ -99,8 +99,8 @@ def forward(self,
                 chosen_reward[c_ind - 1])  #use the end score for reference
             rejected_mean_scores.append(rejected_reward[r_ind - 1])
 
-            loss += -torch.log(
-                torch.sigmoid(c_truncated_reward - r_truncated_reward)).mean()
+            loss += -torch.nn.functional.logsigmoid(c_truncated_reward -
+                                                    r_truncated_reward).mean()
 
         loss = loss / bs
         chosen_mean_scores = torch.stack(chosen_mean_scores)
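
Note on the change: the sketch below is not part of the commit. It is a minimal illustration, under assumed inputs, of why replacing `torch.log(torch.sigmoid(x))` with `torch.nn.functional.logsigmoid(x)` helps. The tensor `diff` and the helpers `naive_loss` and `stable_loss` are hypothetical names standing in for `c_truncated_reward - r_truncated_reward` and the two loss expressions in the diff. In float32, the sigmoid of a large negative difference underflows to zero, so its log is `-inf`; `logsigmoid` computes the same quantity without forming that intermediate value.

```python
import torch
import torch.nn.functional as F

# Hypothetical reward differences (chosen minus rejected). The large negative
# entry stands in for a pair that the critic currently ranks very badly.
diff = torch.tensor([-200.0, 0.0, 5.0])


def naive_loss(d: torch.Tensor) -> torch.Tensor:
    # Expression before the change: sigmoid underflows to 0 for very negative
    # inputs, and log(0) is -inf.
    return -torch.log(torch.sigmoid(d)).mean()


def stable_loss(d: torch.Tensor) -> torch.Tensor:
    # Expression after the change: logsigmoid evaluates log(sigmoid(d))
    # directly, avoiding the underflowing intermediate.
    return -F.logsigmoid(d).mean()


# Evaluate both variants on separate leaf tensors so gradients can be compared.
d_naive = diff.clone().requires_grad_(True)
naive = naive_loss(d_naive)
naive.backward()

d_stable = diff.clone().requires_grad_(True)
stable = stable_loss(d_stable)
stable.backward()

print("naive loss: ", naive.item(), "grad:", d_naive.grad)
print("stable loss:", stable.item(), "grad:", d_stable.grad)
```

With these inputs the naive loss is infinite and its gradient contains a non-finite entry, while the `logsigmoid` version stays finite in both the loss and the gradient. How often such extreme differences occur in practice depends on the reward scale of the model being trained.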