Open
Description
Hi, I noticed that the inputs are padded but no attention mask is passed to the encoder. Is this done on purpose?
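To illustrate the concern, here is a minimal NumPy sketch of single-query dot-product attention. It is not the repository's encoder: the `attend` helper, the toy embeddings, and the all-zero padding vectors are assumptions made purely for illustration. It shows that padded positions still receive attention weight unless they are masked out.

```python
import numpy as np

def attend(query, keys, values, mask=None):
    """Single-query scaled dot-product attention (hypothetical helper).

    mask holds 1 for real tokens and 0 for padding positions.
    """
    scores = keys @ query / np.sqrt(query.shape[-1])
    if mask is not None:
        # Padding positions get -inf so their softmax weight is exactly 0.
        scores = np.where(mask.astype(bool), scores, -np.inf)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ values

# Toy data: 3 real tokens with hidden size 4, followed by 2 padding positions.
tokens = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
padded = np.vstack([tokens, np.zeros((2, 4))])  # assumed zero pad embeddings
mask = np.array([1, 1, 1, 0, 0])
query = np.array([1.0, 2.0, 3.0, 4.0])

reference = attend(query, tokens, tokens)        # unpadded sequence
with_mask = attend(query, padded, padded, mask)  # padded, mask applied
without_mask = attend(query, padded, padded)     # padded, no mask

print(np.allclose(with_mask, reference))
print(np.allclose(without_mask, reference))
```

In this toy setting the masked output matches the unpadded one, while the unmasked output is scaled down because part of the softmax weight goes to the padding positions.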
Also, in the computation of the REINFORCE loss:

    if args.sample_aggregation == "max":
        loss = (a_reward - s_reward) * sample_probs.sum(1).mean()
    else:
        loss = 0.
        for sample_probs_i, s_rewards_i in zip(sample_data["probs"], sample_data["rewards"]):
            s_reward_i = np.mean(s_rewards_i)
            loss_i = (a_reward_i - s_reward_i) * sample_probs_i.sum(1).mean()
            loss += loss_i
        loss /= len(sample_data["rewards"])
Shouldn't the loss be ((a_reward - s_reward) * sample_probs).sum(1).mean()?
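A quick numeric check of the two forms, as a sketch under assumed shapes: `sample_probs` is taken to be a `(batch, seq_len)` array of per-token log-probabilities, and the numbers are made up. If `a_reward` and `s_reward` are scalars, the two expressions agree by linearity; they only differ if the rewards are per-sample.

```python
import numpy as np

# Hypothetical values: 2 samples, 3 tokens each, standing in for token log-probs.
sample_probs = np.array([[-0.1, -0.2, -0.3],
                         [-0.4, -0.5, -0.6]])

# Case 1: scalar rewards. The factor can move in or out of the sum and mean.
a_reward, s_reward = 0.7, 0.5
outside = (a_reward - s_reward) * sample_probs.sum(1).mean()
inside = ((a_reward - s_reward) * sample_probs).sum(1).mean()
print(outside, inside)

# Case 2: per-sample rewards of shape (batch, 1), broadcast over the tokens.
a_rewards = np.array([[0.7], [0.9]])
s_rewards = np.array([[0.5], [0.2]])
advantage = a_rewards - s_rewards
per_sample = (advantage * sample_probs).sum(1).mean()
averaged_first = advantage.mean() * sample_probs.sum(1).mean()
print(per_sample, averaged_first)
```

So whether the suggested change matters depends on the shapes of `a_reward` and `s_reward` at that point in the code, which the snippet alone does not show.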