attention mask #4

@wj210

Description

Hi, I noticed that the inputs are padded, but no attention mask is passed to the encoder. Is this done on purpose?
Also, in the REINFORCE loss computation:
```python
if args.sample_aggregation == "max":
    loss = (a_reward - s_reward) * sample_probs.sum(1).mean()
else:
    loss = 0.
    for sample_probs_i, s_rewards_i in zip(sample_data["probs"], sample_data["rewards"]):
        s_reward_i = np.mean(s_rewards_i)
        loss_i = (a_reward_i - s_reward_i) * sample_probs_i.sum(1).mean()
        loss += loss_i
    loss /= len(sample_data["rewards"])
```

Shouldn't the loss be `((a_reward - s_reward) * sample_probs).sum(1).mean()` instead?
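For what it's worth, here is a quick numpy sketch (shapes and variable names are my own assumptions, not the repo's) checking when the two formulations agree. If the advantage `a_reward - s_reward` is a single scalar per sequence, multiplying before or after the `sum(1)` gives the same value by linearity:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical (batch, timesteps) array of per-token log-probs.
sample_probs = rng.normal(size=(4, 5))

# Case 1: a single scalar advantage (a_reward - s_reward) for the whole batch.
advantage = 0.7
repo_form = advantage * sample_probs.sum(1).mean()          # scale after summing
proposed_form = (advantage * sample_probs).sum(1).mean()    # scale before summing
assert np.isclose(repo_form, proposed_form)

# Case 2: one scalar advantage per sequence, as in the repo's else-branch loop.
# The looped per-sample version is again equivalent to broadcasting the
# per-sequence advantages over the token axis before sum/mean.
adv_per_seq = rng.normal(size=(4, 1))
looped = np.mean([a * p.sum() for a, p in zip(adv_per_seq[:, 0], sample_probs)])
broadcast = (adv_per_seq * sample_probs).sum(1).mean()
assert np.isclose(looped, broadcast)
print("both formulations agree")
```

So under these assumptions the two expressions are algebraically identical; they would only diverge if the advantage varied per token within a sequence.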
