Hi, thanks for your work!
I have a question about an implementation detail of the SEAN module. In the de-normalization step, why is an extra 1 added to gamma? The code computes

out = normalized * (1 + gamma_final) + beta_final

but according to Eq. (1) of the paper, the normalized activation is scaled by exactly gamma.
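To make the difference between the two forms concrete, here is a minimal sketch on plain Python lists. It assumes per-element modulation. The helper `denormalize` and its `residual_scale` flag are hypothetical names for illustration, not part of the SEAN code. The sketch only shows that the two forms differ by the reparameterization gamma → 1 + gamma, and that with gamma = 0 the (1 + gamma) form leaves the normalized activation unchanged while the plain-gamma form zeroes it. It does not claim to state the authors' reason for the choice.

```python
def denormalize(normalized, gamma, beta, residual_scale=True):
    """Hypothetical helper: modulate an already-normalized activation.

    residual_scale=True  -> normalized * (1 + gamma) + beta  (form in the code)
    residual_scale=False -> normalized * gamma + beta        (form as read from Eq. (1))
    """
    scale = [(1.0 + g) if residual_scale else g for g in gamma]
    return [x * s + b for x, s, b in zip(normalized, scale, beta)]


normalized = [-1.0, 0.5, 2.0]
zeros = [0.0, 0.0, 0.0]

# With gamma = beta = 0, the (1 + gamma) form is the identity map...
print(denormalize(normalized, zeros, zeros, residual_scale=True))   # [-1.0, 0.5, 2.0]
# ...while the plain-gamma form zeroes out the activation.
print(denormalize(normalized, zeros, zeros, residual_scale=False))  # [0.0, 0.0, 0.0]

# The two forms coincide once gamma is shifted by 1.
gamma = [0.2, -0.3, 0.1]
shifted = [1.0 + g for g in gamma]
assert denormalize(normalized, gamma, zeros, True) == denormalize(normalized, shifted, zeros, False)
```

In other words, a network predicting gamma for the (1 + gamma) form can represent the same outputs as one predicting gamma directly for Eq. (1). Only the value of gamma that corresponds to "no rescaling" changes (0 instead of 1).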