Is It Necessary to Transpose Dimensions in Multi-Head Attention or Can We Reshape Directly? #399
-
Is it necessary for the implementation to first reshape and then transpose the tensors, rather than reshaping them directly into the dimensions that we want? I got the code from the book as well as from here.
-
Hey there, this is a really good question. At first glance, it looks like this should work because the dimensions would be the same. But note that reshaping and transposing are slightly different in terms of how the matrix elements get arranged for the matrix multiplication that follows. So, no, those are not interchangeable. The same question actually came up in #167, where the answer may give a bit more concrete insight. Anyways, thanks for asking!
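To make this concrete, here is a minimal PyTorch sketch (the toy shapes and variable names are made up for illustration) contrasting a direct reshape with the split-then-transpose pattern. Both produce the same output shape, but the elements end up grouped differently, so the per-head matrix multiplications that follow would see different data:

```python
import torch

# Toy dimensions (hypothetical, just for illustration)
b, num_tokens, num_heads, head_dim = 1, 3, 2, 4
d_out = num_heads * head_dim

# Stand-in for a projected query/key/value tensor of shape (b, num_tokens, d_out)
x = torch.arange(b * num_tokens * d_out, dtype=torch.float32).view(b, num_tokens, d_out)

# Split d_out into (num_heads, head_dim), then swap the token and head axes
via_transpose = x.view(b, num_tokens, num_heads, head_dim).transpose(1, 2)

# Reshape straight to the target shape instead
via_reshape = x.view(b, num_heads, num_tokens, head_dim)

print(via_transpose.shape == via_reshape.shape)  # True: both are (1, 2, 3, 4)
print(torch.equal(via_transpose, via_reshape))   # False: elements are arranged differently
```

In the transpose version, each head gets its own contiguous head_dim slice of every token's embedding, as intended; in the direct-reshape version, tokens and head slices get mixed together, so the attention scores would be computed over scrambled data even though the shapes match.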
-
Thanks a lot for your answer. I'll refer to that discussion.