-
Hi,
I was wondering whether you have replaced normal attention with this separable self-attention layer in other transformer networks, such as DeiT, as well. What were your observations?
-
We tried replacing it in a few architectures (DeiT and Swin), and the observations were consistent with those reported in the paper. However, separable attention needs to be carefully integrated, especially when the model uses relative positional biases.
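For readers who haven't seen the layer, here is a minimal, token-based sketch of separable self-attention along the lines of the MobileViTv2 paper. The module and argument names (`SeparableSelfAttention`, `embed_dim`) are illustrative rather than the ones used in the actual repository, and the real implementation operates on a different tensor layout. The comment in the forward pass points at the integration issue mentioned above: the layer never materializes an N x N attention matrix, so a Swin-style relative position bias, which is normally added to those pairwise logits, cannot be dropped in directly.

```python
# A minimal, token-based sketch of separable self-attention (MobileViTv2-style).
# Module and argument names here are illustrative, not the exact names used in the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeparableSelfAttention(nn.Module):
    """Linear-complexity attention: O(N) in the number of tokens N."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # One projection produces: 1 context score per token, keys, and values.
        self.qkv_proj = nn.Linear(embed_dim, 1 + 2 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        self.embed_dim = embed_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim)
        qkv = self.qkv_proj(x)
        scores, keys, values = torch.split(
            qkv, [1, self.embed_dim, self.embed_dim], dim=-1
        )

        # Context scores: a softmax over the token axis, not an N x N matrix.
        # This is why a Swin-style relative position bias, which is added to the
        # N x N attention logits, has no direct place to be injected here.
        context_scores = F.softmax(scores, dim=1)                           # (B, N, 1)

        # Context vector: score-weighted sum of keys -> one global vector per item.
        context_vector = (keys * context_scores).sum(dim=1, keepdim=True)   # (B, 1, D)

        # Broadcast the global context back onto every token via the values.
        out = F.relu(values) * context_vector                               # (B, N, D)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = SeparableSelfAttention(embed_dim=192)    # DeiT-Tiny width, for example
    tokens = torch.randn(2, 197, 192)                # CLS token + 14x14 patches
    print(layer(tokens).shape)                       # torch.Size([2, 197, 192])
```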