-
Hi,
I was wondering whether you have replaced normal attention with this separable self-attention layer in other transformer networks, such as DeiT, as well. What were your observations?
-
We tried replacing it in a few architectures (DeiT and Swin), and the observations were consistent with those reported in the paper. However, separable attention needs to be carefully integrated, especially when the model uses relative positional biases.
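For readers who haven't seen the layer, here is a minimal, token-based sketch of separable self-attention along the lines of the MobileViTv2 paper. The module and argument names (`SeparableSelfAttention`, `embed_dim`) are illustrative rather than the ones used in the actual repository, and the real implementation operates on a different tensor layout. The comment in the forward pass points at the integration issue mentioned above: the layer never materializes an N x N attention matrix, so a Swin-style relative position bias, which is normally added to those pairwise logits, cannot be dropped in directly.

```python
# A minimal, token-based sketch of separable self-attention (MobileViTv2-style).
# Module and argument names here are illustrative, not the exact names used in the repo.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SeparableSelfAttention(nn.Module):
    """Linear-complexity attention: O(N) in the number of tokens N."""

    def __init__(self, embed_dim: int):
        super().__init__()
        # One projection produces: 1 context score per token, keys, and values.
        self.qkv_proj = nn.Linear(embed_dim, 1 + 2 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)
        self.embed_dim = embed_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, embed_dim)
        qkv = self.qkv_proj(x)
        scores, keys, values = torch.split(
            qkv, [1, self.embed_dim, self.embed_dim], dim=-1
        )

        # Context scores: a softmax over the token axis, not an N x N matrix.
        # This is why a Swin-style relative position bias, which is added to the
        # N x N attention logits, has no direct place to be injected here.
        context_scores = F.softmax(scores, dim=1)                           # (B, N, 1)

        # Context vector: score-weighted sum of keys -> one global vector per item.
        context_vector = (keys * context_scores).sum(dim=1, keepdim=True)   # (B, 1, D)

        # Broadcast the global context back onto every token via the values.
        out = F.relu(values) * context_vector                               # (B, N, D)
        return self.out_proj(out)


if __name__ == "__main__":
    layer = SeparableSelfAttention(embed_dim=192)    # DeiT-Tiny width, for example
    tokens = torch.randn(2, 197, 192)                # CLS token + 14x14 patches
    print(layer(tokens).shape)                       # torch.Size([2, 197, 192])
```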