I implemented sub-quadratic attention (as described in https://arxiv.org/abs/2112.05682v2):
https://twitter.com/Birchlabs/status/1607503573906063362
Birch-san#1
Birch-san/diffusers-play@a573e3d
Is this worth upstreaming? It makes it possible to generate images larger than attention slicing allows.
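
For anyone unfamiliar with the paper, here is a minimal NumPy sketch of the idea. It is not the code from the commit linked above. The function names and chunk sizes are made up for illustration, and it handles a single head without batching. Queries and keys/values are processed in chunks, and a running max and running softmax denominator are carried across key chunks, so only a `q_chunk × kv_chunk` block of scores exists at any time instead of the full `n_q × n_k` matrix.

```python
import numpy as np


def naive_attention(q, k, v):
    """Reference softmax attention; materializes the full (n_q, n_k) score matrix."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v


def chunked_attention(q, k, v, q_chunk=64, kv_chunk=64):
    """Same result as naive_attention, computed one (q_chunk, kv_chunk) block at a time.

    Hypothetical helper for illustration only: single head, no batch dimension.
    """
    n_q, d = q.shape
    d_v = v.shape[-1]
    scale = 1.0 / np.sqrt(d)
    out = np.empty((n_q, d_v), dtype=np.float64)

    for qs in range(0, n_q, q_chunk):
        q_blk = q[qs:qs + q_chunk] * scale
        m = q_blk.shape[0]
        running_max = np.full((m, 1), -np.inf)  # largest score seen so far, per query row
        denom = np.zeros((m, 1))                # running softmax denominator
        numer = np.zeros((m, d_v))              # running weighted sum of values

        for ks in range(0, k.shape[0], kv_chunk):
            scores = q_blk @ k[ks:ks + kv_chunk].T  # only this block is in memory
            new_max = np.maximum(running_max, scores.max(axis=-1, keepdims=True))
            # Rescale what was accumulated under the old max to the new max.
            correction = np.exp(running_max - new_max)
            weights = np.exp(scores - new_max)
            denom = denom * correction + weights.sum(axis=-1, keepdims=True)
            numer = numer * correction + weights @ v[ks:ks + kv_chunk]
            running_max = new_max

        out[qs:qs + m] = numer / denom
    return out
```

Smaller chunk sizes lower peak memory at the cost of more loop iterations, and the output should match ordinary attention up to floating-point error.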