Ring Attention Pytorch implementation of Ring Attention and Striped Ring Attention. Ring attention is a distributed attention operator that compute attention across devices. Usage ./run_single.sh