Implementation of Flash Attention in Jax
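The first entry is a good place for a concrete illustration. Below is a minimal JAX sketch of the core flash-attention trick (not code from the listed repo): scanning over key/value blocks with an online softmax so the full [L, L] score matrix is never materialized. The function name `flash_attention`, the block size, and the shapes are illustrative assumptions.

```python
# Minimal sketch of the flash-attention idea in JAX (illustrative only):
# scan over key/value blocks with an online softmax so the full [L, L]
# score matrix is never materialized.
import jax
import jax.numpy as jnp

def flash_attention(q, k, v, block_size=128):
    """q, k, v: [L, d]; assumes L is a multiple of block_size."""
    L, d = q.shape
    scale = 1.0 / jnp.sqrt(d)

    def step(carry, kv_block):
        acc, row_max, row_sum = carry
        k_blk, v_blk = kv_block
        s = (q @ k_blk.T) * scale                       # [L, B] scores for this block
        new_max = jnp.maximum(row_max, s.max(axis=-1))  # updated running row max
        rescale = jnp.exp(row_max - new_max)            # correct earlier contributions
        p = jnp.exp(s - new_max[:, None])               # unnormalized block softmax
        acc = acc * rescale[:, None] + p @ v_blk
        row_sum = row_sum * rescale + p.sum(axis=-1)
        return (acc, new_max, row_sum), None

    kv = (k.reshape(-1, block_size, d), v.reshape(-1, block_size, d))
    init = (jnp.zeros((L, d)), jnp.full((L,), -jnp.inf), jnp.zeros((L,)))
    (acc, _, row_sum), _ = jax.lax.scan(step, init, kv)
    return acc / row_sum[:, None]                       # final softmax denominator

# Sanity check against plain softmax attention.
key = jax.random.PRNGKey(0)
kq, kk, kv_ = jax.random.split(key, 3)
q = jax.random.normal(kq, (256, 64))
k = jax.random.normal(kk, (256, 64))
v = jax.random.normal(kv_, (256, 64))
ref = jax.nn.softmax((q @ k.T) / jnp.sqrt(64.0), axis=-1) @ v
print(jnp.allclose(flash_attention(q, k, v), ref, atol=1e-5))  # True
```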
Implementation of Block Recurrent Transformer - PyTorch
[ICLR 2025] TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention
RAN: Recurrent Attention Networks for Long-text Modeling | Findings of ACL 2023
Official code for the NeurIPS 2025 paper "RAT: Bridging RNN Efficiency and Attention Accuracy in Language Modeling" (https://arxiv.org/abs/2507.04416)
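Several entries above (Block Recurrent Transformer, RAN, RAT) share a pattern: attention within a window plus a recurrent state carried between windows. Here is a toy JAX sketch of that pattern; the shapes, the attention-only state update, and the absence of learned projections are simplifying assumptions, not any of these papers' actual architectures.

```python
# Toy sketch of windowed attention with a recurrent carry (illustrative only;
# none of the listed repos work exactly this way). Each block attends to its
# own tokens plus a small set of state vectors, and the state is then
# refreshed from the block's outputs.
import jax
import jax.numpy as jnp

def recurrent_block_attention(x, state, block_size=64):
    """x: [L, d] tokens; state: [S, d] carried memory; L % block_size == 0."""
    L, d = x.shape
    blocks = x.reshape(L // block_size, block_size, d)

    def step(state, blk):
        kv = jnp.concatenate([state, blk], axis=0)       # attend to state + block
        attn = jax.nn.softmax((blk @ kv.T) / jnp.sqrt(d), axis=-1)
        out = attn @ kv                                   # [block_size, d]
        # State update: state vectors attend to the block's outputs.
        upd = jax.nn.softmax((state @ out.T) / jnp.sqrt(d), axis=-1)
        return upd @ out, out

    state, outs = jax.lax.scan(step, state, blocks)
    return outs.reshape(L, d), state

x = jax.random.normal(jax.random.PRNGKey(1), (256, 32))
state0 = jnp.zeros((8, 32))
y, final_state = recurrent_block_attention(x, state0)
print(y.shape, final_state.shape)  # (256, 32) (8, 32)
```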
A custom attention framework aimed at maximizing context length, speed, and usability, featuring a Triton kernel and a couple of benchmarks.
A repository of methods for training transformers to handle longer context in causal language models. Most of these methods are still being tested; try them out if you'd like, but please let me know your results so we don't duplicate work :)