This repository contains implementations of the Attention Rollout and Attention Flow algorithms, which are post hoc methods for obtaining more explanatory attention weights.
Attention Rollout and Attention Flow recursively compute the token attentions in each layer of a given model, taking the embedding attentions as input. They differ in the assumptions they make about how attention weights in lower layers affect the flow of information to higher layers, and in whether they compute the token attentions relative to each other or independently.
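
The recursive computation can be illustrated with a minimal NumPy sketch of Attention Rollout. This is not the repository's own code: the function name `attention_rollout`, the `residual_weight` parameter, and the input layout (head-averaged, row-normalized attention matrices stacked as `(num_layers, seq_len, seq_len)`) are assumptions made for illustration. The sketch mixes each layer's attention with the identity matrix to account for residual connections, then multiplies the result with the rollout of the layer below.

```python
import numpy as np


def attention_rollout(attentions, residual_weight=0.5):
    """Sketch of Attention Rollout (hypothetical helper, not the repo's API).

    attentions: array of shape (num_layers, seq_len, seq_len), assumed to be
        averaged over heads, with each row summing to 1.
    residual_weight: assumed share of information carried by the residual
        connection; 0.5 weights attention and identity equally.
    Returns an array of the same shape, where entry [l, i, j] estimates how
    much position i at layer l attends to input token j.
    """
    attentions = np.asarray(attentions, dtype=float)
    num_layers, seq_len, _ = attentions.shape
    identity = np.eye(seq_len)

    rollout = np.empty_like(attentions)
    for layer in range(num_layers):
        # Account for the residual connection by mixing in the identity.
        mixed = (1.0 - residual_weight) * attentions[layer] + residual_weight * identity
        # Re-normalize rows so each row remains a distribution over tokens.
        mixed = mixed / mixed.sum(axis=-1, keepdims=True)
        # Recursion: combine this layer with the rollout of the layer below.
        rollout[layer] = mixed if layer == 0 else mixed @ rollout[layer - 1]
    return rollout
```

Given such a stack of attention matrices, `attention_rollout(att)[-1]` gives the rolled-out attention of the last layer over the input tokens. Attention Flow instead treats the attention graph as a flow network and solves a maximum-flow problem for each pair of nodes, which is more expensive and is not shown here.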