First of all, thank you for your valuable research.
I'm writing to seek clarification regarding the attention map generation process described in your paper. Specifically, in Appendix C.10 you explain the methodology for obtaining attention maps as follows:
Based on this description, I would expect the resulting attention maps to be relatively coarse heatmaps ($\sqrt{n} \times \sqrt{n}$). However, the visualizations presented in Figure 20 appear remarkably fine-grained:
I suspect I may be missing some implementation detail or misunderstanding part of the methodology. Could you please clarify how you transition from the described extraction process to the high-resolution attention maps shown in your results?
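For concreteness, my current guess is that the $n$ patch-attention weights are reshaped to a $\sqrt{n} \times \sqrt{n}$ grid and then bilinearly upsampled to the input image resolution before overlaying. Here is a minimal numpy sketch of that guess (the `upsample_bilinear` helper and the 14×14 → 224×224 sizes are my own assumptions, not taken from the paper):

```python
import numpy as np

def upsample_bilinear(attn, out_h, out_w):
    """Bilinearly upsample a coarse (h, w) attention map to (out_h, out_w)."""
    h, w = attn.shape
    # Map each output pixel back to fractional coordinates on the coarse grid
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]  # vertical interpolation weights
    wx = (xs - x0)[None, :]  # horizontal interpolation weights
    top = attn[y0][:, x0] * (1 - wx) + attn[y0][:, x1] * wx
    bot = attn[y1][:, x0] * (1 - wx) + attn[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

# Example: a 14x14 map (n = 196 patches) upsampled to a 224x224 overlay
coarse = np.random.rand(14, 14)
fine = upsample_bilinear(coarse, 224, 224)
print(fine.shape)  # (224, 224)
```

Is this roughly what you do, or is there an additional step (e.g. attention rollout across layers, or gradient weighting) that produces the sharper structure visible in Figure 20?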
Alternatively, if you could open-source the code used to generate these visualizations, it would be extremely helpful for the community in understanding your implementation.
Thank you for your time and for sharing your work with the community.