Anisotropic Fourier Feature Positional Encoding (AFPE) is a generalization of Isotropic Fourier Feature Positional Encoding. Unlike isotropic encodings, AFPE captures anisotropic, class-specific, and domain-aware spatial dependencies, making it well-suited for complex medical and scientific imaging tasks. AFPE introduces one intuitive hyperparameter per spatial or temporal dimension, allowing fine-grained control over how positional information is encoded.
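To make the idea concrete, here is a minimal, hypothetical sketch of an anisotropic Fourier feature encoding. The function name and signature are illustrative, not the repo's actual API (see `positional_encodings.py` below for the real implementation):

```python
import torch

def anisotropic_fourier_features(coords, num_features, sigmas):
    """Hypothetical sketch: Fourier features with one frequency scale per axis.

    coords:       (N, D) positions, e.g. normalized to [0, 1]
    num_features: number of random frequencies
    sigmas:       length-D list of per-dimension scales (the per-axis hyperparameter)
    """
    D = coords.shape[-1]
    # Scale each dimension's random frequencies by its own sigma. Unequal sigmas
    # make the encoding anisotropic; equal sigmas recover the isotropic case,
    # which is the sense in which AFPE generalizes IFPE.
    B = torch.randn(D, num_features) * torch.tensor(sigmas, dtype=coords.dtype).unsqueeze(-1)
    proj = 2 * torch.pi * coords @ B                              # (N, num_features)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)  # (N, 2 * num_features)
```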
👉 Get a better feel for how AFPE works in practice by exploring our interactive visualization!
AFPE captures directional spatial dependencies more effectively than Sinusoidal Positional Encoding (SPE) and Isotropic Fourier Feature Positional Encoding (IFPE). Shown are dot product similarity maps on a Vision Transformer-style patch grid, with similarities computed relative to the central patch.
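For illustration, a similarity map like the one described above could be computed with the `anisotropic_fourier_features` sketch from the previous block (the grid size and sigmas here are arbitrary choices, not values from the paper):

```python
import torch

# 14x14 ViT-style patch grid of normalized (row, col) coordinates.
side = 14
ys, xs = torch.meshgrid(torch.linspace(0, 1, side), torch.linspace(0, 1, side), indexing="ij")
coords = torch.stack([ys, xs], dim=-1).reshape(-1, 2)

# Unequal per-axis sigmas produce a directional (anisotropic) similarity pattern.
enc = anisotropic_fourier_features(coords, num_features=64, sigmas=[1.0, 4.0])

# Dot-product similarity of every patch with the (approximately) central patch.
center = enc[(side // 2) * side + side // 2]
similarity = (enc @ center).reshape(side, side)
```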
To run the code in this repo, create a conda environment and install the dependencies using the following commands:

```bash
conda env create -f environment.yml  # Creates a conda environment named 'posenc'
conda activate posenc                # Activates the posenc environment
python setup.py install              # Installs this package
```

**Important:** Before you start, make sure `./results/chestx_predictions.7z` is unpacked and you have the `./results/chestx_predictions.csv` file.
- **Minimal example**

  To generate the results in the table above, run:

  ```bash
  python ./generate_results.py
  ```

  This script only requires `torch`, `numpy`, `pandas`, and `torchmetrics` (1.5.2) to be installed!
- **More results exploration**

  For a more in-depth exploration of the results and plots in the paper, follow the notebooks in the `notebooks` folder.
Implementation of proposed method
This publication introduces Anisotropic Fourier Feature Positional Encodings (AFPE). The implementation of this positional encoding, as well as all other positional encodings used in the paper, can be found in the `positional_encodings.py` file.
Pretrained models can be downloaded from https://huggingface.co/afpe/afpe. To load a pretrained model, use the following code:

```python
import torch

# model_path and DEVICE are placeholders: set them to your checkpoint path and device.

# Loading the model for the EchoNet-Dynamic regression task
from posenc.modules.video_regression import VideoViTModule
model = VideoViTModule.load_from_checkpoint(model_path, map_location=torch.device(DEVICE))

# Loading the model for the ChestX multi-label classification task
from posenc.modules.vision_transformer import ViTMultiClsModule
model = ViTMultiClsModule.load_from_checkpoint(model_path, map_location=torch.device(DEVICE))
```
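Once a checkpoint is loaded, inference could look roughly like the following. This is a hypothetical sketch: the dummy input shape and the sigmoid readout are assumptions, not taken from the repo, and real inputs must match the dataset preprocessing described below:

```python
import torch

model.eval()
with torch.no_grad():
    # Dummy ChestX-sized input for illustration only; the real channel count
    # and normalization must match the dataset preprocessing.
    x = torch.randn(1, 1, 224, 224)
    logits = model(x)              # assumes the module's forward returns logits
    probs = torch.sigmoid(logits)  # multi-label probabilities per pathology
```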
To train a model as in this publication, you need to:

1. **Download the datasets**

   The datasets used in this study are publicly available and can be accessed through the following links:
   - NIH Chest X-ray: https://cloud.google.com/healthcare-api/docs/resources/public-datasets/nih-chest
   - MedMNIST: https://medmnist.com/
   - EchoNet-Dynamic: https://echonet.github.io/dynamic/index.html
   **Important:** After downloading the datasets, change the `ROOT` variable in `posenc/datasets/chestx.py` and `posenc/datasets/echonet.py` to the path where the datasets are stored!
2. **Preprocess the datasets**

   ChestX preprocessing includes (see script):
   - Resizing the images to 224x224
   - Normalizing the images
   EchoNet preprocessing includes (see script):

   - Converting the videos to grayscale
   - Normalizing the videos

   A hedged sketch of these preprocessing steps is shown below, after this list.
3. **Training**

   To train any of the models in the paper, you can run the `train.py` script. Use

   ```bash
   python posenc/train.py --help
   ```

   to view all options.
   Example to run the EchoNet-Dynamic regression task with AFPE:

   ```bash
   python posenc/train.py --task echonetreg --positional_encoding isofpe
   ```
   Example to run the ChestX multi-label classification task with AFPE:

   ```bash
   python posenc/train.py --task chestxmulti --positional_encoding isofpe
   ```
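For step 2 above, the preprocessing could be sketched with torchvision transforms along these lines. This is a hypothetical sketch: the normalization statistics and exact steps are assumptions, so refer to the repo's preprocessing scripts for the real pipeline:

```python
from torchvision import transforms

# ChestX: resize to 224x224 and normalize (statistics assumed, not from the repo).
chestx_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])

# EchoNet-Dynamic: convert each frame to grayscale and normalize.
echonet_frame_transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5], std=[0.5]),
])
```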