This repo contains code and weights for A Spitting Image: Modular Superpixel Tokenization in Vision Transformers, accepted for MELEX, ECCVW 2024.
For an introduction to our work, visit the project webpage.
The package can currently be installed via:

```sh
# HTTPS
pip install git+https://github.com/dsb-ifi/SPiT.git

# SSH
pip install git+ssh://git@github.com/dsb-ifi/SPiT.git
```

You can load the Superpixel Transformer model easily via `torch.hub`:
```python
model = torch.hub.load(
    'dsb-ifi/spit',
    'spit_base_16',
    pretrained=True,
    source='github',
)
```

This will load the model and download the pretrained weights, which are stored in your local `torch.hub` directory.
If you prefer downloading weights manually, feel free to use:
| Model | Link | MD5 |
|---|---|---|
| SPiT-S16 | Manual Download | 8e899c846a75c51e1c18538db92efddf |
| SPiT-S16 (w. grad.) | Manual Download | e49be7009c639c0ccda4bd68ed34e5af |
| SPiT-B16 | Manual Download | 9d3483a4c6fdaf603ee6528824d48803 |
| SPiT-B16 (w. grad.) | Manual Download | 9394072a5d488977b1af05c02aa0d13c |
| ViT-S16 | Manual Download | 73af132e4bb1405b510a5eb2ea74cf22 |
| ViT-S16 (w. grad.) | Manual Download | b8e4f1f219c3baef47fc465eaef9e0d4 |
| ViT-B16 | Manual Download | ce45dcbec70d61d1c9f944e1899247f1 |
| ViT-B16 (w. grad.) | Manual Download | 1caa683ecd885347208b0db58118bf40 |
| RViT-B16 | Manual Download | 18c13af67d10f407c3321eb1ca5eb568 |
| RViT-B16 (w. grad.) | Manual Download | 50d25403adfd5a12d7cb07f7ebfced97 |
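To confirm a manual download is intact, you can compare its MD5 digest against the table above. A minimal stdlib sketch (the checkpoint filename is hypothetical; substitute your actual download path):

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file in streaming chunks,
    so large checkpoints are never loaded into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example: verify a downloaded SPiT-B16 checkpoint (filename is illustrative).
# assert md5sum("spit_base_16.pth") == "9d3483a4c6fdaf603ee6528824d48803"
```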
We provide a Jupyter notebook as a sandbox for loading, evaluating, and extracting segmentations for the models.
Currently, the code features some slight modifications to streamline use of the RViT models. The original RViT models sampled partitions from a dataset of pre-computed Voronoi tessellations for training and evaluation. This is impractical for deployment, and we have yet to implement a CUDA kernel for computing Voronoi tessellations with lower memory overhead.
However, we have developed a fast implementation that generates tessellations on-the-fly with PCA trees [1], which mimic Voronoi tessellations reasonably well. There are, however, still some minor issues with the small-capacity RViT models. Consequently, the RViT-B16 models will perform marginally differently from the results reported in the paper. We appreciate the reader's patience in this matter.
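For intuition, a Voronoi tessellation of the pixel grid simply assigns each pixel to its nearest seed point. The following is a minimal NumPy sketch of that idea, not the repository's implementation (which uses the faster PCA-tree approximation):

```python
import numpy as np


def voronoi_partition(h: int, w: int, n_seeds: int, seed=None) -> np.ndarray:
    """Label each pixel of an (h, w) grid with the index of its nearest
    randomly sampled seed point, yielding a Voronoi tessellation."""
    rng = np.random.default_rng(seed)
    seeds = rng.uniform([0.0, 0.0], [h, w], size=(n_seeds, 2))

    # Pixel coordinates, flattened to (h*w, 2).
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)

    # Squared Euclidean distance from every pixel to every seed.
    d2 = ((coords[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1).reshape(h, w)
```

Note that this dense distance matrix is exactly the memory overhead mentioned above; a tree-based assignment avoids materialising it.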
Note that the RViT models are inherently stochastic, so different runs can yield different results. SPiT models can also yield slightly different results from run to run, due to nondeterministic behaviour in CUDA kernels.
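If exact repeatability matters for your experiments, a standard PyTorch mitigation (a generic recipe, not something the models enforce) is to seed all RNGs and request deterministic kernels:

```python
import torch

# Seed the CPU and CUDA RNGs so stochastic sampling repeats across runs.
torch.manual_seed(0)
torch.cuda.manual_seed_all(0)

# Optionally ask for deterministic CUDA kernel variants where they exist.
# May be slower; warn_only avoids errors for ops with no deterministic path.
torch.use_deterministic_algorithms(True, warn_only=True)
```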
[1] Sproull, R.F. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica 6, 579–589 (1991).
- Include foundational code and model weights.
- Add manual links with MD5 hash for manual weight download.
- Add module for loading models, and provide example notebook.
- Create temporary solution to on-line Voronoi tesselation.
- Add `hubconf.py` for PyTorch Hub compatibility.
- Add example for extracting attribution maps with Att.Flow and Proto.PCA.
- Add example for computing sufficiency and comprehensiveness.
- Add assets for computed attribution maps for XAI experiments.
- Add code and examples for salient segmentation.
If you find our work useful, please consider citing our paper.
```bibtex
@inproceedings{Aasan2024,
  title={A Spitting Image: Modular Superpixel Tokenization in Vision Transformers},
  author={Aasan, Marius and Kolbj\o{}rnsen, Odd and Schistad Solberg, Anne and Ram\'irez Rivera, Ad\'in},
  booktitle={{CVF/ECCV} Computer Vision -- {ECCVW} 2024 -- {MELEX}},
  year={2024},
  doi={10.1007/978-3-031-93806-1_11},
}
```

