Official PyTorch implementation for our paper HyenaPixel: Global Image Context with Convolutions (ECAI 2024).
In computer vision, a larger effective receptive field (ERF) is associated with improved performance. While attention mechanisms in vision transformers (ViTs) natively support global context, their quadratic complexity in the number of pixels limits their applicability to high-resolution images. We extend Hyena, originally developed for causal 1D sequences in natural language processing (NLP), to:
- the 2D image space, introducing HyenaPixel, which uses extremely large convolutional kernels of up to 191×191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels.
- bidirectional, non-causal modeling, introducing Bidirectional Hyena.
In the MetaFormer framework, HyenaPixel and Bidirectional Hyena reach ImageNet-1k image classification top-1 accuracies of 84.9% and 85.2%, respectively, outperforming many large-kernel convolutional neural networks (CNNs).
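The key to making such large kernels affordable is that a global convolution can be evaluated in the frequency domain, where the cost depends on the image size rather than the kernel size. The following NumPy sketch is illustrative only and is not the repository's implementation (which builds on the Hyena operator from HazyResearch/safari); it shows the FFT-based circular convolution underlying this complexity argument.

```python
import numpy as np

def global_conv_fft(x, k):
    """Circular 2D convolution via the FFT.

    Cost is O(HW log HW) regardless of kernel size, which is what makes
    very large kernels (e.g. 191x191) tractable compared to the
    O(HW * kh * kw) cost of direct convolution.
    """
    H, W = x.shape
    kh, kw = k.shape
    k_pad = np.zeros((H, W))
    k_pad[:kh, :kw] = k
    # Shift the kernel center to the origin so the output stays aligned.
    k_pad = np.roll(k_pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # Pointwise multiplication in the frequency domain == circular convolution.
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k_pad)).real

# Sanity check: convolving an impulse reproduces the (centered) kernel.
x = np.zeros((8, 8))
x[0, 0] = 1.0
k = np.arange(9, dtype=float).reshape(3, 3)
y = global_conv_fft(x, k)
assert np.isclose(y[0, 0], k[1, 1])  # kernel center lands on the impulse
```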
| Model | Resolution | Params | ImageNet-1k Top-1 Accuracy | Download |
|---|---|---|---|---|
| hpx_former_s18 | 224 | 29M | 83.2 | HuggingFace |
| hpx_former_s18_384 | 384 | 29M | 84.7 | HuggingFace |
| hb_former_s18 | 224 | 28M | 83.5 | HuggingFace |
| c_hpx_former_s18 | 224 | 28M | 83.0 | HuggingFace |
| hpx_a_former_s18 | 224 | 28M | 83.6 | HuggingFace |
| hb_a_former_s18 | 224 | 27M | 83.2 | HuggingFace |
| hpx_former_b36 | 224 | 111M | 84.9 | HuggingFace |
| hb_former_b36 | 224 | 102M | 85.2 | HuggingFace |
Create a conda environment and install the requirements:

```shell
conda create -n hyenapixel python=3.10
conda activate hyenapixel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -e .
```

Prepare ImageNet-1k with this script.
We trained our models on 8 Nvidia A100 GPUs using the SLURM scripts located in ./scripts/.
Adjust the SLURM parameters NUM_GPU and GRAD_ACCUM_STEPS to match your system.
For object detection (COCO) and semantic segmentation (ADE20K), see the detection and segmentation folders.
Run the following command to validate the hpx_former_s18.
Replace data/imagenet with the path to ImageNet-1k and hpx_former_s18 with the model you intend to validate.

```shell
python validate.py data/imagenet --model hpx_former_s18
```

Our implementation is based on HazyResearch/safari, rwightman/pytorch-image-models, and sail-sg/metaformer. This research has been funded by the Federal Ministry of Education and Research of Germany under grant no. 01IS22094C WEST-AI.
```bibtex
@inproceedings{spravil2024hyenapixel,
  author    = {Julian Spravil and Sebastian Houben and Sven Behnke},
  title     = {HyenaPixel: Global Image Context with Convolutions},
  booktitle = {ECAI},
  pages     = {521--528},
  year      = {2024},
}
```