
spravil/HyenaPixel


arXiv · IOS Press · HuggingFace

Official PyTorch implementation for our paper HyenaPixel: Global Image Context with Convolutions (ECAI 2024).

In computer vision, a larger effective receptive field (ERF) is associated with improved performance. While attention mechanisms in vision transformers (ViTs) natively support global context, their quadratic complexity in the number of pixels limits applicability for high-resolution images. HyenaPixel extends Hyena, originally developed for causal 1D sequences in natural language processing (NLP), to:

  • the 2D image space (HyenaPixel), using extremely large convolutional kernels of up to 191×191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels.
  • bidirectional, non-causal modeling (Bidirectional Hyena).
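The causal-versus-bidirectional distinction can be illustrated in 1D with a minimal NumPy sketch (illustrative only, not the repository's implementation): a causal filter lets output position t depend only on current and past inputs, while the centered, non-causal variant also mixes in future positions.

```python
import numpy as np

def conv1d_causal(x, k):
    """Causal 1D convolution: y[t] = sum_j k[j] * x[t - j],
    so y[t] never depends on future inputs x[t+1], x[t+2], ..."""
    x, k = np.asarray(x, float), np.asarray(k, float)
    K = len(k)
    xp = np.concatenate([np.zeros(K - 1), x])  # left-pad with zeros
    return np.array([xp[t:t + K] @ k[::-1] for t in range(len(x))])

def conv1d_bidirectional(x, k):
    """Centered (non-causal) 1D convolution: the kernel is placed
    symmetrically around t, so y[t] also sees future positions."""
    x, k = np.asarray(x, float), np.asarray(k, float)
    K = len(k)
    half = K // 2
    xp = np.concatenate([np.zeros(half), x, np.zeros(K - 1 - half)])
    return np.array([xp[t:t + K] @ k[::-1] for t in range(len(x))])
```

Feeding a unit impulse through both shows the difference: the causal filter smears the impulse only forward in time, while the bidirectional one spreads it in both directions around the impulse position.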

In the MetaFormer framework, HyenaPixel and Bidirectional Hyena reach ImageNet-1k top-1 classification accuracies of 84.9% and 85.2%, respectively, outperforming many large-kernel convolutional neural networks (CNNs).
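The sub-quadratic cost of such large kernels comes from evaluating the convolution in the frequency domain: an FFT-based convolution costs O(N log N) in the number of pixels N, independent of kernel size, versus O(N·K²) for a direct K×K convolution. A minimal NumPy sketch of the idea (circular convolution on a single channel, not the repository's implementation):

```python
import numpy as np

def fft_conv2d(x, k):
    """Circular 2D convolution of image x with kernel k via the FFT.
    Cost is O(N log N) in the number of pixels N, regardless of the
    kernel size -- the key to affording kernels as large as 191x191."""
    kp = np.zeros_like(x)          # zero-pad kernel to the image size,
    kh, kw = k.shape
    kp[:kh, :kw] = k               # anchored at the origin
    # Pointwise product in frequency space == convolution in pixel space.
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(kp)))
```

A unit-impulse kernel at the origin reproduces the input, and shifting the impulse circularly shifts the image, which is a quick sanity check that the transform-domain product really implements a convolution.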

Visualization of HyenaPixel


Models

| Model            | Resolution | Params | ImageNet-1k Top-1 Accuracy | Download    |
|------------------|------------|--------|----------------------------|-------------|
| hpx_former_s18   | 224        | 29M    | 83.2                       | HuggingFace |
| hpx_former_s18_384 | 384      | 29M    | 84.7                       | HuggingFace |
| hb_former_s18    | 224        | 28M    | 83.5                       | HuggingFace |
| c_hpx_former_s18 | 224        | 28M    | 83.0                       | HuggingFace |
| hpx_a_former_s18 | 224        | 28M    | 83.6                       | HuggingFace |
| hb_a_former_s18  | 224        | 27M    | 83.2                       | HuggingFace |
| hpx_former_b36   | 224        | 111M   | 84.9                       | HuggingFace |
| hb_former_b36    | 224        | 102M   | 85.2                       | HuggingFace |

Usage

Setup

Create a conda environment and install the requirements.

conda create -n hyenapixel python=3.10
conda activate hyenapixel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -e .

Dataset

Prepare ImageNet-1k with this script.

Training

We trained our models on 8 Nvidia A100 GPUs using the SLURM scripts located in ./scripts/. Adjust the SLURM parameters NUM_GPU and GRAD_ACCUM_STEPS to match your system.
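When changing NUM_GPU, keep the effective global batch size constant by scaling GRAD_ACCUM_STEPS accordingly: global batch = per-GPU batch × number of GPUs × accumulation steps. A small helper illustrating the arithmetic (the variable names are illustrative, not taken from the scripts):

```python
def grad_accum_steps(target_global_batch, per_gpu_batch, num_gpus):
    """Gradient-accumulation steps needed so that
    per_gpu_batch * num_gpus * steps == target_global_batch."""
    per_step = per_gpu_batch * num_gpus
    if target_global_batch % per_step:
        raise ValueError("global batch must be divisible by per-step batch")
    return target_global_batch // per_step
```

For example, moving from 8 GPUs to 4 at the same per-GPU batch size doubles the required accumulation steps.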

For object detection (COCO) and semantic segmentation (ADE20k), see the detection and segmentation folders.

Validation

Run the following command to validate hpx_former_s18. Replace data/imagenet with the path to ImageNet-1k and hpx_former_s18 with the model you intend to validate.

python validate.py data/imagenet --model hpx_former_s18

Acknowledgments

Our implementation is based on HazyResearch/safari, rwightman/pytorch-image-models and sail-sg/metaformer. This research has been funded by the Federal Ministry of Education and Research of Germany under grant no. 01IS22094C WEST-AI.

Bibtex

@inproceedings{spravil2024hyenapixel,
  author    = {Julian Spravil and Sebastian Houben and Sven Behnke},
  title     = {HyenaPixel: Global Image Context with Convolutions},
  booktitle = {ECAI},
  pages     = {521--528},
  year      = {2024},
}