Official PyTorch implementation for our paper HyenaPixel: Global Image Context with Convolutions (ECAI 2024).
In computer vision, a larger effective receptive field (ERF) is associated with improved performance. While attention mechanisms in vision transformers (ViTs) natively support global context, their quadratic complexity in the number of pixels limits their applicability to high-resolution images. We extend Hyena, originally developed for causal 1D sequences in natural language processing (NLP), to:
- the 2D image space, introducing HyenaPixel, which uses extremely large convolutional kernels of up to 191×191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels.
- bidirectional, non-causal modeling, introducing Bidirectional Hyena.
In the MetaFormer framework, HyenaPixel and Bidirectional Hyena reach ImageNet-1k image classification top-1 accuracies of 84.9% and 85.2%, respectively, outperforming many large-kernel convolutional neural networks (CNNs).
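The key to making such large kernels affordable is that a global convolution can be evaluated in the frequency domain, where the cost depends on the image size rather than the kernel size. The following NumPy sketch is illustrative only and is not the repository's implementation (which builds on the Hyena operator from HazyResearch/safari); it shows the FFT-based circular convolution underlying this complexity argument.

```python
import numpy as np

def global_conv_fft(x, k):
    """Circular 2D convolution via the FFT.

    Cost is O(HW log HW) regardless of kernel size, which is what makes
    very large kernels (e.g. 191x191) tractable compared to the
    O(HW * kh * kw) cost of direct convolution.
    """
    H, W = x.shape
    kh, kw = k.shape
    k_pad = np.zeros((H, W))
    k_pad[:kh, :kw] = k
    # Shift the kernel center to the origin so the output stays aligned.
    k_pad = np.roll(k_pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    # Pointwise multiplication in the frequency domain == circular convolution.
    return np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k_pad)).real

# Sanity check: convolving an impulse reproduces the (centered) kernel.
x = np.zeros((8, 8))
x[0, 0] = 1.0
k = np.arange(9, dtype=float).reshape(3, 3)
y = global_conv_fft(x, k)
assert np.isclose(y[0, 0], k[1, 1])  # kernel center lands on the impulse
```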
| Model | Resolution | Params | ImageNet-1k Top-1 Accuracy | Download |
|---|---|---|---|---|
| hpx_former_s18 | 224 | 29M | 83.2 | HuggingFace |
| hpx_former_s18_384 | 384 | 29M | 84.7 | HuggingFace |
| hb_former_s18 | 224 | 28M | 83.5 | HuggingFace |
| c_hpx_former_s18 | 224 | 28M | 83.0 | HuggingFace |
| hpx_a_former_s18 | 224 | 28M | 83.6 | HuggingFace |
| hb_a_former_s18 | 224 | 27M | 83.2 | HuggingFace |
| hpx_former_b36 | 224 | 111M | 84.9 | HuggingFace |
| hb_former_b36 | 224 | 102M | 85.2 | HuggingFace |
Create a conda environment and install the requirements:

```shell
conda create -n hyenapixel python=3.10
conda activate hyenapixel
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install -e .
```

Prepare ImageNet-1k with this script.
We trained our models on 8 Nvidia A100 GPUs using the SLURM scripts located in ./scripts/.
Adjust the SLURM parameters NUM_GPU and GRAD_ACCUM_STEPS to match your system.
For object detection (COCO) and semantic segmentation (ADE20K), see the detection and segmentation folders.
Run the following command to validate the hpx_former_s18.
Replace data/imagenet with the path to ImageNet-1k and hpx_former_s18 with the model you intend to validate.

```shell
python validate.py data/imagenet --model hpx_former_s18
```

Our implementation is based on HazyResearch/safari, rwightman/pytorch-image-models, and sail-sg/metaformer. This research has been funded by the Federal Ministry of Education and Research of Germany under grant no. 01IS22094C WEST-AI.
```bibtex
@inproceedings{spravil2024hyenapixel,
  author    = {Julian Spravil and Sebastian Houben and Sven Behnke},
  title     = {HyenaPixel: Global Image Context with Convolutions},
  booktitle = {ECAI},
  pages     = {521--528},
  year      = {2024},
}
```