
EfficientViT is a new family of vision models for efficient high-resolution vision.



figurekim317/efficientvit

 
 


EfficientViT

About EfficientViT Models

EfficientViT is a new family of vision models for efficient high-resolution vision, especially segmentation. Its core building block is a lightweight multi-scale attention module that achieves a global receptive field and multi-scale learning using only hardware-efficient operations.
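The "hardware-efficient" attention in the EfficientViT paper replaces softmax attention with ReLU linear attention: because there is no softmax, the product ReLU(K)^T V can be computed once and shared by every query, so cost grows linearly with the number of tokens (pixels) instead of quadratically. The sketch below is a minimal single-head, unbatched illustration of that idea in NumPy; it omits the multi-scale token aggregation and is not the repository's module. The function names are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def relu_linear_attention(q, k, v, eps=1e-15):
    """ReLU linear attention in O(N * d * d_v). q, k: (N, d); v: (N, d_v)."""
    q, k = relu(q), relu(k)
    kv = k.T @ v           # (d, d_v): sum_j ReLU(K_j)^T V_j, shared by all queries
    k_sum = k.sum(axis=0)  # (d,): sum_j ReLU(K_j), used for normalization
    num = q @ kv           # (N, d_v)
    den = q @ k_sum        # (N,)
    return num / (den[:, None] + eps)


def relu_quadratic_attention(q, k, v, eps=1e-15):
    """Same quantity via the explicit N x N similarity matrix, O(N^2)."""
    sim = relu(q) @ relu(k).T                            # (N, N)
    sim = sim / (sim.sum(axis=1, keepdims=True) + eps)   # row-normalize
    return sim @ v


rng = np.random.default_rng(0)
q, k = rng.standard_normal((64, 16)), rng.standard_normal((64, 16))
v = rng.standard_normal((64, 8))
print(np.allclose(relu_linear_attention(q, k, v), relu_quadratic_attention(q, k, v)))
```

Both functions return the same result; only the order of the matrix products differs, which is what makes the linear form affordable at segmentation resolutions.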

[Figure: comparison with prior SOTA semantic segmentation models]

[Figure: EfficientViT results on image classification]

Getting Started

Installation

conda create -n efficientvit python=3.8.5
conda activate efficientvit
conda install pytorch=1.13.1 torchvision=0.14.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install tqdm opencv-python
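After installing, a quick way to confirm that the pinned PyTorch versions are the ones actually active is to read the installed distribution metadata. This is a minimal sketch using only the standard library; it assumes conda's `pytorch` package registers its metadata under the distribution name `torch`, and `check_pins` is a hypothetical helper, not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the install commands above (distribution names).
PINNED = {
    "torch": "1.13.1",
    "torchvision": "0.14.1",
}


def check_pins(pins=PINNED):
    """Return {package: True | False | None}: matches, differs, or not installed."""
    report = {}
    for pkg, wanted in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            report[pkg] = None
            continue
        # Drop a local build suffix such as "+cu117" before comparing.
        report[pkg] = installed.split("+")[0] == wanted
    return report


if __name__ == "__main__":
    for pkg, ok in check_pins().items():
        status = "missing" if ok is None else ("ok" if ok else "version mismatch")
        print(f"{pkg}: {status}")
```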

Dataset

Download Pretrained Models

Mobile latency is measured on a Qualcomm Snapdragon 8 Gen 1 with TensorFlow Lite, fp32, batch size 1.
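The latency numbers below come from a TensorFlow Lite setup on the phone, which this README does not include. For a rough batch-size-1 comparison on your own machine, a generic timing helper like the following can be used; it is a hypothetical sketch, not the harness behind the reported numbers. It runs a few untimed warm-up iterations and reports the median, which is less sensitive to outliers than the mean.

```python
import statistics
import time


def measure_latency_ms(run_once, warmup=10, iters=50):
    """Median wall-clock latency of run_once() in milliseconds.

    run_once is any zero-argument callable performing one batch-size-1
    inference (hypothetical: wrap your own model call in it).
    """
    for _ in range(warmup):
        run_once()  # warm-up calls are not timed
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


print(f"{measure_latency_ms(lambda: sum(range(100_000))):.3f} ms")
```

When timing a PyTorch model on a GPU, call `torch.cuda.synchronize()` inside `run_once` after the forward pass, since CUDA kernels launch asynchronously.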

ImageNet

| Model | Resolution | ImageNet Top-1 Acc (%) | ImageNet Top-5 Acc (%) | Params | MACs | Mobile Latency | Checkpoint |
|---|---|---|---|---|---|---|---|
| EfficientViT-B1 | 224 | 79.4 | 94.3 | 9.1M | 0.52G | 19ms | link |
| EfficientViT-B1 | 256 | 79.9 | 94.7 | 9.1M | 0.68G | 24ms | link |
| EfficientViT-B1 | 288 | 80.4 | 95.0 | 9.1M | 0.86G | 31ms | link |
| EfficientViT-B2 | 224 | 82.1 | 95.8 | 24M | 1.6G | 55ms | link |
| EfficientViT-B2 | 256 | 82.7 | 96.1 | 24M | 2.1G | 72ms | link |
| EfficientViT-B2 | 288 | 83.1 | 96.3 | 24M | 2.6G | 92ms | link |
| EfficientViT-B3 | 224 | 83.5 | 96.4 | 49M | 4.0G | 140ms | link |
| EfficientViT-B3 | 256 | 83.8 | 96.5 | 49M | 5.2G | 180ms | link |
| EfficientViT-B3 | 288 | 84.2 | 96.7 | 49M | 6.5G | 228ms | link |
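For each model, the MACs in the ImageNet table grow almost exactly with the number of input pixels, (res / 224)^2, which is what one would expect if every stage, attention included, costs linear in the number of tokens. The check below uses only the numbers from the table; `predicted_macs` is a hypothetical helper.

```python
# MACs (in G) per model and resolution, copied from the ImageNet table above.
IMAGENET_MACS = {
    "B1": {224: 0.52, 256: 0.68, 288: 0.86},
    "B2": {224: 1.6, 256: 2.1, 288: 2.6},
    "B3": {224: 4.0, 256: 5.2, 288: 6.5},
}


def predicted_macs(base_macs, base_res, res):
    """Scale MACs by pixel count: base_macs * (res / base_res) ** 2."""
    return base_macs * (res / base_res) ** 2


for model, row in IMAGENET_MACS.items():
    for res, macs in row.items():
        pred = predicted_macs(row[224], 224, res)
        print(f"{model} @ {res}: table {macs:.2f}G, pixel-scaled {pred:.2f}G")
```

For B1 the scaled values (0.68G, 0.86G) match the table exactly; for B2 and B3 at 288 they are within about 2%.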

Cityscapes

| Model | Resolution | Cityscapes mIoU (%) | Params | MACs | Mobile Latency | Checkpoint |
|---|---|---|---|---|---|---|
| EfficientViT-B0 | 960x1920 | 75.5 | 0.7M | 3.9G | 0.20s | link |
| EfficientViT-B1 | 896x1792 | 80.1 | 4.8M | 19G | 0.82s | link |
| EfficientViT-B2 | 1024x2048 | 82.1 | 15M | 74G | 3.1s | link |
| EfficientViT-B3 | 1184x2368 | 83.2 | 40M | 240G | 10s | link |
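A common way to read these tables is "which checkpoint gives the best accuracy within my latency budget?". The sketch below answers that for the Cityscapes rows above; the helper name and tuple layout are hypothetical, and the same function works for the ADE20K rows if they are entered in the same form.

```python
# (model, resolution, mIoU, mobile latency in seconds) from the Cityscapes table above.
CITYSCAPES = [
    ("EfficientViT-B0", "960x1920", 75.5, 0.20),
    ("EfficientViT-B1", "896x1792", 80.1, 0.82),
    ("EfficientViT-B2", "1024x2048", 82.1, 3.1),
    ("EfficientViT-B3", "1184x2368", 83.2, 10.0),
]


def best_under_budget(rows, budget_s):
    """Highest-mIoU row whose latency fits within budget_s, or None if none fits."""
    fitting = [row for row in rows if row[3] <= budget_s]
    return max(fitting, key=lambda row: row[2], default=None)


print(best_under_budget(CITYSCAPES, 1.0))
# -> ('EfficientViT-B1', '896x1792', 80.1, 0.82)
```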

ADE20K

| Model | Resolution | ADE20K mIoU (%) | Params | MACs | Mobile Latency | Checkpoint |
|---|---|---|---|---|---|---|
| EfficientViT-B1 | 480 | 42.7 | 4.8M | 2.7G | 0.10s | link |
| EfficientViT-B2 | 416 | 45.1 | 15M | 6.0G | 0.21s | link |
| EfficientViT-B3 | 512 | 49.0 | 39M | 22G | 0.8s | link |

Usage

# ImageNet classification: EfficientViT-B3 with pretrained weights
from models.cls_model_zoo import create_cls_model

model = create_cls_model(
    name="b3",
    pretrained=True,
    weight_url="assets/checkpoints/cls/b3-r288.pt",
)
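The classification model above expects a preprocessed image tensor. The repository's exact evaluation transform is not shown here, so the sketch below assumes the usual ImageNet recipe: a center crop to the checkpoint resolution, scaling to [0, 1], and normalization with the standard ImageNet mean and std. The helper names are hypothetical, and a real pipeline would normally resize the shorter side before cropping.

```python
import numpy as np

# Standard ImageNet RGB statistics (assumed; not taken from this repository).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def center_crop(img, size):
    """Crop an (H, W, C) array to (size, size, C) around its center."""
    h, w = img.shape[:2]
    if h < size or w < size:
        raise ValueError(f"image {h}x{w} is smaller than crop {size}")
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]


def to_model_input(img_uint8_rgb, size=288):
    """uint8 RGB (H, W, 3) -> normalized float32 (1, 3, size, size)."""
    x = center_crop(img_uint8_rgb, size).astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD  # broadcasts over the channel axis
    return x.transpose(2, 0, 1)[None]       # HWC -> NCHW, batch size 1
```

Images loaded with OpenCV are BGR, so convert them first with `cv2.cvtColor(img, cv2.COLOR_BGR2RGB)`; the result can be passed to the model via `torch.from_numpy(...)`.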

# Cityscapes semantic segmentation: EfficientViT-B3 with pretrained weights
from models.seg_model_zoo import create_seg_model

model = create_seg_model(
    name="b3",
    dataset="cityscapes",
    pretrained=True,
    weight_url="assets/checkpoints/seg/cityscapes/b3-r1184.pt",
)

# ADE20K semantic segmentation: EfficientViT-B3 with pretrained weights
from models.seg_model_zoo import create_seg_model

model = create_seg_model(
    name="b3",
    dataset="ade20k",
    pretrained=True,
    weight_url="assets/checkpoints/seg/ade20k/b3-r512.pt",
)
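The checkpoint names in these snippets follow a `<model>-r<resolution>` pattern (`b3-r288`, `b3-r1184`, `b3-r512`), and the evaluation command in the Visualization section pairs `--model b3-r1184` with `--crop_size 1184`. A small hypothetical helper can recover both parts from a path so they stay consistent; the pattern is inferred from these examples only, and other naming schemes are rejected.

```python
import re
from pathlib import Path

# Pattern inferred from the checkpoint names in this README, e.g. "b3-r288".
_TAG = re.compile(r"^(b\d+)-r(\d+)$")


def parse_checkpoint_tag(path_or_tag):
    """Return (model name, resolution) from e.g. 'assets/.../b3-r1184.pt' or 'b3-r1184'."""
    stem = Path(path_or_tag).stem  # strips directories and a ".pt" suffix
    match = _TAG.match(stem)
    if match is None:
        raise ValueError(f"unrecognized checkpoint tag: {path_or_tag!r}")
    return match.group(1), int(match.group(2))


print(parse_checkpoint_tag("assets/checkpoints/seg/cityscapes/b3-r1184.pt"))
# -> ('b3', 1184)
```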

Evaluation

Please run eval_cls_model.py or eval_seg_model.py to evaluate our models.

Examples: classification, segmentation
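The tables report top-1/top-5 accuracy and mIoU. As a reference for what those metrics compute, here is a minimal NumPy sketch of the standard definitions; it is not the code in `eval_cls_model.py` or `eval_seg_model.py`. The `ignore_index=255` default is an assumed convention for unlabeled pixels. Dataset-level mIoU normally sums the confusion matrix over all images before dividing, rather than averaging per-image scores.

```python
import numpy as np


def topk_accuracy(logits, labels, k=1):
    """Fraction of rows whose true label is among the k largest logits."""
    topk = np.argsort(-logits, axis=1)[:, :k]
    return float(np.mean(np.any(topk == labels[:, None], axis=1)))


def mean_iou(pred, target, num_classes, ignore_index=255):
    """mIoU from a confusion matrix (rows: target, columns: prediction)."""
    valid = target != ignore_index
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    conf = np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    inter = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - inter
    present = union > 0  # skip classes absent from both prediction and target
    return float(np.mean(inter[present] / union[present]))


pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 255]])  # the 255 pixel is ignored
print(mean_iou(pred, target, num_classes=2))  # -> 0.5
```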

Visualization

Please run eval_seg_model.py to visualize the outputs of our semantic segmentation models.

Example:

python eval_seg_model.py --dataset cityscapes --crop_size 1184 --model b3-r1184 --save_path demo/cityscapes/b3-r1184/
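Visualizing a segmentation result usually means taking the argmax over the class axis of the model output, mapping each class id to a color, and blending that mask over the input image. The sketch below shows that idea with NumPy; it is not the visualization code in `eval_seg_model.py`, and the 3-class palette is hypothetical (Cityscapes evaluation uses 19 classes and ADE20K uses 150).

```python
import numpy as np


def colorize(label_map, palette):
    """Map an (H, W) array of class ids to an (H, W, 3) uint8 color image."""
    return palette[label_map]


def overlay(image_rgb, label_map, palette, alpha=0.5):
    """Alpha-blend the colorized mask onto a uint8 RGB image of the same H x W."""
    mask = colorize(label_map, palette).astype(np.float32)
    blended = (1.0 - alpha) * image_rgb.astype(np.float32) + alpha * mask
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


# Hypothetical 3-class palette: background, class 1 (red), class 2 (green).
palette = np.array([[0, 0, 0], [255, 0, 0], [0, 255, 0]], dtype=np.uint8)

# For model logits shaped (num_classes, H, W): label_map = logits.argmax(axis=0)
image = np.full((2, 2, 3), 100, dtype=np.uint8)
label_map = np.array([[0, 1], [2, 1]])
print(overlay(image, label_map, palette)[0, 0])  # -> [50 50 50]
```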

Contact

Han Cai: hancai@mit.edu

Citation

If EfficientViT is useful or relevant to your research, please cite our paper:

@article{cai2022efficientvit,
  title={{EfficientViT}: Enhanced linear attention for high-resolution low-computation visual recognition},
  author={Cai, Han and Gan, Chuang and Han, Song},
  journal={arXiv preprint arXiv:2205.14756},
  year={2022}
}
