computer_vision_paper

paper_list

Image Recognition (CNN)
- image classification
- object detection
- semantic segmentation
Vision Transformer
Self-supervised learning
Weakly-supervised learning
Depth Estimation
Vision Language Model
Medical AI
- classification
- segmentation

Image Recognition (CNN)

image classification

Name	year	paper	summary	code
AlexNet (ImageNet Classification with Deep Convolutional Neural Networks)	NeurIPS 2012	paper	notion	code
VGGNet (Very Deep Convolutional Networks For Large-Scale Image Recognition)	ICLR 2015	paper	notion	code
ResNet (Deep Residual Learning for Image Recognition)	CVPR 2016	paper	notion	code
SENet (Squeeze-and-Excitation Networks)	CVPR 2018	paper	notion

object detection

Name	year	paper	summary
R-CNN (Rich feature hierarchies for accurate object detection and semantic segmentation)	CVPR 2014	paper	notion
Fast R-CNN	2015	paper	notion
Faster R-CNN (Towards Real-Time Object Detection with Region Proposal Networks)	NeurIPS 2015	paper	notion
YOLO (You Only Look Once: Unified, Real-Time Object Detection)	2016	paper	notion
SSD (Single Shot MultiBox Detector)	2016	paper	notion

semantic segmentation

Name	year	paper	summary	code
FCN (Fully Convolutional Networks for Semantic Segmentation)	CVPR 2015	paper	tistory
U-Net (Convolutional Networks for Biomedical Image Segmentation)	MICCAI 2015	paper	tistory	code
SegNet (A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation)	2015	paper	tistory
DeepLab v1 (Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs)	ICLR 2015	paper
DeepLab v2 (DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs)	TPAMI 2017	paper
DeepLab v3 (Rethinking Atrous Convolution for Semantic Image Segmentation)	2018	paper
DeepLab v3+ (Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation)	ECCV 2018	paper
PSPNet (Pyramid Scene Parsing Network)	CVPR 2017	paper

Vision Transformer (ViT)

Name	year	paper	summary
ViT (An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale)	ICLR 2021	paper	tistory
Swin Transformer (Hierarchical Vision Transformer using Shifted Windows)	ICCV 2021	paper
MLP-Mixer (An all-MLP Architecture for Vision)	2021	paper
MaxViT (MaxViT: Multi-Axis Vision Transformer)	2022	paper

Self-supervised learning

Name	year	paper	summary	code
Context Prediction (Unsupervised Visual Representation Learning by Context Prediction)	ICCV 2015	paper
JigSaw (Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles)	2016	paper
Colorizations (Colorful Image Colorization)	ECCV 2016	paper
Rotations (Unsupervised Representation Learning By Predicting Image Rotations)	ICLR 2018	paper
SimCLR (A Simple Framework for Contrastive Learning of Visual Representations)	ICML 2020	paper	tistory
MoCo (Momentum Contrast for Unsupervised Visual Representation Learning)	CVPR 2020	paper	tistory
BYOL (Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning)	NeurIPS 2020	paper	tistory
DINO (Emerging Properties in Self-supervised Vision Transformers)	2021	paper	tistory
SimCLR v2 (Big Self-Supervised Models are Strong Semi-Supervised Learners)	NeurIPS 2020	paper	tistory
MoCo v2 (Improved Baselines with Momentum Contrastive Learning)	2020	paper	tistory
MoCo v3 (An Empirical Study of Training Self-Supervised Vision Transformers)	ICCV 2021	paper	tistory
SimSiam (Exploring Simple Siamese Representation Learning)	CVPR 2021	paper	tistory	official
MAE (Masked Autoencoders Are Scalable Vision Learners)	CVPR 2022	paper	tistory	official
SimMIM (SimMIM: A Simple Framework for Masked Image Modeling)	CVPR 2022	paper	tistory
What Do Self-supervised Vision Transformers Learn?	ICLR 2023	paper	tistory

Weakly-supervised learning

Name	year	paper	summary
CAM (Learning Deep Features for Discriminative Localization)	CVPR 2016	paper	tistory
DSRG (Weakly-Supervised Semantic Segmentation Network with Deep Seeded Region Growing)	CVPR 2018	paper	tistory
SEAM (Self-supervised Equivariant Attention Mechanism for Weakly Supervised Semantic Segmentation)	CVPR 2020	paper	tistory
Learning pseudo labels for semi-and-weakly supervised semantic segmentation	2022	paper	tistory

Depth Estimation

Name	year	paper	summary
Depth Map Prediction from a Single Image using a Multi-Scale Deep Network	NeurIPS 2014	paper	tistory
Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture	ICCV 2015	paper	tistory
Deeper Depth Prediction with Fully Convolutional Residual Networks	3DV 2016	paper	tistory
Single-Image Depth Perception in the Wild	NeurIPS 2016	paper	tistory
Deep Ordinal Regression Network for Monocular Depth Estimation	CVPR 2018	paper
Joint Task-Recursive Learning for Semantic Segmentation and Depth Estimation	ECCV 2018	paper
Unsupervised Learning of Depth and Ego-Motion from Video	CVPR 2017	paper
Unsupervised Monocular Depth Estimation with Left-Right Consistency	CVPR 2017	paper
Digging Into Self-Supervised Monocular Depth Estimation	ICCV 2019	paper

Vision Language Model (VLM)

Name	year	paper	summary
CLIP (Learning Transferable Visual Models From Natural Language Supervision)	2021	paper	tistory
CoCoOp (Conditional Prompt Learning for Vision-Language Models)	CVPR 2022	paper
Flamingo (Flamingo: a Visual Language Model for Few-Shot Learning)	NeurIPS 2022	paper

Medical AI

classification

Name	year	paper	summary	code
MICLe (Big Self-Supervised Models Advance Medical Image Classification)	ICCV 2021	paper	tistory

segmentation

Name	year	paper	summary	code
U-Net (Convolutional Networks for Biomedical Image Segmentation)	MICCAI 2015	paper	tistory	code
TransUNet (Transformers Make Strong Encoders for Medical Image Segmentation)	2021	paper	tistory
UNETR (UNETR: Transformers for 3D Medical Image Segmentation)	2021	paper	tistory
Swin-Unet (Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation)	2021	paper
TransBTS (Multimodal Brain Tumor Segmentation Using Transformer)	2021	paper
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis	CVPR 2022	paper

gompaang/computer_vision_paper

Folders and files

Latest commit

History

Repository files navigation

computer_vision_paper

paper_list

Image Recognition (CNN)

image classification

object detection

semantic segmentation

Vision Transformer (ViT)

Self-supervised learning

Weakly-supervised learning

Depth Estimation

Vision Language Model (VLM)

Medical AI

classification

segmentation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages