A library of self-supervised methods for unsupervised visual representation learning powered by PyTorch Lightning. We aim to provide SOTA self-supervised methods in a comparable environment while, at the same time, implementing training tricks. Although the library is self-contained, the models can also be used outside of solo-learn.
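For instance, a pretrained backbone can be re-used in plain PyTorch. The snippet below is only a minimal sketch, not the library's official API: it assumes the Lightning checkpoint stores the backbone weights under `state_dict` with a `backbone.` prefix, so inspect your checkpoint before relying on these key names.

```python
# Minimal sketch: load a solo-learn checkpoint into a plain torchvision ResNet18.
# Assumes the weights live in ckpt["state_dict"] with a "backbone." prefix (check first!).
import torch
from torchvision.models import resnet18

ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
state = {
    k.replace("backbone.", "", 1): v
    for k, v in ckpt["state_dict"].items()
    if k.startswith("backbone.")
}

backbone = resnet18()
backbone.fc = torch.nn.Identity()  # the classification head is usually not part of the backbone
missing, unexpected = backbone.load_state_dict(state, strict=False)
print(missing, unexpected)  # sanity check: ideally both lists are empty
```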
- [Dec 01 2021]: Added support for extracting mid-level features and PoolFormer.
- [Nov 29 2021]: Breaking changes! Update your versions!!!
- [Nov 29 2021]: New tutorials!
- [Nov 29 2021]: Added offline k-NN and offline UMAP.
- [Nov 29 2021]: Updated PyTorch and PyTorch Lightning versions. 10% faster.
- [Nov 29 2021]: Added code of conduct, contribution instructions, issue templates and UMAP tutorial.
- [Nov 23 2021]: Added VIbCReg.
- [Oct 21 2021]: Added support for object recognition via Detectron v2 and auto-resume functionality that automatically tries to resume an experiment that crashed or reached a timeout.
- [Oct 10 2021]: Restructured augmentation pipelines to allow more flexibility and multi-crop. Also added multi-crop for BYOL.
- [Sep 27 2021]: Added NNSiam, NNBYOL, new tutorials for implementing new methods (1 and 2), more testing, and fixes for issues with custom data and linear evaluation.
- [Sep 19 2021]: Added online k-NN evaluation.
- [Sep 17 2021]: Added ViT and Swin.
- [Sep 13 2021]: Improved docs and added tutorials for pretraining and offline linear eval.
- [Aug 13 2021]: DeepCluster V2 is now available.
- Barlow Twins
- BYOL
- DeepCluster V2
- DINO
- MoCo V2+
- NNBYOL
- NNCLR
- NNSiam
- ReSSL
- SimCLR + Supervised Contrastive Learning
- SimSiam
- SwAV
- VIbCReg
- VICReg
- W-MSE
- Increased data processing speed by up to 100% using NVIDIA DALI.
- Flexible augmentations.
- Online linear evaluation via stop-gradient for easier debugging and prototyping (optionally available for the momentum backbone as well); see the sketch after this list.
- Online and offline k-NN evaluation.
- Standard offline linear evaluation.
- All the perks of PyTorch Lightning (mixed precision, gradient accumulation, clipping, automatic logging and much more).
- Easy-to-extend modular code structure.
- Custom model logging with a simpler file organization.
- Automatic feature space visualization with UMAP.
- Offline UMAP.
- Common metrics and more to come...
- Multi-crop dataloading following SwAV:
- Note: currently, only SimCLR supports this.
- Exclude batchnorm and biases from LARS.
- No LR scheduler for the projection head in SimSiam.
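The online linear evaluation mentioned above works by detaching the backbone features before they reach the linear probe, so the probe's loss never influences the self-supervised pretraining. Below is a minimal sketch of the idea (not the library's internal implementation; the 512-dimensional features and 10 classes are just example values):

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

backbone = resnet18()
backbone.fc = torch.nn.Identity()      # expose raw 512-d features
classifier = torch.nn.Linear(512, 10)  # online linear probe, e.g. for CIFAR-10

def online_eval_loss(images, labels):
    feats = backbone(images)
    # stop-gradient: the probe is trained on frozen features, so this loss
    # cannot leak back into the self-supervised objective of the backbone
    logits = classifier(feats.detach())
    return F.cross_entropy(logits, labels)

# toy usage with random data
loss = online_eval_loss(torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,)))
```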
- torch
- torchvision
- tqdm
- einops
- wandb
- pytorch-lightning
- lightning-bolts
- torchmetrics
- scipy
- timm
Optional:
- nvidia-dali
- matplotlib
- seaborn
- pandas
- umap-learn
First clone the repo.
Then, to install solo-learn with DALI and/or UMAP support, use:
```bash
pip3 install .[dali,umap]
```
If no DALI/UMAP support is needed, the repository can be installed as:
```bash
pip3 install .
```
NOTE: if you are having trouble with DALI, install it with `pip install --extra-index-url https://developer.download.nvidia.com/compute/redist --upgrade nvidia-dali-cuda110`, adjusting the package name to your specific CUDA version.
NOTE 2: if you want to modify the library, install it in dev mode with `pip3 install -e .`
NOTE 3: soon to be on pip.
For pretraining the backbone, follow one of the many bash files in `bash_files/pretrain/`.
After that, for offline linear evaluation, follow the examples in `bash_files/linear/`.
NOTE: the bash files are kept up to date and follow the recommended parameters of each paper as closely as possible, but double-check them before running.
Please check out our documentation and tutorials:
- Overview
- Offline linear eval
- Object detection
- Adding a new method
- Adding a new momentum method
- Visualizing features with UMAP
- Offline k-NN
If you want to contribute to solo-learn, make sure you take a look at how to contribute and follow the code of conduct.
All available pretrained models can be downloaded directly via the tables below or programmatically by running one of the following scripts: `zoo/cifar10.sh`, `zoo/cifar100.sh`, `zoo/imagenet100.sh` and `zoo/imagenet.sh`.
Note: the hyperparameters may not be the best; we will eventually re-run the methods that currently show lower performance.
CIFAR-10:
Method | Backbone | Epochs | Dali | Acc@1 (online) | Acc@1 (offline) | Acc@5 (online) | Acc@5 (offline) | Checkpoint |
---|---|---|---|---|---|---|---|---|
Barlow Twins | ResNet18 | 1000 | ❌ | 92.10 | | 99.73 | | 🔗 |
BYOL | ResNet18 | 1000 | ❌ | 92.58 | | 99.79 | | 🔗 |
DeepCluster V2 | ResNet18 | 1000 | ❌ | 88.85 | | 99.58 | | 🔗 |
DINO | ResNet18 | 1000 | ❌ | 89.52 | | 99.71 | | 🔗 |
MoCo V2+ | ResNet18 | 1000 | ❌ | 92.94 | | 99.79 | | 🔗 |
NNCLR | ResNet18 | 1000 | ❌ | 91.88 | | 99.78 | | 🔗 |
ReSSL | ResNet18 | 1000 | ❌ | 90.63 | | 99.62 | | 🔗 |
SimCLR | ResNet18 | 1000 | ❌ | 90.74 | | 99.75 | | 🔗 |
SimSiam | ResNet18 | 1000 | ❌ | 90.51 | | 99.72 | | 🔗 |
SwAV | ResNet18 | 1000 | ❌ | 89.17 | | 99.68 | | 🔗 |
VIbCReg | ResNet18 | 1000 | ❌ | 91.18 | | 99.74 | | 🔗 |
VICReg | ResNet18 | 1000 | ❌ | 92.07 | | 99.74 | | 🔗 |
W-MSE | ResNet18 | 1000 | ❌ | 88.67 | | 99.68 | | 🔗 |
CIFAR-100:
Method | Backbone | Epochs | Dali | Acc@1 (online) | Acc@1 (offline) | Acc@5 (online) | Acc@5 (offline) | Checkpoint |
---|---|---|---|---|---|---|---|---|
Barlow Twins | ResNet18 | 1000 | ❌ | 70.90 | | 91.91 | | 🔗 |
BYOL | ResNet18 | 1000 | ❌ | 70.46 | | 91.96 | | 🔗 |
DeepCluster V2 | ResNet18 | 1000 | ❌ | 63.61 | | 88.09 | | 🔗 |
DINO | ResNet18 | 1000 | ❌ | 66.76 | | 90.34 | | 🔗 |
MoCo V2+ | ResNet18 | 1000 | ❌ | 69.89 | | 91.65 | | 🔗 |
NNCLR | ResNet18 | 1000 | ❌ | 69.62 | | 91.52 | | 🔗 |
ReSSL | ResNet18 | 1000 | ❌ | 65.92 | | 89.73 | | 🔗 |
SimCLR | ResNet18 | 1000 | ❌ | 65.78 | | 89.04 | | 🔗 |
SimSiam | ResNet18 | 1000 | ❌ | 66.04 | | 89.62 | | 🔗 |
SwAV | ResNet18 | 1000 | ❌ | 64.88 | | 88.78 | | 🔗 |
VIbCReg | ResNet18 | 1000 | ❌ | 67.37 | | 90.07 | | 🔗 |
VICReg | ResNet18 | 1000 | ❌ | 68.54 | | 90.83 | | 🔗 |
W-MSE | ResNet18 | 1000 | ❌ | 61.33 | | 87.26 | | 🔗 |
ImageNet-100:
Method | Backbone | Epochs | Dali | Acc@1 (online) | Acc@1 (offline) | Acc@5 (online) | Acc@5 (offline) | Checkpoint |
---|---|---|---|---|---|---|---|---|
Barlow Twins 🚀 | ResNet18 | 400 | ✔️ | 80.38 | 80.16 | 95.28 | 95.14 | 🔗 |
BYOL 🚀 | ResNet18 | 400 | ✔️ | 80.16 | 80.32 | 95.02 | 94.94 | 🔗 |
DeepCluster V2 | ResNet18 | 400 | ❌ | 75.36 | 75.4 | 93.22 | 93.10 | 🔗 |
DINO | ResNet18 | 400 | ✔️ | 74.84 | 74.92 | 92.92 | 92.78 | 🔗 |
DINO | ViT Tiny | 400 | ❌ | 63.04 | TODO | 87.72 | TODO | 🔗 |
MoCo V2+ 🚀 | ResNet18 | 400 | ✔️ | 78.20 | 79.28 | 95.50 | 95.18 | 🔗 |
NNCLR 🚀 | ResNet18 | 400 | ✔️ | 79.80 | 80.16 | 95.28 | 95.30 | 🔗 |
ReSSL | ResNet18 | 400 | ✔️ | 76.92 | 78.48 | 94.20 | 94.24 | 🔗 |
SimCLR 🚀 | ResNet18 | 400 | ✔️ | 77.04 | 77.48 | 94.02 | 93.42 | 🔗 |
SimSiam | ResNet18 | 400 | ✔️ | 74.54 | 78.72 | 93.16 | 94.78 | 🔗 |
SwAV | ResNet18 | 400 | ✔️ | 74.04 | 74.28 | 92.70 | 92.84 | 🔗 |
VIbCReg | ResNet18 | 400 | ✔️ | 79.86 | 79.38 | 94.98 | 94.60 | 🔗 |
VICReg 🚀 | ResNet18 | 400 | ✔️ | 79.22 | 79.40 | 95.06 | 95.02 | 🔗 |
W-MSE | ResNet18 | 400 | ✔️ | 67.60 | 69.06 | 90.94 | 91.22 | 🔗 |
🚀 methods where hyperparameters were heavily tuned.
Note: ViT is very compute intensive and unstable, so we are gradually moving to larger architectures and batch sizes. At the moment, the total batch size is 128 and we had to use float32 precision. If you want to contribute by running it, let us know!
ImageNet:
Method | Backbone | Epochs | Dali | Acc@1 (online) | Acc@1 (offline) | Acc@5 (online) | Acc@5 (offline) | Checkpoint |
---|---|---|---|---|---|---|---|---|
Barlow Twins | ResNet50 | 100 | ✔️ | | | | | |
BYOL | ResNet50 | 100 | ✔️ | 68.63 | 68.37 | 88.80 | 88.66 | 🔗 |
DeepCluster V2 | ResNet50 | 100 | ✔️ | | | | | |
DINO | ResNet50 | 100 | ✔️ | | | | | |
MoCo V2+ | ResNet50 | 100 | ✔️ | | | | | |
NNCLR | ResNet50 | 100 | ✔️ | | | | | |
ReSSL | ResNet50 | 100 | ✔️ | | | | | |
SimCLR | ResNet50 | 100 | ✔️ | | | | | |
SimSiam | ResNet50 | 100 | ✔️ | | | | | |
SwAV | ResNet50 | 100 | ✔️ | | | | | |
VIbCReg | ResNet50 | 100 | ✔️ | | | | | |
VICReg | ResNet50 | 100 | ✔️ | | | | | |
W-MSE | ResNet50 | 100 | ✔️ | | | | | |
We report the training efficiency of some methods using a ResNet18, with and without DALI (4 workers per GPU), on a server with an Intel i9-9820X and two RTX 2080 Ti GPUs.
Method | Dali | Total time for 20 epochs | Time for 1 epoch | GPU memory (per GPU) |
---|---|---|---|---|
Barlow Twins | ❌ | 1h 38m 27s | 4m 55s | 5097 MB |
Barlow Twins | ✔️ | 43m 2s | 2m 10s (56% faster) | 9292 MB |
BYOL | ❌ | 1h 38m 46s | 4m 56s | 5409 MB |
BYOL | ✔️ | 50m 33s | 2m 31s (49% faster) | 9521 MB |
NNCLR | ❌ | 1h 38m 30s | 4m 55s | 5060 MB |
NNCLR | ✔️ | 42m 3s | 2m 6s (64% faster) | 9244 MB |
Note: the GPU memory increase does not scale with the model; rather, it scales with the number of workers.
If you use solo-learn, please cite our preprint:
```bibtex
@misc{turrisi2021sololearn,
  title={Solo-learn: A Library of Self-supervised Methods for Visual Representation Learning},
  author={Victor G. Turrisi da Costa and Enrico Fini and Moin Nabi and Nicu Sebe and Elisa Ricci},
  year={2021},
  eprint={2108.01775},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://github.com/vturrisi/solo-learn},
}
```