
solo-learn: a library of self-supervised methods for visual representation learning powered by PyTorch Lightning

vturrisi/solo-learn


solo-learn

A library of self-supervised methods for unsupervised visual representation learning powered by PyTorch Lightning. We aim to provide SOTA self-supervised methods in a comparable environment while also implementing training tricks. The library is self-contained, but the models can also be used outside of solo-learn. More details in our paper.


News

  • [Dec 31 2022]: 🌠 Shiny new logo!
  • [Sep 27 2022]: πŸ“ Brand new config system using OmegaConf/Hydra. Adds more clarity and flexibility. New tutorials will follow soon!
  • [Aug 04 2022]: πŸ–ŒοΈ Added MAE and support for finetuning the backbone with main_linear.py, mixup, CutMix and RandAugment.
  • [Jul 13 2022]: πŸ’– Added support for H5 data, improved scripts and data handling.
  • [Jun 26 2022]: πŸ”₯ Added MoCo V3.
  • [Jun 10 2022]: πŸ’£ Improved LARS.
  • [Jun 09 2022]: 🍭 Added support for WideResNet, multicrop for SwAV and equalization data augmentation.
  • [May 02 2022]: πŸ’  Wrapped Dali with a DataModule, added auto resume for linear eval and Wandb run resume.
  • [Apr 12 2022]: 🌈 Improved design of models and added support to train with a fraction of data.
  • [Apr 01 2022]: πŸ” Added the option to use channel last conversion which considerably decreases training times.
  • [Feb 04 2022]: πŸ₯³ Paper got accepted to JMLR.
  • [Jan 31 2022]: πŸ‘οΈ Added ConvNeXt support with timm.
  • [Dec 20 2021]: 🌑️ Added ImageNet results, scripts and checkpoints for MoCo V2+.
  • [Dec 05 2021]: 🎢 Separated SupCon from SimCLR and added runs.
  • [Dec 01 2021]: β›² Added PoolFormer.
  • [Nov 29 2021]: ‼️ Breaking changes! Update your versions!!!
  • [Nov 29 2021]: πŸ“– New tutorials!
  • [Nov 29 2021]: 🏘️ Added offline K-NN and offline UMAP.
  • [Nov 29 2021]: 🚨 Updated PyTorch and PyTorch Lightning versions. 10% faster.
  • [Nov 29 2021]: 🍻 Added code of conduct, contribution instructions, issue templates and UMAP tutorial.
  • [Nov 23 2021]: πŸ‘Ύ Added VIbCReg.
  • [Oct 21 2021]: 😀 Added support for object recognition via Detectron2 and auto-resume functionality that automatically tries to resume an experiment that crashed or reached a timeout.
  • [Oct 10 2021]: πŸ‘Ή Restructured augmentation pipelines to allow more flexibility and multicrop. Also added multicrop for BYOL.
  • [Sep 27 2021]: πŸ• Added NNSiam, NNBYOL, new tutorials for implementing new methods 1 and 2, more testing and fixed issues with custom data and linear evaluation.
  • [Sep 19 2021]: 🦘 Added online k-NN evaluation.
  • [Sep 17 2021]: πŸ€– Added ViT and Swin.
  • [Sep 13 2021]: πŸ“– Improved Docs and added tutorials for pretraining and offline linear eval.
  • [Aug 13 2021]: 🐳 DeepCluster V2 is now available.

Roadmap and help needed

  • Redoing the documentation to improve clarity.
  • Better and up-to-date tutorials.
  • Add performance-related testing to ensure that methods perform the same across updates.
  • Adding new methods (continuous effort).

Methods available


Extra flavor

Backbones

Data

  • Increased data processing speed by up to 100% using Nvidia Dali.
  • Flexible augmentations.

Evaluation

  • Online linear evaluation via stop-gradient for easier debugging and prototyping (optionally available for the momentum backbone as well).
  • Standard offline linear evaluation.
  • Online and offline K-NN evaluation.
  • Automatic feature space visualization with UMAP.
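The stop-gradient idea behind online linear evaluation can be sketched in a few lines of generic PyTorch (a minimal illustration, not solo-learn's actual implementation; the tiny backbone, shapes, and hyperparameters are made up). Detaching the features means the classifier trains alongside pretraining without ever sending gradients into the backbone:

```python
import torch
import torch.nn as nn

# Stand-in encoder and linear probe (toy sizes for illustration only)
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32, 16))
classifier = nn.Linear(16, 10)
opt = torch.optim.SGD(classifier.parameters(), lr=0.1)

x = torch.randn(8, 32)
y = torch.randint(0, 10, (8,))

feats = backbone(x).detach()   # stop-gradient: the backbone is unaffected
logits = classifier(feats)
loss = nn.functional.cross_entropy(logits, y)
loss.backward()                # gradients flow only into the classifier
opt.step()
```

Because the probe is cheap, its accuracy can be logged every epoch as a proxy for representation quality while pretraining runs.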

Training tricks

  • All the perks of PyTorch Lightning (mixed precision, gradient accumulation, clipping, and much more).
  • Channel last conversion
  • Multi-cropping dataloading following SwAV:
    • Note: currently, only SimCLR, BYOL and SwAV support this.
  • Exclude batchnorm and biases from weight decay and LARS.
  • No LR scheduler for the projection head (as in SimSiam).
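The weight-decay exclusion trick above can be sketched with standard PyTorch parameter groups (a generic recipe, not solo-learn's exact code; the toy model is made up). Biases and normalization parameters are 1-D tensors, which makes them easy to route into a no-decay group:

```python
import torch
import torch.nn as nn

# Toy model with conv, batchnorm, and linear parameters
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8), nn.Linear(8, 4))

decay, no_decay = [], []
for name, p in model.named_parameters():
    # 1-D tensors are biases and norm weights/biases; exclude them from decay
    (no_decay if p.ndim <= 1 else decay).append(p)

opt = torch.optim.SGD(
    [{"params": decay, "weight_decay": 1e-4},
     {"params": no_decay, "weight_decay": 0.0}],
    lr=0.1,
)
```

The same grouping is what lets a LARS wrapper skip trust-ratio scaling for the no-decay group.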

Logging

  • Metric logging on the cloud with WandB
  • Custom model checkpointing with a simple file organization.

Requirements

  • torch
  • torchvision
  • tqdm
  • einops
  • wandb
  • pytorch-lightning
  • lightning-bolts
  • torchmetrics
  • scipy
  • timm

Optional:

  • nvidia-dali
  • matplotlib
  • seaborn
  • pandas
  • umap-learn

Installation

First clone the repo.

Then, to install solo-learn with Dali and/or UMAP support, use:

pip3 install .[dali,umap,h5] --extra-index-url https://developer.download.nvidia.com/compute/redist

If no Dali/UMAP/H5 support is needed, the repository can be installed as:

pip3 install .

For local development:

pip3 install -e .[umap,h5]
# Make sure you have pre-commit hooks installed
pre-commit install

NOTE: if you are having trouble with Dali, install it following NVIDIA's guide.

NOTE 2: consider installing Pillow-SIMD for better loading times when not using Dali.

NOTE 3: Soon to be on pip.


Training

For pretraining the backbone, use one of the many bash scripts in scripts/pretrain/. We now use Hydra to handle the config files, so the common syntax is something like:

# --config-path: folder containing the training configs
# --config-name: the config to use for this run
# Extra arguments (e.g. those not defined in the yaml files, including
# PyTorch Lightning's) can be appended as ++new_argument=VALUE
python3 main_pretrain.py \
    --config-path scripts/pretrain/imagenet-100/ \
    --config-name barlow.yaml

After that, for offline linear evaluation, follow the examples in scripts/linear or scripts/finetune for finetuning the whole backbone.

For k-NN evaluation and UMAP visualization check the scripts in scripts/{knn,umap}.
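A k-NN evaluation of frozen features typically amounts to a cosine-similarity majority vote over the training set, roughly as follows (a minimal sketch in plain PyTorch, not solo-learn's implementation; the toy features and labels are made up):

```python
import torch

def knn_predict(train_feats, train_labels, test_feats, k=5):
    # L2-normalize so the dot product is cosine similarity
    train_feats = torch.nn.functional.normalize(train_feats, dim=1)
    test_feats = torch.nn.functional.normalize(test_feats, dim=1)
    sims = test_feats @ train_feats.T            # (n_test, n_train)
    _, idx = sims.topk(k, dim=1)                 # k nearest training samples
    neighbor_labels = train_labels[idx]          # (n_test, k)
    return neighbor_labels.mode(dim=1).values    # majority vote

# Toy 2-D features: two clusters along the axes
train = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = torch.tensor([0, 0, 1, 1])
test = torch.tensor([[0.95, 0.05], [0.05, 0.95]])
pred = knn_predict(train, labels, test, k=2)
```

In practice the features come from the frozen backbone, and a distance-weighted vote is often used instead of the plain majority shown here.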

NOTE: the config files aim to stay up-to-date and follow the recommended parameters of each paper as closely as possible, but check them before running.


Tutorials

Please check out our documentation and tutorials.

If you want to contribute to solo-learn, make sure you take a look at how to contribute and follow the code of conduct.


Model Zoo

All available pretrained models can be downloaded directly via the tables below, or programmatically by running one of the following scripts: zoo/cifar10.sh, zoo/cifar100.sh, zoo/imagenet100.sh and zoo/imagenet.sh.


Results

Note: hyperparameters may not be the best; we will eventually re-run the methods with lower performance.

CIFAR-10

Method Backbone Epochs Dali Acc@1 Acc@5 Checkpoint
Barlow Twins ResNet18 1000 ❌ 92.10 99.73 πŸ”—
BYOL ResNet18 1000 ❌ 92.58 99.79 πŸ”—
DeepCluster V2 ResNet18 1000 ❌ 88.85 99.58 πŸ”—
DINO ResNet18 1000 ❌ 89.52 99.71 πŸ”—
MoCo V2+ ResNet18 1000 ❌ 92.94 99.79 πŸ”—
MoCo V3 ResNet18 1000 ❌ 93.10 99.80 πŸ”—
NNCLR ResNet18 1000 ❌ 91.88 99.78 πŸ”—
ReSSL ResNet18 1000 ❌ 90.63 99.62 πŸ”—
SimCLR ResNet18 1000 ❌ 90.74 99.75 πŸ”—
SimSiam ResNet18 1000 ❌ 90.51 99.72 πŸ”—
SupCon ResNet18 1000 ❌ 93.82 99.65 πŸ”—
SwAV ResNet18 1000 ❌ 89.17 99.68 πŸ”—
VIbCReg ResNet18 1000 ❌ 91.18 99.74 πŸ”—
VICReg ResNet18 1000 ❌ 92.07 99.74 πŸ”—
W-MSE ResNet18 1000 ❌ 88.67 99.68 πŸ”—

CIFAR-100

Method Backbone Epochs Dali Acc@1 Acc@5 Checkpoint
Barlow Twins ResNet18 1000 ❌ 70.90 91.91 πŸ”—
BYOL ResNet18 1000 ❌ 70.46 91.96 πŸ”—
DeepCluster V2 ResNet18 1000 ❌ 63.61 88.09 πŸ”—
DINO ResNet18 1000 ❌ 66.76 90.34 πŸ”—
MoCo V2+ ResNet18 1000 ❌ 69.89 91.65 πŸ”—
MoCo V3 ResNet18 1000 ❌ 68.83 90.57 πŸ”—
NNCLR ResNet18 1000 ❌ 69.62 91.52 πŸ”—
ReSSL ResNet18 1000 ❌ 65.92 89.73 πŸ”—
SimCLR ResNet18 1000 ❌ 65.78 89.04 πŸ”—
SimSiam ResNet18 1000 ❌ 66.04 89.62 πŸ”—
SupCon ResNet18 1000 ❌ 70.38 89.57 πŸ”—
SwAV ResNet18 1000 ❌ 64.88 88.78 πŸ”—
VIbCReg ResNet18 1000 ❌ 67.37 90.07 πŸ”—
VICReg ResNet18 1000 ❌ 68.54 90.83 πŸ”—
W-MSE ResNet18 1000 ❌ 61.33 87.26 πŸ”—

ImageNet-100

Method Backbone Epochs Dali Acc@1 (online) Acc@1 (offline) Acc@5 (online) Acc@5 (offline) Checkpoint
Barlow Twins πŸš€ ResNet18 400 βœ”οΈ 80.38 80.16 95.28 95.14 πŸ”—
BYOL πŸš€ ResNet18 400 βœ”οΈ 80.16 80.32 95.02 94.94 πŸ”—
DeepCluster V2 ResNet18 400 ❌ 75.36 75.40 93.22 93.10 πŸ”—
DINO ResNet18 400 βœ”οΈ 74.84 74.92 92.92 92.78 πŸ”—
DINO πŸ˜ͺ ViT Tiny 400 ❌ 63.04 TODO 87.72 TODO πŸ”—
MoCo V2+ πŸš€ ResNet18 400 βœ”οΈ 78.20 79.28 95.50 95.18 πŸ”—
MoCo V3 πŸš€ ResNet18 400 βœ”οΈ 80.36 80.36 95.18 94.96 πŸ”—
MoCo V3 πŸš€ ResNet50 400 βœ”οΈ 85.48 84.58 96.82 96.70 πŸ”—
NNCLR πŸš€ ResNet18 400 βœ”οΈ 79.80 80.16 95.28 95.30 πŸ”—
ReSSL ResNet18 400 βœ”οΈ 76.92 78.48 94.20 94.24 πŸ”—
SimCLR πŸš€ ResNet18 400 βœ”οΈ 77.64 TODO 94.06 TODO πŸ”—
SimSiam ResNet18 400 βœ”οΈ 74.54 78.72 93.16 94.78 πŸ”—
SupCon ResNet18 400 βœ”οΈ 84.40 TODO 95.72 TODO πŸ”—
SwAV ResNet18 400 βœ”οΈ 74.04 74.28 92.70 92.84 πŸ”—
VIbCReg ResNet18 400 βœ”οΈ 79.86 79.38 94.98 94.60 πŸ”—
VICReg πŸš€ ResNet18 400 βœ”οΈ 79.22 79.40 95.06 95.02 πŸ”—
W-MSE ResNet18 400 βœ”οΈ 67.60 69.06 90.94 91.22 πŸ”—

πŸš€ methods where hyperparameters were heavily tuned.

πŸ˜ͺ ViT is very compute intensive and unstable, so we are slowly running larger architectures with a larger batch size. At the moment, the total batch size is 128 and we needed to use float32 precision. If you want to contribute by running it, let us know!

ImageNet

Method Backbone Epochs Dali Acc@1 (online) Acc@1 (offline) Acc@5 (online) Acc@5 (offline) Checkpoint
Barlow Twins ResNet50 100 βœ”οΈ 67.18 67.23 87.69 87.98 πŸ”—
BYOL ResNet50 100 βœ”οΈ 68.63 68.37 88.80 88.66 πŸ”—
DeepCluster V2 ResNet50 100 βœ”οΈ
DINO ResNet50 100 βœ”οΈ
MoCo V2+ ResNet50 100 βœ”οΈ 62.61 66.84 85.40 87.60 πŸ”—
MoCo V3 ResNet50 100 βœ”οΈ
NNCLR ResNet50 100 βœ”οΈ
ReSSL ResNet50 100 βœ”οΈ
SimCLR ResNet50 100 βœ”οΈ
SimSiam ResNet50 100 βœ”οΈ
SupCon ResNet50 100 βœ”οΈ
SwAV ResNet50 100 βœ”οΈ
VIbCReg ResNet50 100 βœ”οΈ
VICReg ResNet50 100 βœ”οΈ
W-MSE ResNet50 100 βœ”οΈ

Training efficiency for DALI

We report the training efficiency of some methods using a ResNet18 with and without DALI (4 workers per GPU) on a server with an Intel i9-9820X and two RTX 2080 Ti GPUs.

Method Dali Total time for 20 epochs Time for 1 epoch GPU memory (per GPU)
Barlow Twins ❌ 1h 38m 27s 4m 55s 5097 MB
Barlow Twins βœ”οΈ 43m 2s 2m 10s (56% faster) 9292 MB
BYOL ❌ 1h 38m 46s 4m 56s 5409 MB
BYOL βœ”οΈ 50m 33s 2m 31s (49% faster) 9521 MB
NNCLR ❌ 1h 38m 30s 4m 55s 5060 MB
NNCLR βœ”οΈ 42m 3s 2m 6s (64% faster) 9244 MB

Note: the GPU memory increase doesn't scale with the model; rather, it scales with the number of workers.


Citation

If you use solo-learn, please cite our paper:

@article{JMLR:v23:21-1155,
  author  = {Victor Guilherme Turrisi da Costa and Enrico Fini and Moin Nabi and Nicu Sebe and Elisa Ricci},
  title   = {solo-learn: A Library of Self-supervised Methods for Visual Representation Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {56},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v23/21-1155.html}
}