[README] add CLIP paper.
YannDubs committed Jul 6, 2021
1 parent 010cbb3 commit fc39639
Showing 14 changed files with 118 additions and 50 deletions.
49 changes: 42 additions & 7 deletions README.md
@@ -21,6 +21,8 @@ If you want to use our compressor directly the easiest is to use the model from
```

Using pytorch `>1.7.1`: CLIP forces pytorch version `1.7.1` because it needs that version to use JIT. If you don't need JIT (no JIT by default) you can actually use more recent versions of torch and torchvision with `pip install -U torch torchvision`. Make sure to update after having installed CLIP.
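For example, a minimal sketch of the order of operations (assuming CLIP and its pinned dependencies come in through `requirements.txt`):

```bash
# sketch: install CLIP first (which pins torch 1.7.1), then upgrade
# torch/torchvision afterwards if you do not need JIT
pip install -r requirements.txt
pip install -U torch torchvision
```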

----------------------
</details>

```python
@@ -101,15 +103,15 @@ If your goal is to look at a minimal version of the code to simply understand wh

## Results from the paper

We provide scripts to essentially replicate some results from the paper. The exact results will be a little different as we simplified and cleaned some of the code to help readability.
We provide scripts to essentially replicate some results from the paper. The exact results will be a little different as we simplified and cleaned some of the code to help readability. All scripts can be found in `bin` and run using the command `bin/*/<experiment>.sh`.


<details>
<summary><b>Installation details</b></summary>

0. Clone repository
1. Install [PyTorch](https://pytorch.org/) >= 1.7
2. `pip install -r requirements.txt`
1. Clone repository
2. Install [PyTorch](https://pytorch.org/) >= 1.7
3. `pip install -r requirements.txt`
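Putting these steps together, a hypothetical end-to-end install (the repository URL, the checkout directory name, and a fresh virtual environment are assumptions):

```bash
# sketch only: substitute the actual repository URL
git clone <repository-url> lossyless && cd lossyless
pip install torch torchvision        # PyTorch >= 1.7
pip install -r requirements.txt
```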

### Other installation
- For the bare minimum packages: use `pip install -r requirements_mini.txt` instead.
@@ -131,8 +133,13 @@ if not _root_logger.hasHandlers():

To test your installation and check that everything works as desired, you can run `bin/test.sh`, which will run an epoch of BICNE and VIC on MNIST.

----------------------

</details>

<details>
<summary><b>Scripts details</b></summary>

All scripts can be found in `bin` and run using the command `bin/*/<experiment>.sh`. This will save all results, checkpoints, and logs. The most important outputs (including summarized results and figures) are saved at `results/exp_<experiment>`: in particular the summarized metrics `results/exp_<experiment>*/summarized_metrics_merged.csv` and any figures `results/exp_<experiment>*/*.png`.
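For example, a hypothetical run-and-inspect sequence (assuming this experiment writes the summarized metrics described above):

```bash
# sketch: run one experiment, then look at its summarized metrics and figures
bin/banana/banana_viz_VIC.sh
cat results/exp_banana_viz_VIC*/summarized_metrics_merged.csv
ls results/exp_banana_viz_VIC*/*.png
```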

The key experiments that do not require very large compute are:
@@ -147,11 +154,19 @@ Generally speaking you can change any of the parameters either directly in `conf
If you are using [Slurm](https://slurm.schedmd.com/documentation.html) you can submit the script directly on servers by adding a config file under `conf/slurm/<myserver>.yaml`, and then running the script as `bin/*/<experiment>.sh -s <myserver>`. For example configuration files for Slurm, see `conf/slurm/vector.yaml` or `conf/slurm/learnfair.yaml`. For more information, check the documentation of [submitit's plugin](https://hydra.cc/docs/plugins/submitit_launcher), which we use.
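As a sketch, a Slurm submission could look like this (the server name is a placeholder for your own `conf/slurm/<myserver>.yaml`):

```bash
# sketch: submit the small CLIP pipeline through Slurm via submitit
bin/clip/main_small.sh -s myserver
```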


----------------------

</details>


### VIC/VAE on rotation invariant Banana

The following figures are saved automatically at `restults/exp_banana_viz_VIC/**/quantization.png` after running `bin/banana/banana_viz_VIC.sh`.
Command:
```bash
bin/banana/banana_viz_VIC.sh
```

On the left we see the quantization of the Banana distribution by a standard compressor (called `VAE` in code but VC in paper). On the right, by our (rotation) invariant compressor (`VIC`).
The following figures are saved automatically at `results/exp_banana_viz_VIC/**/quantization.png`. On the left we see the quantization of the Banana distribution by a standard compressor (called `VAE` in code but VC in paper). On the right, by our (rotation) invariant compressor (`VIC`).


<p float="left" align="middle">
@@ -161,13 +176,33 @@ On the left we see the quantization of the Banana distribution by a standard com

### VIC/VAE on augmented MNIST

The following figure is saved automatically at `restults/exp_augmnist_viz_VIC/**/rec_imgs.png` after running `bin/banana/augmnist_viz_VIC.sh`. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.
Command:
```bash
bin/banana/augmnist_viz_VIC.sh
```

The following figure is saved automatically at `results/exp_augmnist_viz_VIC/**/rec_imgs.png`. It shows source augmented MNIST images as well as the reconstructions using our invariant compressor.

![Invariant compression of augmented MNIST](/results/exp_augmnist_viz_VIC/datafeat_mnist_aug/feat_neural_rec/dist_VIC/enc_resnet18/rate_H_hyper/optfeat_AdamW_lr1.0e-03_w1.0e-05/schedfeat_expdecay100/zdim_128/zs_1/beta_1.0e-01/seed_123/addfeat_None/rec_imgs.png)


### CLIP compressor


Command:
```bash
bin/clip/main_small.sh
```

The following table comes directly from the results, which are automatically saved at `results/exp_clip_bottleneck_linear_eval/**/datapred_*/**/results_predictor.csv`. It shows the compression rate and downstream test accuracy of our CLIP compressor on many datasets.

| | Cars196 | STL10 | Caltech101 | Food101 | PCam | Pets37 | CIFAR10 |
|---------------|:-------:|:-----:|:----------:|:-------:|:----:|:------:|:-------:|
| Rate [bits] | 1468 | 1344 | 1341 | 1269 | 1491 | 1211 | 1408 |
| Test Acc. [%] | 79.9 | 98.7 | 93.7 | 83.6 | 81.1 | 88.3 | 94.8 |
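To locate the per-dataset files behind this table, a hypothetical sketch:

```bash
# sketch: list the result files that the table above summarizes
find results/exp_clip_bottleneck_linear_eval -name "results_predictor.csv"
```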

Note: ImageNet is too large for training an SVM using scikit-learn. You need to run the MLP evaluation with `bin/clip/clip_bottleneck_mlp_eval.sh` instead.


## Cite
6 changes: 3 additions & 3 deletions bin/clip/clip_bottleneck_linear_eval.sh
@@ -11,7 +11,7 @@ source `dirname $0`/../utils.sh

SCRIPT=`realpath $0`
SCRIPTPATH=`dirname $SCRIPT`
pretrained_path="$SCRIPTPATH"/../../hub
pretrained_path="$SCRIPTPATH"/../../pretrained/clip

# define all the arguments modified or added to `conf`. If they are added use `+`
kwargs="
@@ -28,11 +28,11 @@ $add_kwargs
"

kwargs_multi="
data@data_pred=stl10,cars196,caltech101,food101,pcam,pets37,cifar10,cifar100,imagenet
data@data_pred=stl10,cars196,caltech101,food101,pcam,pets37,cifar10,cifar100
"

if [ "$is_plot_only" = false ] ; then
for beta in "1e-01" "5e-02" "1e-02"
for beta in "5e-02"
do

python utils/Z_linear_eval.py $kwargs $kwargs_multi featurizer.loss.beta=$beta paths.pretrained.load=$pretrained_path/beta$beta $kwargs_dep -m &
4 changes: 2 additions & 2 deletions bin/clip/clip_bottleneck_mlp_eval.sh
@@ -12,7 +12,7 @@ source `dirname $0`/../utils.sh

SCRIPT=`realpath $0`
SCRIPTPATH=`dirname $SCRIPT`
pretrained_path="$SCRIPTPATH"/../../hub
pretrained_path="$SCRIPTPATH"/../../pretrained/clip


# define all the arguments modified or added to `conf`. If they are added use `+`
@@ -60,7 +60,7 @@ seed=int(interval(0,10))
if [ "$is_plot_only" = false ] ; then
for data in "stl10" "imagenet" "cars196" "caltech101" "food101" "pcam" "pets37" "cifar10" "cifar100"
do
for beta in "1e-01" "5e-02" "1e-02"
for beta in "5e-02"
do

python "$main" +hydra.job.env_set.WANDB_NOTES="\"${notes}\"" $kwargs $kwargs_multi data@data_pred=$data featurizer.loss.beta=$beta paths.pretrained.load=$pretrained_path/beta$beta hydra.sweeper.study_name=$data_$beta -m &
4 changes: 2 additions & 2 deletions bin/clip/clip_bottleneck_pretrain.sh
@@ -11,7 +11,7 @@ source `dirname $0`/../utils.sh

SCRIPT=`realpath $0`
SCRIPTPATH=`dirname $SCRIPT`
pretrained_path="$SCRIPTPATH"/../../hub
pretrained_path="$SCRIPTPATH"/../../pretrained/clip

# define all the arguments modified or added to `conf`. If they are added use `+`
kwargs="
@@ -29,7 +29,7 @@ $add_kwargs
kwargs_multi=""

if [ "$is_plot_only" = false ] ; then
for beta in "1e-01" "5e-02" "1e-02"
for beta in "5e-02" # add more values of beta if needed
do

python "$main" +hydra.job.env_set.WANDB_NOTES="\"${notes}\"" $kwargs $kwargs_multi featurizer.loss.beta=$beta paths.pretrained.save=$pretrained_path/beta$beta -m &
32 changes: 30 additions & 2 deletions bin/clip/clip_hub.sh
@@ -2,7 +2,7 @@

experiment="clip_hub"
notes="
**Goal**: Save all pretrained models to pytorch hub.
**Goal**: Save all pretrained (factorized) models to pytorch hub.
"

# parses special mode for running the script
@@ -12,14 +12,42 @@ SCRIPT=`realpath $0`
SCRIPTPATH=`dirname $SCRIPT`
pretrained_path="$SCRIPTPATH"/../../hub

kwargs="
experiment=$experiment
timeout=$time
encoder.z_dim=512
is_only_feat=True
data@data_feat=coco
checkpoint@checkpoint_feat=bestValLoss
trainer.max_epochs=30
featurizer=bottleneck_clip_lossyZ_factorized
$add_kwargs
"

# TRAIN FACTORIZED MODEL
kwargs_multi=""

if [ "$is_plot_only" = false ] ; then
for beta in "1e-01" "5e-02" "1e-02"
do

python "$main" +hydra.job.env_set.WANDB_NOTES="\"${notes}\"" $kwargs $kwargs_multi featurizer.loss.beta=$beta paths.pretrained.save=$pretrained_path/beta$beta -m &

sleep 3

done
fi

wait


# define all the arguments modified or added to `conf`. If they are added use `+`
kwargs="
experiment=$experiment
timeout=$time
encoder.z_dim=512
data@data_feat=coco
featurizer=bottleneck_clip_lossyZ
featurizer=bottleneck_clip_lossyZ_factorized
checkpoint@checkpoint_pred=bestValLoss
featurizer.is_train=false
evaluation.communication.ckpt_path=null
3 changes: 1 addition & 2 deletions bin/clip/clip_raw_linear_eval.sh
@@ -11,7 +11,6 @@ source `dirname $0`/../utils.sh

SCRIPT=`realpath $0`
SCRIPTPATH=`dirname $SCRIPT`
pretrained_path="$SCRIPTPATH"/../../hub

# define all the arguments modified or added to `conf`. If they are added use `+`
kwargs="
@@ -29,7 +28,7 @@ $add_kwargs
"

kwargs_multi="
data@data_pred=stl10,cars196,stl10,caltech101,food101,pcam,pets37,cifar10,cifar100,imagenet
data@data_pred=stl10,cars196,stl10,caltech101,food101,pcam,pets37,cifar10,cifar100
"


10 changes: 5 additions & 5 deletions bin/clip/main_all.sh → bin/clip/main.sh
@@ -7,26 +7,26 @@ echo "Ensures that all data is downloaded"
wait

### OUR CLIP ###
echo "Pretrains the 3 different clips (for different values of beta)"
echo "Pretrains our CLIP compressor."
`dirname $0`/clip_bottleneck_pretrain.sh "$@"

wait

echo "Evaluates the pretrained CLIP models with linear classifiers"
echo "Evaluates our pretrained CLIP compressor with linear classifiers"
`dirname $0`/clip_bottleneck_linear_eval.sh "$@"

wait

echo "Evaluates the pretrained CLIP models with MLP classifiers"
echo "Evaluates our pretrained CLIP compressor with MLP classifiers"
`dirname $0`/clip_bottleneck_mlp_eval.sh "$@"

wait

### BASELINE CLIP ###
echo "Evaluates the pretrained CLIP models with linear classifiers"
echo "Evaluates raw pretrained CLIP model with linear classifiers"
`dirname $0`/clip_raw_linear_eval.sh "$@"

wait

echo "Evaluates the pretrained CLIP models with MLP classifiers"
echo "Evaluates raw pretrained CLIP model with MLP classifiers"
`dirname $0`/clip_raw_mlp_eval.sh "$@"
22 changes: 9 additions & 13 deletions bin/clip/main_small.sh
@@ -1,22 +1,18 @@
#!/usr/bin/env bash


# # Ensures that all data is downloaded
# Ensures that all data is downloaded
# echo "Ensures that all data is downloaded."
# bin/clip/download_data.sh
# `dirname $0`/../clip/download_data.sh "$@"

# wait
wait

# ### OUR CLIP ###
echo "Pretrains the 3 different clips (for different values of beta)."
# Pretrain our CLIP
echo "Pretrains the CLIP compressor."
`dirname $0`/clip_bottleneck_pretrain.sh "$@"

# wait
wait

# echo "Evaluates the pretrained CLIP models with linear classifiers"
# bin/clip/clip_bottleneck_linear_eval.sh "$@"

# wait

# echo "Evaluates the pretrained CLIP models with linear classifiers"
# bin/clip/clip_raw_linear_eval.sh "$@"
# Evaluate our CLIP
echo "Evaluates the pretrained CLIP compressor with linear classifiers"
`dirname $0`/clip_bottleneck_linear_eval.sh "$@"
2 changes: 1 addition & 1 deletion config/featurizer/bottleneck_clip_lossyZ.yaml
@@ -3,7 +3,7 @@ defaults:
- neural_feat@featurizer
- override /architecture@encoder: clip
- override /distortion: lossy_Z
- override /rate: H_factorized
- override /rate: H_hyper
- override /finetune: freezer
- override /scheduler@scheduler_feat: unifmultistep1000
- override /scheduler@scheduler_coder: unifmultistep1000
5 changes: 5 additions & 0 deletions config/featurizer/bottleneck_clip_lossyZ_factorized.yaml
@@ -0,0 +1,5 @@
# @package _global_
defaults:
- bottleneck_clip_lossyZ@featurizer
- override /rate: H_factorized

2 changes: 1 addition & 1 deletion lossyless/learnable_compressors.py
@@ -107,7 +107,7 @@ def predict_step(self, batch, batch_idx, dataloader_idx=None):
y = y[0] # only return the real label assumed to be first

x_hat = self(x)
return x_hat.cpu(), y
return x_hat.cpu(), y.cpu()

def predict(self, *args, **kwargs): # TODO remove in newer version of lightning
return self.predict_step(*args, **kwargs)
2 changes: 1 addition & 1 deletion lossyless/predictors.py
@@ -122,7 +122,7 @@ def predict_step(self, batch, batch_idx, dataloader_idx=None):
"""
x, y = batch
y_hat = self(x)
return y_hat, y
return y_hat.cpu(), y.cpu()

def predict(self, *args, **kwargs): # TODO remove in newer version of lightning
return self.predict_step(*args, **kwargs)
17 changes: 15 additions & 2 deletions main.py
@@ -35,7 +35,7 @@
from pytorch_lightning.callbacks.finetuning import BaseFinetuning
from pytorch_lightning.loggers import CSVLogger, WandbLogger
from pytorch_lightning.plugins import DDPSpawnPlugin
from pytorch_lightning.utilities import rank_zero_warn
from pytorch_lightning.utilities import parsing, rank_zero_warn
from pytorch_lightning.utilities.cloud_io import load as pl_load
from pytorch_lightning.utilities.exceptions import MisconfigurationException
from utils.data import get_datamodule
@@ -486,7 +486,20 @@ def get_trainer(cfg, module, is_featurizer):

def placeholder_fit(trainer, module, datamodule):
"""Necessary setup of trainer before testing if you don't fit it."""
trainer.train_loop.setup_fit(module, None, None, datamodule)

# links data to the trainer
trainer.data_connector.attach_data(module, datamodule=datamodule)

# clean hparams
if hasattr(module, "hparams"):
parsing.clean_namespace(module.hparams)

# check that model is configured correctly
trainer.config_validator.verify_loop_configurations(module)

# attach model log function to callback
trainer.callback_connector.attach_model_logging_functions(module)

trainer.model = module


10 changes: 1 addition & 9 deletions utils/Z_linear_eval.py
@@ -42,21 +42,13 @@

@hydra.main(config_path=f"{MAIN_DIR}/config", config_name="clip_linear")
def main(cfg):

analyser = PretrainedAnalyser()
logger.info(f"Collecting the data ..")
stage = "predictor"

analyser.collect_data(cfg, **cfg.load_pretrained.collect_data)

# DEV (not working)
path = Path(analyser.cfgs[stage].paths.results)
path.mkdir(parents=True, exist_ok=True)
with open(path / "dir.txt", "a") as the_file:
the_file.write(os.getcwd()) # know where you are for debugging
results_file = path / RESULTS_FILE.format(stage=stage)
if results_file.is_file():
return

Z_train = analyser.datamodules["predictor"].train_dataset.X
Y_train = analyser.datamodules["predictor"].train_dataset.Y
Z_val = analyser.datamodules["predictor"].val_dataset.X
