This Repo compares different deep image embedding methods with the goal to achieve good general embeddings for images given a small amount of training data.
This Repo was created for an assignment in a deep vision course at the OTH-Amberg-Weiden. Therefore a report is included.
Following datasets were used:
- Tiny ImageNet (Download) for comparing the different embedding methods.
- Internal and External Parts of Cars for the final test set.
The notebooks expect the datasets to be in the root of the repo.
The Backbones-Notebook compares the following backbones.
Results:
Backbone | F1-Score |
---|---|
ResNet50 | 0.664 |
EfficientNetV2_L | 0.540 |
MobilNetV3 | 0.367 |
DenseNet169 | 0.612 |
ViT | 0.893 |
Swin | 0.934 |
The Losses-Notebook compares the following Loss-Functions.
Results:
Loss | F1-Score |
---|---|
ContrastiveLoss | 0.650 |
TripletLoss | 0.660 |
SupConLoss | 0.709 |
SNRLoss | 0.685 |
NTXentLoss | 0.618 |
The Embedding Size-Notebook compares different Embedding-Sizes.
Results:
Embedding Size | F1-Score |
---|---|
64 | 0.654 |
128 | 0.683 |
256 | 0.712 |
512 | 0.719 |
1024 | 0.724 |
2048 | 0.731 |
The Dataset Size-Notebook compares different Train-Sample-Sizes for each class in the dataset.
Results:
Samples per Class | F1-Score |
---|---|
10 | 0.223 |
20 | 0.280 |
30 | 0.369 |
50 | 0.443 |
80 | 0.507 |
100 | 0.520 |
200 | 0.632 |
400 | 0.703 |
The Augmentation Factor-Notebook compares different augmentation factors for a small dataset with 20 images per class.
Results:
Factor | F1-Score |
---|---|
1x (Baseline) | 0.306 |
2x | 0.352 |
4x | 0.437 |
8x | 0.518 |
16x | 0.569 |
The Augmentation Methods-Notebook compares different auto-augmentation methods integrated in pytorch.
Results:
Method | F1-Score |
---|---|
Baseline | 0.289 |
AutoAugment | 0.392 |
RandAugment | 0.436 |
TrivialAugmentWide | 0.441 |
In the Zero Shot-Notebook we test the capabilities of a SWIN-Network finetuned on the "Tiny-ImageNet"-Dataset to embed images from the "Internal and External Parts of Cars"-Dataset.
Results:
F1-Score |
---|
0.865 |
In the Final-Notebook we try to finetune a Network on 20 images of 4 classes of the "Internal and External Parts of Cars"-Dataset and perform normal and zero-shot detection on all 8 classes with 230 images per class.
Results:
Mode | F1-Score |
---|---|
Normal | 0.995 |
Zero-Shot | 0.975 |