* Add siglip loss function
* Update docs
* Enable training tests
[experimental] enable GC training tests as it has worked for my own data
* Remove test_training* overrides to enable training tests
[run_slow] siglip
* Skip training tests for Siglip text model and ImageClassificationModel
[run_slow] siglip
* Skip GC training tests for SiglipForImageClassification
* Explicitly skip training tests for SiglipVisionModel
Add skip reason for training tests for SiglipTextModel
* Remove copied from to fix CI
docs/source/en/model_doc/siglip.md (+1 −1)
@@ -27,7 +27,7 @@ The abstract from the paper is the following:
 ## Usage tips

 - Usage of SigLIP is similar to [CLIP](clip). The main difference is the training loss, which does not require a global view of all the pairwise similarities of images and texts within a batch. One needs to apply the sigmoid activation function to the logits, rather than the softmax.

-- Training is not yet supported. If you want to fine-tune SigLIP or train from scratch, refer to the loss function from [OpenCLIP](https://github.com/mlfoundations/open_clip/blob/73ad04ae7fb93ede1c02dc9040a828634cb1edf1/src/open_clip/loss.py#L307), which leverages various `torch.distributed` utilities.
+- Training is supported, but it does not use `torch.distributed` utilities, which may limit the scalability of the batch size. However, DDP and FSDP work in a single-node multi-GPU setup.

 - When using the standalone [`SiglipTokenizer`] or [`SiglipProcessor`], make sure to pass `padding="max_length"` as that's how the model was trained.
 - To get the same results as the pipeline, a prompt template of "This is a photo of {label}." should be used.
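The first usage tip above notes that SigLIP scores each image-text pair with a sigmoid instead of normalizing across candidates with a softmax. A minimal numpy sketch of the difference, using made-up logit values for one image against three candidate texts:

```python
import numpy as np

# Hypothetical image-text similarity logits for one image vs. three texts.
logits = np.array([2.0, 0.5, -1.0])

# CLIP-style: softmax normalizes across all candidates, so the
# probabilities compete with each other and sum to 1.
softmax_probs = np.exp(logits) / np.exp(logits).sum()

# SigLIP-style: each pair is scored independently with a sigmoid,
# so the probabilities do not need to sum to 1.
sigmoid_probs = 1.0 / (1.0 + np.exp(-logits))
```

Because each pair is scored independently, the sigmoid formulation avoids needing a global view of all pairwise similarities in the batch, which is what makes the loss easier to compute without `torch.distributed` gather operations.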
src/transformers/models/siglip/modeling_siglip.py (+6 −1)
@@ -1234,7 +1234,12 @@ def forward(
         loss = None
         if return_loss:
-            raise NotImplementedError("SigLIP loss to be implemented")
+            # Adapted from https://github.com/google-research/big_vision/blob/01edb81a4716f93a48be43b3a4af14e29cdb3a7f/big_vision/trainers/proj/image_text/siglip.py#L287
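The hunk above is truncated after the comment, so the exact implementation added by the PR is not shown here. As a rough illustration only, a self-contained numpy sketch of the pairwise sigmoid loss that the referenced big_vision trainer computes (function name, scale, and bias values below are assumptions, not the PR's code):

```python
import numpy as np

def siglip_loss(image_embeds, text_embeds, logit_scale=1.0, logit_bias=0.0):
    """Illustrative sketch of the SigLIP pairwise sigmoid loss.

    Assumes `image_embeds` and `text_embeds` are L2-normalized arrays of
    shape (batch, dim), where row i of each is a matching image-text pair.
    """
    # Similarity logits with a learnable scale and bias (here plain floats).
    logits = logit_scale * image_embeds @ text_embeds.T + logit_bias
    n = logits.shape[0]
    # Label +1 on the diagonal (matching pairs), -1 everywhere else.
    labels = 2.0 * np.eye(n) - 1.0
    # -log(sigmoid(z)) == log(1 + exp(-z)); logaddexp keeps it stable.
    loss = np.sum(np.logaddexp(0.0, -labels * logits)) / n
    return loss
```

Every image-text pair contributes an independent binary term, which is why no batch-wide normalization (and hence no cross-device gather) is required; the PyTorch version in the PR would express the same computation with `torch.nn.functional.logsigmoid` on tensors.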