🐛 Bug

If you run CLIPScore between an image and a caption where the caption has more than 77 tokens (longer than the maximum sequence length CLIP can process), the metric raises a RuntimeError instead of returning a score.
To Reproduce
Compute CLIPScore between a caption with 77+ tokens and an image.
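Code sample

A minimal sketch that reproduces the failure (the model checkpoint and the random stand-in image below are illustrative, not taken from the original script):

```python
import torch
from torchmetrics.multimodal import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
image = torch.randint(255, (3, 224, 224))  # random stand-in image tensor
long_caption = "a photo of a cat " * 40    # far more than 77 tokens once tokenized

# Fails inside the CLIP text model because the caption is never truncated
score = metric(image, long_caption)
```

Running this produces: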
```
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/scripts/compute_clip_scores.py", line 125, in <module>
compute_clip_scores(response=response,
File "/home/user/scripts/compute_clip_scores.py", line 87, in compute_clip_scores
clip_score = metric(image_tensor, caption)
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 288, in forward
self._forward_cache = self._forward_full_state_update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 302, in _forward_full_state_update
self.update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 456, in wrapped_func
raise err
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 446, in wrapped_func
update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/multimodal/clip_score.py", line 123, in update
score, n_samples = _clip_score_update(images, text, self.model, self.processor)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/functional/multimodal/clip_score.py", line 69, in _clip_score_update
txt_features = model.get_text_features(
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 1017, in get_text_features
text_outputs = self.text_model(
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 730, in forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 230, in forward
embeddings = inputs_embeds + position_embeddings
RuntimeError: The size of tensor a (138) must match the size of tensor b (77) at non-singleton dimension 1
```

The mismatch comes from CLIP's learned position embeddings, which are fixed at 77 positions, so a 138-token input cannot be added to them.
Expected behavior
Present a warning to the user and truncate the caption, so that the metric is computed on the first 77 tokens of the provided caption.
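A possible interim workaround is to truncate captions before calling the metric. A sketch, assuming the Hugging Face CLIPTokenizer; truncate_caption is a hypothetical helper, not part of TorchMetrics:

```python
from transformers import CLIPTokenizer

def truncate_caption(caption: str, model_name: str = "openai/clip-vit-base-patch16") -> str:
    """Hypothetical helper: clip a caption to CLIP's 77-token context window."""
    tokenizer = CLIPTokenizer.from_pretrained(model_name)
    # Tokenize with truncation at the model's maximum length (77 for CLIP),
    # then decode back to a plain string the metric can safely consume.
    input_ids = tokenizer(caption, truncation=True, max_length=77)["input_ids"]
    return tokenizer.decode(input_ids, skip_special_tokens=True)
```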
Environment
TorchMetrics version (and install method): 1.0.3, installed via pip
Python & PyTorch version: Python 3.8.10, PyTorch 2.0.1+cu118
OS: Linux