🐛 Bug

If you run CLIPScore between an image and a caption where the caption has more than 77 tokens (longer than the maximum sequence length CLIP can process), the metric raises a RuntimeError instead of returning a score.
To Reproduce
Compute CLIPScore between a caption with 77+ tokens and an image.
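Code sample

A minimal sketch that reproduces the failure (the model checkpoint and the random stand-in image below are illustrative, not taken from the original script):

```python
import torch
from torchmetrics.multimodal import CLIPScore

metric = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
image = torch.randint(255, (3, 224, 224))  # random stand-in image tensor
long_caption = "a photo of a cat " * 40    # far more than 77 tokens once tokenized

# Fails inside the CLIP text model because the caption is never truncated
score = metric(image, long_caption)
```

Running this produces: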
```
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/user/scripts/compute_clip_scores.py", line 125, in <module>
compute_clip_scores(response=response,
File "/home/user/scripts/compute_clip_scores.py", line 87, in compute_clip_scores
clip_score = metric(image_tensor, caption)
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 288, in forward
self._forward_cache = self._forward_full_state_update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 302, in _forward_full_state_update
self.update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 456, in wrapped_func
raise err
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/metric.py", line 446, in wrapped_func
update(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/multimodal/clip_score.py", line 123, in update
score, n_samples = _clip_score_update(images, text, self.model, self.processor)
File "/home/user/.local/lib/python3.8/site-packages/torchmetrics/functional/multimodal/clip_score.py", line 69, in _clip_score_update
txt_features = model.get_text_features(
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 1017, in get_text_features
text_outputs = self.text_model(
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 730, in forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
File "/home/user/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/user/.local/lib/python3.8/site-packages/transformers/models/clip/modeling_clip.py", line 230, in forward
embeddings = inputs_embeds + position_embeddings
RuntimeError: The size of tensor a (138) must match the size of tensor b (77) at non-singleton dimension 1
```

The mismatch comes from CLIP's learned position embeddings, which are fixed at 77 positions, so a 138-token input cannot be added to them.
Expected behavior
Present a warning to the user and truncate the caption, so that the metric is computed on the first 77 tokens of the provided caption.
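A possible interim workaround is to truncate captions before calling the metric. A sketch, assuming the Hugging Face CLIPTokenizer; truncate_caption is a hypothetical helper, not part of TorchMetrics:

```python
from transformers import CLIPTokenizer

def truncate_caption(caption: str, model_name: str = "openai/clip-vit-base-patch16") -> str:
    """Hypothetical helper: clip a caption to CLIP's 77-token context window."""
    tokenizer = CLIPTokenizer.from_pretrained(model_name)
    # Tokenize with truncation at the model's maximum length (77 for CLIP),
    # then decode back to a plain string the metric can safely consume.
    input_ids = tokenizer(caption, truncation=True, max_length=77)["input_ids"]
    return tokenizer.decode(input_ids, skip_special_tokens=True)
```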
Environment
TorchMetrics version (and install method): 1.0.3, installed via pip
Python & PyTorch version: Python 3.8.10, PyTorch 2.0.1+cu118
OS: Linux