
video_utils.group_videos_by_shape does not consider video length #38352

Closed
@DarkLight1337

Description


System Info

transformers 4.52.3

Who can help?

@zucchini-nlp @hmellor

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The utility function transformers.video_utils.group_videos_by_shape fails when given videos that share the same frame size (height and width) but differ in length (number of frames).

Example:

import torch
from transformers.video_utils import group_videos_by_shape

# Same frame size (336x336), different number of frames (4 vs. 5).
video_1 = torch.zeros((4, 3, 336, 336))
video_2 = torch.zeros((5, 3, 336, 336))
# This call fails on transformers 4.52.3.
grouped_videos, grouped_videos_index = group_videos_by_shape([video_1, video_2])

Discovered in vllm-project/vllm#18678

Error log: https://buildkite.com/vllm/fastcheck/builds/25100/steps?jid=0197076f-fbbf-45c4-968f-6d6f154f4af9

Expected behavior

The videos should be grouped by their full shape, not just shape[-2::] (the height and width).
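
A minimal sketch of the expected grouping, for illustration only. It is not the transformers implementation: group_indices_by_full_shape is a hypothetical helper name, and it only assumes each video exposes a .shape attribute (as torch tensors do). It keys groups on the whole shape, so videos that differ only in frame count end up in separate groups.

```python
from collections import defaultdict
from typing import Any, Dict, List, Sequence, Tuple


def group_indices_by_full_shape(videos: Sequence[Any]) -> Dict[Tuple[int, ...], List[int]]:
    """Hypothetical helper: map each full shape to the positions of the videos that have it."""
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for index, video in enumerate(videos):
        # Key on the entire shape rather than only the last two dimensions,
        # so the first dimension (video length) also separates groups.
        groups[tuple(video.shape)].append(index)
    return dict(groups)
```

With the two tensors from the reproduction, this yields two groups, keyed (4, 3, 336, 336) and (5, 3, 336, 336), with one video each. Every video within a group then has an identical shape, so stacking a group into a single batch is well defined.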
