`video_utils.group_videos_by_shape` does not consider video length #38352

### System Info

transformers 4.52.3

### Who can help?

@zucchini-nlp @hmellor

### Information

- [ ] The official example scripts
- [x] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [x] My own task or dataset (give details below)

### Reproduction

The utility function `transformers.video_utils.group_videos_by_shape` fails when videos share the same frame size (height and width) but differ in length (number of frames).

Example:
```py
import torch
from transformers.video_utils import group_videos_by_shape

video_1 = torch.zeros((4, 3, 336, 336))
video_2 = torch.zeros((5, 3, 336, 336))
grouped_videos, grouped_videos_index = group_videos_by_shape([video_1, video_2])
```

Discovered in https://github.com/vllm-project/vllm/pull/18678

Error log: https://buildkite.com/vllm/fastcheck/builds/25100/steps?jid=0197076f-fbbf-45c4-968f-6d6f154f4af9

### Expected behavior

The videos should be grouped by their full shape, including the number of frames, not just by `shape[-2::]` (height and width).
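To illustrate the expected behavior, here is a minimal sketch that groups by the complete tensor shape. `group_by_full_shape` is a hypothetical helper written for this report, not the library's implementation, and its return format (a dict of stacked tensors plus an index mapping) is only an assumption about what a fix could look like.

```py
from collections import defaultdict

import torch


def group_by_full_shape(videos):
    """Hypothetical helper: group videos by their complete shape, frame count included."""
    groups = defaultdict(list)
    index = {}
    for i, video in enumerate(videos):
        # Full shape, e.g. (num_frames, channels, height, width)
        key = tuple(video.shape)
        # Remember which group the video went to and its position inside that group
        index[i] = (key, len(groups[key]))
        groups[key].append(video)
    # Stacking is safe here because all tensors in a group have identical shapes
    stacked = {key: torch.stack(vs, dim=0) for key, vs in groups.items()}
    return stacked, index


video_1 = torch.zeros((4, 3, 336, 336))
video_2 = torch.zeros((5, 3, 336, 336))
video_3 = torch.zeros((4, 3, 336, 336))
grouped, grouped_index = group_by_full_shape([video_1, video_2, video_3])
```

With this grouping key, `video_1` and `video_3` end up in one group and `video_2` in its own, so videos of different lengths are never stacked together.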
