Description
System Info
transformers 4.52
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
In the new version of transformers, `_check_special_mm_tokens` is called inside `AyaVisionProcessor`. However, `_check_special_mm_tokens` assumes that the image placeholder `<image>` can be represented as a single token. This is not the case for Aya Vision 8B, which encodes `<image>` into `[35, 6504, 37]`. As a result, the validation always fails whenever an image is passed.
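A minimal sketch of the mismatch (the checkpoint id, prompt format, and exact error type are assumptions on my side, not verified output):

```python
from PIL import Image
from transformers import AutoProcessor

# Assumed checkpoint id; the repo may be gated and require accepting the license.
model_id = "CohereForAI/aya-vision-8b"
processor = AutoProcessor.from_pretrained(model_id)

# The image placeholder is not a single token for this tokenizer.
ids = processor.tokenizer.encode("<image>", add_special_tokens=False)
print(ids)  # [35, 6504, 37] as described above

# Passing an image triggers _check_special_mm_tokens, which assumes a
# single-token placeholder, so the call fails validation even though the
# prompt contains the placeholder text.
image = Image.new("RGB", (224, 224))
inputs = processor(
    text="<image> Describe this image.",  # simplified prompt, not the chat template
    images=image,
    return_tensors="pt",
)
```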
I discovered this issue while attempting to update the transformers version in vLLM: vllm-project/vllm#18678
Error logs:
- https://buildkite.com/vllm/fastcheck/builds/25098/steps?sid=019706c6-1a33-4922-9358-d72dfc525fe2
- https://buildkite.com/vllm/fastcheck/builds/25098/steps?sid=019706c6-1a35-46ac-aa2b-8d6d811109fd
Expected behavior
`_check_special_mm_tokens` should handle the case where the modality placeholder text is tokenized into multiple tokens.
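One possible direction, sketched below with a hypothetical helper (`count_subsequence` is not part of transformers and the integration point is an assumption): instead of checking for a single special token id, the validation could match the full multi-token id sequence of the placeholder against the encoded input.

```python
from typing import List

def count_subsequence(ids: List[int], placeholder_ids: List[int]) -> int:
    """Count non-overlapping occurrences of placeholder_ids inside ids."""
    count, i, n = 0, 0, len(placeholder_ids)
    while i + n <= len(ids):
        if ids[i : i + n] == placeholder_ids:
            count += 1
            i += n
        else:
            i += 1
    return count

# Example: Aya Vision 8B encodes "<image>" into three tokens (per the report above).
placeholder_ids = [35, 6504, 37]
input_ids = [1, 35, 6504, 37, 99, 100, 35, 6504, 37, 2]
assert count_subsequence(input_ids, placeholder_ids) == 2
```

A check built on this idea would work both for processors whose placeholder is a single special token (a length-1 sequence) and for tokenizers like Aya Vision 8B's where the placeholder spans several tokens.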