Aria processor does not work with images #35768

Closed
@DarkLight1337

Description

System Info

  • transformers version: 4.48.0
  • Platform: Linux-5.4.0-174-generic-x86_64-with-glibc2.31
  • Python version: 3.9.20
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.0.1
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA A10

Who can help?

@zucchini-nlp @aymeric-roucher

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

>>> import numpy as np
>>> from transformers import AutoProcessor
>>> processor = AutoProcessor.from_pretrained("rhymes-ai/Aria")
>>> processor(text="<|img|>", images=[np.zeros((3, 224, 224))])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/models/aria/processing_aria.py", line 129, in __call__
    sample = sample.replace(self.tokenizer.image_token, self.tokenizer.image_token * num_crops)
  File "/home/cyrus/miniconda3/envs/vllm/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1108, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: LlamaTokenizerFast has no attribute image_token

The Aria processor fails when images are passed: its __call__ method reads self.tokenizer.image_token, but the loaded LlamaTokenizerFast has no image_token attribute.

Expected behavior

As with other models, image_token should be defined directly on the processor, and the processor should use that attribute instead of tokenizer.image_token.
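
A minimal sketch of the proposed change, not the actual transformers implementation. StubTokenizer, SketchProcessor, and expand_image_tokens are hypothetical names invented for illustration, and the default "<|img|>" is an assumption taken from the reproduction above. The processor keeps its own image_token, so expanding the placeholder never touches the tokenizer.

```python
class StubTokenizer:
    """Hypothetical stand-in for a tokenizer that defines no image_token."""


class SketchProcessor:
    """Hypothetical processor that owns the image placeholder token."""

    def __init__(self, tokenizer, image_token="<|img|>"):
        self.tokenizer = tokenizer
        # Assumption: the placeholder string is stored on the processor,
        # so no lookup on the tokenizer is needed later.
        self.image_token = image_token

    def expand_image_tokens(self, text, num_crops):
        # Same replacement as the failing line in the traceback, but it
        # reads self.image_token instead of self.tokenizer.image_token.
        return text.replace(self.image_token, self.image_token * num_crops)


processor = SketchProcessor(StubTokenizer())
expanded = processor.expand_image_tokens("<|img|>", num_crops=2)
print(expanded)  # <|img|><|img|>
```

With this layout the expansion step works even when the tokenizer has no image_token attribute, which is the case reported here.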
