Removing slow tokenizer breaks pipeline loading #42540

@jiqing-feng

Description

@jiqing-feng

System Info

transformers, latest `main` branch

Who can help?

@itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import pipeline

pipeline("document-question-answering", model="impira/layoutlm-document-qa")

output:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jiqing/transformers/src/transformers/pipelines/__init__.py", line 1085, in pipeline
    return pipeline_class(model=model, task=task, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jiqing/transformers/src/transformers/pipelines/document_question_answering.py", line 150, in __init__
    raise ValueError(
ValueError: `DocumentQuestionAnsweringPipeline` requires a fast tokenizer, but a slow tokenizer (`RobertaTokenizer`) is provided.
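Until the pipeline's tokenizer resolution is fixed, a possible workaround (a sketch, not verified against this model) is to load a fast tokenizer explicitly with `use_fast=True` and pass it to `pipeline`, bypassing the default resolution that currently yields a slow `RobertaTokenizer`:

```python
from transformers import AutoTokenizer, pipeline

# Request the fast (Rust-backed) tokenizer explicitly; the pipeline
# check requires a PreTrainedTokenizerFast instance.
tokenizer = AutoTokenizer.from_pretrained(
    "impira/layoutlm-document-qa", use_fast=True
)

# Hand the fast tokenizer to the pipeline instead of letting it
# resolve one on its own.
pipe = pipeline(
    "document-question-answering",
    model="impira/layoutlm-document-qa",
    tokenizer=tokenizer,
)
```

This only helps if a fast tokenizer is actually available for the checkpoint; the underlying bug in the pipeline's default tokenizer selection still needs the fix referenced below.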

Expected behavior

The pipeline should load without raising. This is a regression introduced by PR #40936.

cc @zucchini-nlp
