Skip to content

Whisper v3 dependency issue #28156

Closed
Closed
@lionsheep0724

Description

@lionsheep0724

System Info

  • transformers version: transformers-4.37.0.dev0 (installed via pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio], which instructed in here
  • Platform: Windows 10, WSL
  • Python version: 3.10

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline


device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_path = f"./models/whisper-large-v3"

model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_path, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
)
model.to(device)

processor = AutoProcessor.from_pretrained(model_path)

pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    chunk_length_s=30,
    batch_size=16,
    return_timestamps=True,
    torch_dtype=torch_dtype,
    device=device,
)

Expected behavior

  • I'm trying to load pretrained whisper-large-v3 model but I guess there is dependency issue in transformers (transformers-4.37.0.dev0)
  • I got an error as follows. ImportError: tokenizers>=0.11.1,!=0.11.3,<0.14 is required for a normal functioning of this module, but found tokenizers==0.15.0.
  • I guess transformers(4.37.0.dev0) and whisper-v3 depends on tokenizers under 0.15, but installed one through pip command in official hf-whisper page is 0.15.
  • When I install lower version of tokenizers, ValueError: Non-consecutive added token ‘<|0.02|>’ found. Should have index 50365 but has index 50366 in saved vocabulary. error occurrs.
  • I'm confused which tokenizers version I need to install.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions