Conversation

@Blaizzy Blaizzy (Owner) commented Jan 23, 2026

No description provided.

…del loading. Update main function to utilize this new argument for trust_remote_code parameter.
…putEmbeddingsFeatures. This change standardizes the output of the get_input_embeddings method, ensuring consistent access to inputs_embeds and attention_mask_4d across the Model classes in the mlx_vlm package.
…gs for grid dimensions. Standardize return type of get_input_embeddings method to InputEmbeddingsFeatures, ensuring consistent access to inputs_embeds across both models.
…turn type consistency. Update __call__ method to utilize the new get_input_embeddings structure, ensuring standardized access to inputs_embeds across the Model and LanguageModel classes.
…putEmbeddingsFeatures. This update standardizes the access to inputs_embeds and attention_mask_4d, ensuring consistency in the Model class implementations within the mlx_vlm package.
…ze InputEmbeddingsFeatures, ensuring consistent handling of inputs_embeds and attention_mask. This update enhances the standardization of the Model class implementations in the mlx_vlm package.
…n mlx_vlm, including aya_vision, deepseek_vl_v2, deepseekocr, fastvlm, florence2, gemma3n, glm4v, glm4v_moe, hunyuan_vl, jina_vlm, lfm2_vl, molmo2, paddleocr_vl, phi3_v, qwen3_vl_moe, and qwen3_omni_moe.

jrp2014 commented Jan 23, 2026

The CI failure seems to be a black formatting issue.

jrp2014 commented Jan 23, 2026

MLX-VLM Compatibility Issues with transformers 5.0.0rc3 - 5 Model Failures

Summary

Testing 38 vision-language models revealed 5 model-loading failures in mlx-vlm, spanning 4 distinct issues, related to compatibility with transformers 5.0.0rc3. The failures affect InternVL, Kimi-VL (two variants), Phi-3.5-vision, and Florence-2.

NB: mlx-lm now uses transformers 5.0.0rc3

Test Environment:

  • MLX version: 0.30.4.dev20260123+dc81c150
  • MLX-VLM version: 0.3.10 (pc/fix-gemma branch)
  • Transformers version: 5.0.0rc3
  • Tokenizers version: 0.22.2
  • Python version: 3.13.9
  • System: macOS 26.2 (Darwin 25.2.0), Apple M4 Max, 128GB RAM
  • Test date: 2026-01-23 21:39:59 GMT

Issue #1: InternVL Processor Type Incompatibility

Model: mlx-community/InternVL3-14B-8bit

Error Type: TypeError
Stage: Processor Loading
Location: mlx_vlm/models/internvl_chat/processor.py:288

Error Message:

Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 396, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/models/internvl_chat/processor.py", line 368, in from_pretrained
    return InternVLChatProcessor(
        image_processor=image_processor, tokenizer=tokenizer
    )
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/models/internvl_chat/processor.py", line 288, in __init__
    super().__init__(image_processor, tokenizer, chat_template=chat_template)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 614, in __init__
    self.check_argument_for_proper_class(attribute_name, arg)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 697, in check_argument_for_proper_class
    raise TypeError(
        f"Received a {type(argument).__name__} for argument {argument_name}, but a {class_name} was expected."
    )
TypeError: Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.

Root Cause: Transformers 5.0.0rc3 introduced stricter type checking in ProcessorMixin.__init__(). The InternVLChatProcessor is passing an InternVLImageProcessor instance, but transformers now requires it to inherit from ImageProcessingMixin.
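
A quick check of this root cause (a sketch; it assumes InternVLImageProcessor is defined in mlx_vlm/models/internvl_chat/processor.py, as the traceback suggests, and that the top-level ImageProcessingMixin export is unchanged in 5.0.0rc3):

# Sketch: confirm whether the image processor class handed to
# InternVLChatProcessor inherits from transformers' ImageProcessingMixin.
# Given the error above, this is expected to print False under 5.0.0rc3.
from transformers import ImageProcessingMixin

from mlx_vlm.models.internvl_chat.processor import InternVLImageProcessor

print(issubclass(InternVLImageProcessor, ImageProcessingMixin))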

HF Cache Info: 15600.9 MB, 18 files

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/InternVL3-14B-8bit",
    lazy=True,
    trust_remote_code=True
)

Issue #2: Missing Private Function Import

Models

  • mlx-community/Kimi-VL-A3B-Thinking-2506-bf16
  • mlx-community/Kimi-VL-A3B-Thinking-8bit

Error Type: ImportError
Stage: Processor Loading (dynamic module import)
Location: HuggingFace cached module processing_kimi_vl.py:25

Error Message:

cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 387, in from_pretrained
    processor_class = get_class_from_dynamic_module(
        processor_auto_map, pretrained_model_name_or_path, **kwargs
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 583, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module, force_reload=force_download)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 309, in get_class_in_module
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1027, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/Users/jrp/.cache/huggingface/modules/transformers_modules/a04b2b044b1795d3e56eeee0d4946ca0c3a9d0fc/processing_kimi_vl.py", line 25, in <module>
    from transformers.processing_utils import ProcessingKwargs, ProcessorMixin, Unpack, _validate_images_text_input_order
ImportError: cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'

Root Cause: The Kimi-VL processor code (stored in HuggingFace cache) imports a private function _validate_images_text_input_order that was removed or renamed in transformers 5.0.0rc3.

HF Cache Info:

  • Kimi-VL-A3B-Thinking-2506-bf16: 31298.6 MB, 23 files
  • Kimi-VL-A3B-Thinking-8bit: 17004.9 MB, 17 files

Note: This is a model repository issue, but mlx-vlm could add a compatibility shim or version check to provide better error messages.
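
For reference, one possible shape for such a shim, applied before the processor is loaded (a sketch only; the stand-in simply passes its arguments through rather than reproducing the removed validation logic):

# Compatibility-shim sketch for transformers 5.x: reinstate the private helper
# that the cached Kimi-VL processor module still imports. The original helper
# validated/reordered (images, text); this pass-through is only meant to let
# the dynamic module import succeed.
import transformers.processing_utils as processing_utils

if not hasattr(processing_utils, "_validate_images_text_input_order"):
    def _validate_images_text_input_order(images, text):
        return images, text

    processing_utils._validate_images_text_input_order = _validate_images_text_input_order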

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/Kimi-VL-A3B-Thinking-8bit",
    lazy=True,
    trust_remote_code=True
)

Issue #3: Missing Image Processor Module File

Model: mlx-community/Phi-3.5-vision-instruct-bf16

Error Type: OSError
Stage: Processor Loading
Location: transformers/utils/hub.py:377

Error Message:

/Users/jrp/.cache/huggingface/hub/models--mlx-community--Phi-3.5-vision-instruct-bf16/snapshots/d8da684308c275a86659e2b36a9189b2f4aec8ea does not appear to have a file named image_processing_phi3_v.py.

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 392, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1413, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1532, in _get_arguments_from_pretrained
    sub_processor = auto_processor_class.from_pretrained(
        pretrained_model_name_or_path, subfolder=subfolder, **kwargs
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/image_processing_auto.py", line 610, in from_pretrained
    image_processor_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 572, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
        repo_id,
        module_file,
        cache_dir=cache_dir,
        force_download=force_download,
        resume_download=resume_download,
        local_files_only=local_files_only,
        repo_type=repo_type,
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 390, in get_cached_module_file
    resolved_module_file = cached_file(
        pretrained_model_name_or_path,
        module_file,
        cache_dir=cache_dir,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        token=token,
        revision=revision,
        repo_type=repo_type,
        _commit_hash=_commit_hash,
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/utils/hub.py", line 276, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/utils/hub.py", line 377, in cached_files
    raise OSError(
        f"{path_or_repo_id} does not appear to have a file named {filename}. Checkout "
        f"'{hf_hub_url(path_or_repo_id, filename, repo_type=repo_type, revision=revision)}/tree/{revision}' for available files."
    )
OSError: /Users/jrp/.cache/huggingface/hub/models--mlx-community--Phi-3.5-vision-instruct-bf16/snapshots/d8da684308c275a86659e2b36a9189b2f4aec8ea does not appear to have a file named image_processing_phi3_v.py.

Root Cause: The model repository is missing the required image_processing_phi3_v.py file. This appears to be an incomplete model conversion or upload issue.
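
One way to confirm the file is absent from the Hub repository rather than just the local cache (a sketch using huggingface_hub; it only lists files and does not fix the load):

# Sketch: list the Python files the repository actually ships and check for
# the image processor module the dynamic import is looking for.
from huggingface_hub import list_repo_files

files = list_repo_files("mlx-community/Phi-3.5-vision-instruct-bf16")
print(sorted(f for f in files if f.endswith(".py")))
print("image_processing_phi3_v.py" in files)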

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/Phi-3.5-vision-instruct-bf16",
    lazy=True,
    trust_remote_code=True
)

Note: The 4-bit quantized version microsoft/Phi-3.5-vision-instruct works correctly, suggesting this is specific to the bf16 conversion.


Issue #4: Tokenizer Attribute Error

Model: prince-canuma/Florence-2-large-ft

Error Type: AttributeError
Stage: Processor Loading
Location: HuggingFace cached module processing_florence2.py:87

Error Message:

RobertaTokenizer has no attribute additional_special_tokens. Did you mean: 'add_special_tokens'?

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 392, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1414, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict
    processor = cls(*args, **valid_kwargs)
  File "/Users/jrp/.cache/huggingface/modules/transformers_modules/microsoft/Florence_hyphen_2_hyphen_large_hyphen_ft/4a12a2b54b7016a48a22037fbd62da90cd566f2a/processing_florence2.py", line 87, in __init__
    tokenizer.additional_special_tokens + \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1326, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: RobertaTokenizer has no attribute additional_special_tokens. Did you mean: 'add_special_tokens'?

Root Cause: The Florence-2 processor code (in HuggingFace cache) references tokenizer.additional_special_tokens, which is no longer resolvable as a tokenizer attribute in transformers 5.0.0rc3. The add_special_tokens hint in the error message is only Python's closest-name suggestion; it is a different API (a flag on encoding calls), not a replacement for the list of additional special tokens.
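
If editing the cached processing_florence2.py is acceptable as a stop-gap, one defensive rewrite of the failing expression (a sketch; falling back to an empty list avoids the crash but may drop any extra special tokens the tokenizer was expected to contribute):

# Sketch: read additional_special_tokens if the tokenizer still exposes it,
# otherwise fall back to an empty list so processor construction can proceed.
extra_special_tokens = list(getattr(tokenizer, "additional_special_tokens", None) or [])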

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="prince-canuma/Florence-2-large-ft",
    lazy=True,
    trust_remote_code=True
)

Summary of Root Causes

All 5 failures surfaced under transformers 5.0.0rc3; four stem from its breaking changes and one from a model-repository problem:

  1. Stricter type checking - ImageProcessingMixin enforcement in ProcessorMixin.__init__ (InternVL)
  2. Removed private APIs - _validate_images_text_input_order no longer importable (Kimi-VL)
  3. Missing repository file - image_processing_phi3_v.py absent from the upload (Phi-3.5-bf16)
  4. Removed tokenizer attribute - additional_special_tokens no longer resolvable on the tokenizer (Florence-2)

Recommendations

Immediate Actions

  1. Add transformers version check in mlx_vlm/utils.py:

    import logging

    import transformers
    from packaging import version

    logger = logging.getLogger(__name__)

    # Use the major version so pre-releases such as 5.0.0rc3 are also caught.
    if version.parse(transformers.__version__).major >= 5:
        logger.warning("transformers 5.0+ detected. Some models may fail to load.")
  2. Update InternVL processor (mlx_vlm/models/internvl_chat/processor.py):

    • Ensure InternVLImageProcessor inherits from ImageProcessingMixin
    • Or add a compatibility layer for transformers 5.0+ (see the sketch after this list)
  3. Document compatibility in README:

    • Specify tested transformers versions
    • Add known incompatibilities section
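
For item 2, a minimal sketch of the inheritance fix (it assumes InternVLImageProcessor is currently defined without a transformers base class in mlx_vlm/models/internvl_chat/processor.py; the existing preprocessing methods would be kept as-is):

# Sketch: satisfy the stricter type check in transformers 5.x by inheriting
# from BaseImageProcessor, which is itself an ImageProcessingMixin subclass.
# Only the base class changes; the InternVL-specific logic stays untouched.
from transformers.image_processing_utils import BaseImageProcessor


class InternVLImageProcessor(BaseImageProcessor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # ... existing InternVL-specific setup remains here ...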

Long-term Solutions

  1. Pin transformers version in requirements.txt until compatibility is resolved:

    transformers>=4.40.0,<5.0.0
    
  2. Add CI testing for multiple transformers versions

  3. Contact model authors about missing files (Phi-3.5-bf16) and outdated processor code (Kimi-VL, Florence-2)


Test Configuration

Command:

python -m check_models --trust-remote-code --verbose

Test image: 8640x5400 pixels (46.7 MPixels)
Parameters: max_tokens=500, temperature=0.1, timeout=300s

Success rate: 30/38 models (78.9%) completed successfully


Additional Context

Full logs available:

https://github.com/jrp2014/check_models/tree/main/src/output

  • src/output/check_models.log (272KB, 2157 lines)
  • src/output/results.md (87KB)
  • src/output/environment.log (16KB)

Working models include:

  • HuggingFaceTB/SmolVLM-Instruct
  • Qwen/Qwen3-VL-2B-Instruct
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • microsoft/Phi-3.5-vision-instruct (4-bit version)
  • And 26 others
