Conversation

@Blaizzy Blaizzy (Owner) commented Jan 23, 2026

No description provided.

…del loading. Update main function to utilize this new argument for trust_remote_code parameter.
…putEmbeddingsFeatures. This change standardizes the output of the get_input_embeddings method, ensuring consistent access to inputs_embeds and attention_mask_4d across the Model classes in the mlx_vlm package.
…gs for grid dimensions. Standardize return type of get_input_embeddings method to InputEmbeddingsFeatures, ensuring consistent access to inputs_embeds across both models.
…turn type consistency. Update __call__ method to utilize the new get_input_embeddings structure, ensuring standardized access to inputs_embeds across the Model and LanguageModel classes.
…putEmbeddingsFeatures. This update standardizes the access to inputs_embeds and attention_mask_4d, ensuring consistency in the Model class implementations within the mlx_vlm package.
…ze InputEmbeddingsFeatures, ensuring consistent handling of inputs_embeds and attention_mask. This update enhances the standardization of the Model class implementations in the mlx_vlm package.
…n mlx_vlm, including aya_vision, deepseek_vl_v2, deepseekocr, fastvlm, florence2, gemma3n, glm4v, glm4v_moe, hunyuan_vl, jina_vlm, lfm2_vl, molmo2, paddleocr_vl, phi3_v, qwen3_vl_moe, and qwen3_omni_moe.

jrp2014 commented Jan 23, 2026

The CI failure seems to be a black formatting issue.

jrp2014 commented Jan 23, 2026

MLX-VLM Compatibility Issues with transformers 5.0.0rc3 - 5 Model Failures

Summary

Testing 38 vision-language models revealed 5 model-loading failures in mlx-vlm, spanning 4 distinct issues, related to compatibility with transformers 5.0.0rc3. The failures affect InternVL, Kimi-VL (two variants), Phi-3.5-vision, and Florence-2.

NB: mlx-lm now uses transformers 5.0.0rc3

Test Environment:

  • MLX version: 0.30.4.dev20260123+dc81c150
  • MLX-VLM version: 0.3.10 (pc/fix-gemma branch)
  • Transformers version: 5.0.0rc3
  • Tokenizers version: 0.22.2
  • Python version: 3.13.9
  • System: macOS 26.2 (Darwin 25.2.0), Apple M4 Max, 128GB RAM
  • Test date: 2026-01-23 21:39:59 GMT

Issue #1: InternVL Processor Type Incompatibility

Model: mlx-community/InternVL3-14B-8bit

Error Type: TypeError
Stage: Processor Loading
Location: mlx_vlm/models/internvl_chat/processor.py:288

Error Message:

Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 396, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/models/internvl_chat/processor.py", line 368, in from_pretrained
    return InternVLChatProcessor(
        image_processor=image_processor, tokenizer=tokenizer
    )
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/models/internvl_chat/processor.py", line 288, in __init__
    super().__init__(image_processor, tokenizer, chat_template=chat_template)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 614, in __init__
    self.check_argument_for_proper_class(attribute_name, arg)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 697, in check_argument_for_proper_class
    raise TypeError(
        f"Received a {type(argument).__name__} for argument {argument_name}, but a {class_name} was expected."
    )
TypeError: Received a InternVLImageProcessor for argument image_processor, but a ImageProcessingMixin was expected.

Root Cause: Transformers 5.0.0rc3 introduced stricter type checking in ProcessorMixin.__init__(). The InternVLChatProcessor is passing an InternVLImageProcessor instance, but transformers now requires it to inherit from ImageProcessingMixin.
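
A quick check of this root cause (a sketch; it assumes InternVLImageProcessor is defined in mlx_vlm/models/internvl_chat/processor.py, as the traceback suggests, and that the top-level ImageProcessingMixin export is unchanged in 5.0.0rc3):

# Sketch: confirm whether the image processor class handed to
# InternVLChatProcessor inherits from transformers' ImageProcessingMixin.
# Given the error above, this is expected to print False under 5.0.0rc3.
from transformers import ImageProcessingMixin

from mlx_vlm.models.internvl_chat.processor import InternVLImageProcessor

print(issubclass(InternVLImageProcessor, ImageProcessingMixin))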

HF Cache Info: 15600.9 MB, 18 files

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/InternVL3-14B-8bit",
    lazy=True,
    trust_remote_code=True
)

Issue #2: Missing Private Function Import

Models

  • mlx-community/Kimi-VL-A3B-Thinking-2506-bf16
  • mlx-community/Kimi-VL-A3B-Thinking-8bit

Error Type: ImportError
Stage: Processor Loading (dynamic module import)
Location: HuggingFace cached module processing_kimi_vl.py:25

Error Message:

cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 387, in from_pretrained
    processor_class = get_class_from_dynamic_module(
        processor_auto_map, pretrained_model_name_or_path, **kwargs
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 583, in get_class_from_dynamic_module
    return get_class_in_module(class_name, final_module, force_reload=force_download)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 309, in get_class_in_module
    module_spec.loader.exec_module(module)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "<frozen importlib._bootstrap_external>", line 1027, in exec_module
  File "<frozen importlib._bootstrap>", line 488, in _call_with_frames_removed
  File "/Users/jrp/.cache/huggingface/modules/transformers_modules/a04b2b044b1795d3e56eeee0d4946ca0c3a9d0fc/processing_kimi_vl.py", line 25, in <module>
    from transformers.processing_utils import ProcessingKwargs, ProcessorMixin, Unpack, _validate_images_text_input_order
ImportError: cannot import name '_validate_images_text_input_order' from 'transformers.processing_utils'

Root Cause: The Kimi-VL processor code (stored in HuggingFace cache) imports a private function _validate_images_text_input_order that was removed or renamed in transformers 5.0.0rc3.

HF Cache Info:

  • Kimi-VL-A3B-Thinking-2506-bf16: 31298.6 MB, 23 files
  • Kimi-VL-A3B-Thinking-8bit: 17004.9 MB, 17 files

Note: This is a model repository issue, but mlx-vlm could add a compatibility shim or version check to provide better error messages.
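
For reference, one possible shape for such a shim, applied before the processor is loaded (a sketch only; the stand-in simply passes its arguments through rather than reproducing the removed validation logic):

# Compatibility-shim sketch for transformers 5.x: reinstate the private helper
# that the cached Kimi-VL processor module still imports. The original helper
# validated/reordered (images, text); this pass-through is only meant to let
# the dynamic module import succeed.
import transformers.processing_utils as processing_utils

if not hasattr(processing_utils, "_validate_images_text_input_order"):
    def _validate_images_text_input_order(images, text):
        return images, text

    processing_utils._validate_images_text_input_order = _validate_images_text_input_order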

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/Kimi-VL-A3B-Thinking-8bit",
    lazy=True,
    trust_remote_code=True
)

Issue #3: Missing Image Processor Module File

Model: mlx-community/Phi-3.5-vision-instruct-bf16

Error Type: OSError
Stage: Processor Loading
Location: transformers/utils/hub.py:377

Error Message:

/Users/jrp/.cache/huggingface/hub/models--mlx-community--Phi-3.5-vision-instruct-bf16/snapshots/d8da684308c275a86659e2b36a9189b2f4aec8ea does not appear to have a file named image_processing_phi3_v.py.

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 392, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1413, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1532, in _get_arguments_from_pretrained
    sub_processor = auto_processor_class.from_pretrained(
        pretrained_model_name_or_path, subfolder=subfolder, **kwargs
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/image_processing_auto.py", line 610, in from_pretrained
    image_processor_class = get_class_from_dynamic_module(class_ref, pretrained_model_name_or_path, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 572, in get_class_from_dynamic_module
    final_module = get_cached_module_file(
        repo_id,
        module_file,
        cache_dir=cache_dir,
        force_download=force_download,
        resume_download=resume_download,
        local_files_only=local_files_only,
        repo_type=repo_type,
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/dynamic_module_utils.py", line 390, in get_cached_module_file
    resolved_module_file = cached_file(
        pretrained_model_name_or_path,
        module_file,
        cache_dir=cache_dir,
        force_download=force_download,
        resume_download=resume_download,
        proxies=proxies,
        local_files_only=local_files_only,
        token=token,
        revision=revision,
        repo_type=repo_type,
        _commit_hash=_commit_hash,
    )
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/utils/hub.py", line 276, in cached_file
    file = cached_files(path_or_repo_id=path_or_repo_id, filenames=[filename], **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/utils/hub.py", line 377, in cached_files
    raise OSError(
        f"{path_or_repo_id} does not appear to have a file named {filename}. Checkout "
        f"'{hf_hub_url(path_or_repo_id, filename, repo_type=repo_type, revision=revision)}/tree/{revision}' for available files."
    )
OSError: /Users/jrp/.cache/huggingface/hub/models--mlx-community--Phi-3.5-vision-instruct-bf16/snapshots/d8da684308c275a86659e2b36a9189b2f4aec8ea does not appear to have a file named image_processing_phi3_v.py.

Root Cause: The model repository is missing the required image_processing_phi3_v.py file. This appears to be an incomplete model conversion or upload issue.
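
One way to confirm the file is absent from the Hub repository rather than just the local cache (a sketch using huggingface_hub; it only lists files and does not fix the load):

# Sketch: list the Python files the repository actually ships and check for
# the image processor module the dynamic import is looking for.
from huggingface_hub import list_repo_files

files = list_repo_files("mlx-community/Phi-3.5-vision-instruct-bf16")
print(sorted(f for f in files if f.endswith(".py")))
print("image_processing_phi3_v.py" in files)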

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="mlx-community/Phi-3.5-vision-instruct-bf16",
    lazy=True,
    trust_remote_code=True
)

Note: The 4-bit quantized version microsoft/Phi-3.5-vision-instruct works correctly, suggesting this is specific to the bf16 conversion.


Issue #4: Tokenizer Attribute Error

Model: prince-canuma/Florence-2-large-ft

Error Type: AttributeError
Stage: Processor Loading
Location: HuggingFace cached module processing_florence2.py:87

Error Message:

RobertaTokenizer has no attribute additional_special_tokens. Did you mean: 'add_special_tokens'?

Full Stack Trace:

Traceback (most recent call last):
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 326, in load
    processor = load_processor(model_path, True, eos_token_ids=eos_token_id, **kwargs)
  File "/Users/jrp/Documents/AI/mlx/mlx-vlm/mlx_vlm/utils.py", line 399, in load_processor
    processor = AutoProcessor.from_pretrained(model_path, use_fast=True, **kwargs)
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/models/auto/processing_auto.py", line 392, in from_pretrained
    return processor_class.from_pretrained(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1414, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
           ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/processing_utils.py", line 1182, in from_args_and_dict
    processor = cls(*args, **valid_kwargs)
  File "/Users/jrp/.cache/huggingface/modules/transformers_modules/microsoft/Florence_hyphen_2_hyphen_large_hyphen_ft/4a12a2b54b7016a48a22037fbd62da90cd566f2a/processing_florence2.py", line 87, in __init__
    tokenizer.additional_special_tokens + \
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx-vlm/lib/python3.13/site-packages/transformers/tokenization_utils_base.py", line 1326, in __getattr__
    raise AttributeError(f"{self.__class__.__name__} has no attribute {key}")
AttributeError: RobertaTokenizer has no attribute additional_special_tokens. Did you mean: 'add_special_tokens'?

Root Cause: The Florence-2 processor code (in HuggingFace cache) references tokenizer.additional_special_tokens, which is no longer resolvable as a tokenizer attribute in transformers 5.0.0rc3. The add_special_tokens hint in the error message is only Python's closest-name suggestion; it is a different API (a flag on encoding calls), not a replacement for the list of additional special tokens.
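
If editing the cached processing_florence2.py is acceptable as a stop-gap, one defensive rewrite of the failing expression (a sketch; falling back to an empty list avoids the crash but may drop any extra special tokens the tokenizer was expected to contribute):

# Sketch: read additional_special_tokens if the tokenizer still exposes it,
# otherwise fall back to an empty list so processor construction can proceed.
extra_special_tokens = list(getattr(tokenizer, "additional_special_tokens", None) or [])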

Reproduction:

from mlx_vlm.utils import load
model, tokenizer = load(
    path_or_hf_repo="prince-canuma/Florence-2-large-ft",
    lazy=True,
    trust_remote_code=True
)

Summary of Root Causes

All 5 failures surfaced under transformers 5.0.0rc3; four stem from its breaking changes and one from a model-repository problem:

  1. Stricter type checking - ImageProcessingMixin enforcement in ProcessorMixin.__init__ (InternVL)
  2. Removed private APIs - _validate_images_text_input_order no longer importable (Kimi-VL)
  3. Missing repository file - image_processing_phi3_v.py absent from the upload (Phi-3.5-bf16)
  4. Removed tokenizer attribute - additional_special_tokens no longer resolvable on the tokenizer (Florence-2)

Recommendations

Immediate Actions

  1. Add transformers version check in mlx_vlm/utils.py:

    import logging

    import transformers
    from packaging import version

    logger = logging.getLogger(__name__)

    # Use the major version so pre-releases such as 5.0.0rc3 are also caught.
    if version.parse(transformers.__version__).major >= 5:
        logger.warning("transformers 5.0+ detected. Some models may fail to load.")
  2. Update InternVL processor (mlx_vlm/models/internvl_chat/processor.py):

    • Ensure InternVLImageProcessor inherits from ImageProcessingMixin
    • Or add a compatibility layer for transformers 5.0+ (see the sketch after this list)
  3. Document compatibility in README:

    • Specify tested transformers versions
    • Add known incompatibilities section
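
For item 2, a minimal sketch of the inheritance fix (it assumes InternVLImageProcessor is currently defined without a transformers base class in mlx_vlm/models/internvl_chat/processor.py; the existing preprocessing methods would be kept as-is):

# Sketch: satisfy the stricter type check in transformers 5.x by inheriting
# from BaseImageProcessor, which is itself an ImageProcessingMixin subclass.
# Only the base class changes; the InternVL-specific logic stays untouched.
from transformers.image_processing_utils import BaseImageProcessor


class InternVLImageProcessor(BaseImageProcessor):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # ... existing InternVL-specific setup remains here ...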

Long-term Solutions

  1. Pin transformers version in requirements.txt until compatibility is resolved:

    transformers>=4.40.0,<5.0.0
    
  2. Add CI testing for multiple transformers versions

  3. Contact model authors about missing files (Phi-3.5-bf16) and outdated processor code (Kimi-VL, Florence-2)


Test Configuration

Command:

python -m check_models --trust-remote-code --verbose

Test image: 8640x5400 pixels (46.7 MPixels)
Parameters: max_tokens=500, temperature=0.1, timeout=300s

Success rate: 30/38 models (78.9%) completed successfully


Additional Context

Full logs available:

https://github.com/jrp2014/check_models/tree/main/src/output

  • src/output/check_models.log (272KB, 2157 lines)
  • src/output/results.md (87KB)
  • src/output/environment.log (16KB)

Working models include:

  • HuggingFaceTB/SmolVLM-Instruct
  • Qwen/Qwen3-VL-2B-Instruct
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • microsoft/Phi-3.5-vision-instruct (4-bit version)
  • And 26 others
