Proposal: Support for InternVLChatModel in llama.cpp #11768

manwallet · 2025-02-09T02:27:30Z

Hello llama.cpp community,

I am proposing an enhancement to the llama.cpp project to add support for the InternVLChatModel architecture. During the model conversion process using the convert_hf_to_gguf.py script, I encountered an error indicating that this specific model architecture is not yet supported.

Background:

Model: InternVLChatModel
Error: convert_hf_to_gguf.py aborts the conversion with an error stating that the model architecture is not supported.
Use Case: This model architecture is part of our ongoing research and has potential applications in advanced conversational AI systems.

Motivation:

Adding support for InternVLChatModel would let llama.cpp convert and run a wider range of current models.
It would also make the tool more useful to users and developers working with recent multimodal chat architectures.

Request:

I would appreciate insights from the community on how we might achieve this support.
If there are ongoing efforts or ideas on how to extend the existing conversion capabilities to include InternVLChatModel, please share your thoughts.

Additional Information:

Any advice on potential implementation approaches or necessary modifications would be welcomed.
References or documentation regarding the model's internal structure could be beneficial for the discussion.

Thank you for considering this proposal. I look forward to collaborating with the community to explore the feasibility of this enhancement.

pondahai · 2025-02-18T02:56:30Z

Hello llama.cpp team,

I am not an expert in model quantization or llama.cpp internals, but I have been looking into discussion #11768 in connection with quantizing Llama-Breeze2-8B-Instruct. With the help of AI, I have tried to analyze the problem and identify a likely cause. I hope this analysis helps the team resolve it.

Problem Summary:

The problem is the failure to quantize the Llama-Breeze2-8B-Instruct model (Hugging Face link: https://huggingface.co/MediaTek-Research/Llama-Breeze2-8B-Instruct) using llama.cpp, as raised in discussion #11768 (https://github.com/ggml-org/llama.cpp/discussions/11768).

The model architecture appears to be InternVLChatModel, as defined in the modeling_internvl_chat.py file (https://huggingface.co/MediaTek-Research/Llama-Breeze2-8B-Instruct/blob/main/modeling_internvl_chat.py). This suggests that the model might not be based on a standard Llama architecture, which could be the root cause of the incompatibility with llama.cpp: the standard conversion and quantization tools might not be designed to handle this architecture.
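One way to check which architecture a checkpoint declares is to read the "architectures" field of its Hugging Face config.json. The following is a minimal sketch, not part of llama.cpp: the helper names list_architectures and load_config are hypothetical, and the example config is made up for illustration rather than copied from the model repository. It assumes the config may nest sub-configs (for example for the language model and the vision encoder) that declare their own architectures.

```python
import json
from pathlib import Path


def list_architectures(config: dict, prefix: str = "") -> dict:
    """Collect every "architectures" entry in a config dict, including nested sub-configs."""
    found = {}
    archs = config.get("architectures")
    if isinstance(archs, list):
        found[prefix or "<top level>"] = archs
    for key, value in config.items():
        if isinstance(value, dict):
            # Recurse into sub-configs and record where each entry was found.
            nested_prefix = f"{prefix}.{key}" if prefix else key
            found.update(list_architectures(value, nested_prefix))
    return found


def load_config(model_dir: str) -> dict:
    """Load config.json from a locally downloaded model directory."""
    with open(Path(model_dir) / "config.json", encoding="utf-8") as f:
        return json.load(f)


if __name__ == "__main__":
    # Hypothetical config for illustration only; the real file may differ.
    example = {
        "architectures": ["InternVLChatModel"],
        "llm_config": {"architectures": ["LlamaForCausalLM"]},
        "vision_config": {"architectures": ["InternVisionModel"]},
    }
    for where, archs in list_architectures(example).items():
        print(f"{where}: {archs}")
```

If the top-level entry is a wrapper such as InternVLChatModel while a nested entry names a Llama-style language model, that would be consistent with the converter rejecting the wrapper even though the underlying language model is a familiar one. To inspect a real download, pass the model directory to load_config and feed the result to list_architectures.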

Thank you for your time and effort in addressing this issue.
