
support MiniCPM-o2.6 #37917

Open · wants to merge 19 commits into base: main

Conversation

@tc-mb commented May 1, 2025

support Minicpm-o2.6

@tc-mb changed the title from "support Minicpm-o2.6" to "support MiniCPM-o2.6" on May 1, 2025
@zucchini-nlp (Member) left a comment


Hey! I noticed the model doesn't have a modular file yet. I recommend using modular transformers to add the model; it will allow you to inherit from any similar model in transformers, and you won't have to rewrite the whole class.

Also, it makes the review process easier and faster, since we can see the main differences between MiniCPM and other existing models 😉

Excited to have the model integrated in the library 🚀
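For context, a modular file is a single `modular_minicpm_o.py` whose classes subclass existing transformers components; the library's converter script then expands it into the flat `modeling_minicpm_o.py`. A hypothetical sketch, assuming the text backbone resembles Qwen2 (the class names, the `model_type` string, and the choice of Qwen2 as the base are illustrative assumptions, not the PR's actual code):

```python
# modular_minicpm_o.py -- hypothetical sketch of a modular-transformers file.
# Only the pieces that actually differ from the base model get overridden;
# everything else is inherited and auto-expanded into modeling_minicpm_o.py.
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM, Qwen2Model


class MiniCPM_o_2_6Config(Qwen2Config):
    model_type = "minicpm_o_2_6"  # assumed identifier


class MiniCPM_o_2_6Model(Qwen2Model):
    # Inherit everything; override only what differs,
    # e.g. the multimodal embedding merge for image/audio inputs.
    pass


class MiniCPM_o_2_6ForCausalLM(Qwen2ForCausalLM):
    pass
```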


tc-mb commented May 4, 2025

Hey! I noticed the model doesn't have a modular file yet. I recommend using modular transformers to add the model; it will allow you to inherit from any similar model in transformers, and you won't have to rewrite the whole class.

Also, it makes the review process easier and faster, since we can see the main differences between MiniCPM and other existing models 😉

Excited to have the model integrated in the library 🚀

OK, I haven't finished revising this PR yet; I will address these points next.

@tc-mb marked this pull request as ready for review on May 15, 2025 at 06:36
Added the `test_processing_minicpm_o.py` and `test_modeling_minicpm_o.py` test files to verify the processing and modeling functionality of the MiniCPM-o-2.6 model. The tests cover the basic functionality of the vision, audio, and text generation modules.

tc-mb commented May 20, 2025

@zucchini-nlp Could you continue reviewing?


zucchini-nlp commented May 20, 2025

OK, I will take a look tomorrow; cc @eustlb as well for review, since this model also supports the audio modality.

@zucchini-nlp (Member) left a comment


Hey @tc-mb !

Thanks a lot for your work on MiniCPM. I reviewed the PR, and I think there are a few areas that need further cleanup and standardization. I left comments below; let me know if you need further assistance.

@@ -344,7 +344,7 @@
),
("mega", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
("megatron-bert", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
("mgp-str", ("MgpstrTokenizer", None)),
("minicpm_o_2_6", ("Qwen2Tokenizer", "MiniCPM_o_2_6TokenizerFast" if is_tokenizers_available() else None)),
@zucchini-nlp (Member):

Looks weird; can't we use Qwen2TokenizerFast?

@tc-mb (Author):

It's a little different: we added a few special tokens. I'm not sure of the best way to handle this case and would appreciate further advice.

@tc-mb (Author):

Sorry, I've now seen your suggestion on how to handle it.

Comment on lines +18 to +24
class MiniCPM_o_2_6TokenizerFast(Qwen2TokenizerFast):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# image
self.im_start = "<image>"
self.im_end = "</image>"
self.ref_start = "<ref>"
@zucchini-nlp (Member):

To add special tokens, we don't need a new tokenizer class; we can extend an existing tokenizer.
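For reference, the usual pattern for extra special tokens is `add_special_tokens` on the stock tokenizer rather than a subclass. A minimal sketch (the checkpoint id is illustrative; the token strings come from the diff above, and the `resize_token_embeddings` step applies only if new ids were actually added):

```python
from transformers import Qwen2TokenizerFast

# Hypothetical checkpoint path, for illustration only.
tokenizer = Qwen2TokenizerFast.from_pretrained("openbmb/MiniCPM-o-2_6")

# Register the extra tokens on the existing tokenizer class.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<image>", "</image>", "<ref>", "</ref>"]}
)

# If the vocabulary grew, the model's embedding matrix must grow to match:
# model.resize_token_embeddings(len(tokenizer))
```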

return query.unsqueeze(1).repeat(1, N, 1)


class MultiheadAttention(nn.MultiheadAttention):
@zucchini-nlp (Member):

This needs to follow the transformers style for attention; currently it is copied from the torch source code.
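For reference, transformers models implement attention as a plain `nn.Module` with explicit `q_proj`/`k_proj`/`v_proj`/`o_proj` linear layers rather than wrapping `nn.MultiheadAttention`. A minimal eager-attention sketch in that spirit (the class name and the omission of masks, caches, and dropout are simplifications, not the PR's actual code):

```python
import torch
import torch.nn as nn


class MiniCPMAttention(nn.Module):
    """Simplified eager self-attention in the transformers house style."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        # Project, then split into heads: (bsz, num_heads, seq_len, head_dim)
        q = self.q_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention, computed explicitly.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5, dim=-1)
        # Merge heads back and apply the output projection.
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)
```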

@@ -0,0 +1,46 @@
from transformers import AutoModelForCausalLM, AutoModel, AutoTokenizer
@zucchini-nlp (Member):

We need test cases for modeling and processing; see the Qwen tests for an example.


tc-mb commented May 29, 2025

Hey @tc-mb !

Thanks a lot for your work on MiniCPM. I reviewed the PR, and I think there are a few areas that need further cleanup and standardization. I left comments below; let me know if you need further assistance.

Thank you very much for your review; we will make the changes and push them as soon as possible.

This is the first time we are merging our multimodal model into transformers, and the omni model is somewhat complicated. We now see that there are many shortcomings, and we will keep working to fix them.


eustlb commented Jul 7, 2025

Hey @tc-mb, thanks so much for iterating on this!
@zucchini-nlp, happy to take over this PR if you’re happy with what’s been done up to this point 🤗

@zucchini-nlp (Member):

happy to take over this PR if you’re happy with what’s been done up to this point

Sure, very much needed! I didn't really review the audio-related modules; I looked at the vision part only. I'll take one last look at the VLM part to see whether it follows the new standard format, since we've been changing a lot lately. Otherwise, feel free to take over 🤗

7 participants