
support MiniCPM-o2.6 #37917

Open · wants to merge 19 commits into base: main

Conversation

@tc-mb commented May 1, 2025

support Minicpm-o2.6

@tc-mb changed the title from "support Minicpm-o2.6" to "support MiniCPM-o2.6" on May 1, 2025
@zucchini-nlp (Member) left a comment


Hey! I noticed the model doesn't have a modular file yet. I recommend using modular transformers to add the model; it will allow you to inherit from any similar model in transformers, and you won't have to rewrite the whole class.

Also, it makes the review process easier and faster, since we can see the main differences between MiniCPM and other existing models 😉

Excited to have the model integrated in the library 🚀
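For context, a modular file is a single `modular_minicpm_o.py` whose classes subclass existing transformers components; the library's converter script then expands it into the flat `modeling_minicpm_o.py`. A hypothetical sketch, assuming the text backbone resembles Qwen2 (the class names, the `model_type` string, and the choice of Qwen2 as the base are illustrative assumptions, not the PR's actual code):

```python
# modular_minicpm_o.py -- hypothetical sketch of a modular-transformers file.
# Only the pieces that actually differ from the base model get overridden;
# everything else is inherited and auto-expanded into modeling_minicpm_o.py.
from transformers.models.qwen2.configuration_qwen2 import Qwen2Config
from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM, Qwen2Model


class MiniCPM_o_2_6Config(Qwen2Config):
    model_type = "minicpm_o_2_6"  # assumed identifier


class MiniCPM_o_2_6Model(Qwen2Model):
    # Inherit everything; override only what differs,
    # e.g. the multimodal embedding merge for image/audio inputs.
    pass


class MiniCPM_o_2_6ForCausalLM(Qwen2ForCausalLM):
    pass
```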


tc-mb commented May 4, 2025

Hey! I noticed the model doesn't have a modular file yet. I recommend using modular transformers to add the model; it will allow you to inherit from any similar model in transformers, and you won't have to rewrite the whole class.

Also, it makes the review process easier and faster, since we can see the main differences between MiniCPM and other existing models 😉

Excited to have the model integrated in the library 🚀

OK, I haven't finished revising this PR yet; I will address these points next.

@tc-mb marked this pull request as ready for review on May 15, 2025 at 06:36
Added the `test_processing_minicpm_o.py` and `test_modeling_minicpm_o.py` test files to verify the processing and modeling functionality of the MiniCPM-o-2.6 model. The tests cover the basic functionality of the vision, audio, and text generation modules.

tc-mb commented May 20, 2025

@zucchini-nlp Could you continue reviewing?


zucchini-nlp commented May 20, 2025

OK, I will take a look tomorrow; cc @eustlb as well for review, since this model also supports the audio modality.

@zucchini-nlp (Member) left a comment


Hey @tc-mb !

Thanks a lot for your work on MiniCPM. I reviewed the PR, and I think there are a few areas that need further cleanup and standardization. I left comments below; let me know if you need further assistance.

@@ -344,7 +344,7 @@
),
("mega", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
("megatron-bert", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
("mgp-str", ("MgpstrTokenizer", None)),
("minicpm_o_2_6", ("Qwen2Tokenizer", "MiniCPM_o_2_6TokenizerFast" if is_tokenizers_available() else None)),
@zucchini-nlp (Member):

Looks weird; can't we use Qwen2TokenizerFast?

@tc-mb (Author):

It's a little different: we added a few special tokens. I'm not sure of the best way to handle this case and would appreciate further advice.

@tc-mb (Author):

Sorry, I've now seen your suggestion on how to handle it.

Comment on lines +18 to +24
class MiniCPM_o_2_6TokenizerFast(Qwen2TokenizerFast):
def __init__(self, **kwargs):
super().__init__(**kwargs)
# image
self.im_start = "<image>"
self.im_end = "</image>"
self.ref_start = "<ref>"
@zucchini-nlp (Member):

To add special tokens, we don't need a new tokenizer class; we can extend an existing tokenizer.
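For reference, the usual pattern for extra special tokens is `add_special_tokens` on the stock tokenizer rather than a subclass. A minimal sketch (the checkpoint id is illustrative; the token strings come from the diff above, and the `resize_token_embeddings` step applies only if new ids were actually added):

```python
from transformers import Qwen2TokenizerFast

# Hypothetical checkpoint path, for illustration only.
tokenizer = Qwen2TokenizerFast.from_pretrained("openbmb/MiniCPM-o-2_6")

# Register the extra tokens on the existing tokenizer class.
tokenizer.add_special_tokens(
    {"additional_special_tokens": ["<image>", "</image>", "<ref>", "</ref>"]}
)

# If the vocabulary grew, the model's embedding matrix must grow to match:
# model.resize_token_embeddings(len(tokenizer))
```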

return query.unsqueeze(1).repeat(1, N, 1)


class MultiheadAttention(nn.MultiheadAttention):
@zucchini-nlp (Member):

This needs to follow the transformers style for attention; currently it is copied from the torch source code.
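For reference, transformers models implement attention as a plain `nn.Module` with explicit `q_proj`/`k_proj`/`v_proj`/`o_proj` linear layers rather than wrapping `nn.MultiheadAttention`. A minimal eager-attention sketch in that spirit (the class name and the omission of masks, caches, and dropout are simplifications, not the PR's actual code):

```python
import torch
import torch.nn as nn


class MiniCPMAttention(nn.Module):
    """Simplified eager self-attention in the transformers house style."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        # Project, then split into heads: (bsz, num_heads, seq_len, head_dim)
        q = self.q_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(hidden_states).view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention, computed explicitly.
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5, dim=-1)
        # Merge heads back and apply the output projection.
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)
```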

@@ -0,0 +1,46 @@
from transformers import AutoModelForCausalLM, AutoModel, AutoTokenizer
@zucchini-nlp (Member):

We need test cases for modeling and processing; see the Qwen tests for an example.


tc-mb commented May 29, 2025

Hey @tc-mb !

Thanks a lot for your work on MiniCPM. I reviewed the PR, and I think there are a few areas that need further cleanup and standardization. I left comments below; let me know if you need further assistance.

Thank you very much for your review; we will make the changes and push them as soon as possible.

This is the first time we are merging our multimodal model into transformers, and the omni model is somewhat complicated. We now see that there are many shortcomings, and we will keep working to fix them.


eustlb commented Jul 7, 2025

Hey @tc-mb, thanks so much for iterating on this!
@zucchini-nlp, happy to take over this PR if you’re happy with what’s been done up to this point 🤗

@zucchini-nlp (Member):

happy to take over this PR if you’re happy with what’s been done up to this point

Sure, very much needed! I didn't really review the audio-related modules; I looked at the vision part only. I'll take one last look at the VLM part to see whether it follows the new standard format, since we've been changing a lot lately. Otherwise, feel free to take over 🤗

7 participants