support MiniCPM-o2.6 #37917
base: main
Conversation
Hey! I noticed the model doesn't have a modular file yet. I recommend using modular transformers to add the model; it will allow you to inherit from any similar model in transformers, so you won't have to rewrite the whole class.
It also makes the review process easier and faster, since we can see the main differences between MiniCPM and other existing models 😉
Excited to have the model integrated in the library 🚀
Ok, I haven't finished modifying this PR yet; I will improve these next.
Added the `test_processing_minicpm_o.py` and `test_modeling_minicpm_o.py` test files to verify the processing and modeling functionality of the MiniCPM-o-2.6 model. The tests cover the basic functionality of the vision, audio, and text generation modules.
@zucchini-nlp Can you continue to help review?
Okay, I will take a look tomorrow. cc @eustlb as well for review, since this model also supports the audio modality.
test: add test files for MiniCPM-o 2.6
Hey @tc-mb !
Thanks a lot for your work on MiniCPM. I reviewed the PR and I think there are a few points that need further cleanup and standardization. I left comments below; let me know if you need further assistance.
@@ -344,7 +344,7 @@
        ),
        ("mega", ("RobertaTokenizer", "RobertaTokenizerFast" if is_tokenizers_available() else None)),
        ("megatron-bert", ("BertTokenizer", "BertTokenizerFast" if is_tokenizers_available() else None)),
        ("mgp-str", ("MgpstrTokenizer", None)),
        ("minicpm_o_2_6", ("Qwen2Tokenizer", "MiniCPM_o_2_6TokenizerFast" if is_tokenizers_available() else None)),
looks weird, can't we use Qwen2TokenizerFast?
It's a little different: we added a few special tokens.
I'm not sure of the best way to handle this case, so I'd appreciate further advice.
Sorry, I've now seen your suggestion on how to handle it.
class MiniCPM_o_2_6TokenizerFast(Qwen2TokenizerFast):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # image
        self.im_start = "<image>"
        self.im_end = "</image>"
        self.ref_start = "<ref>"
To add special tokens, we don't need a new tokenizer class; we can extend an existing tokenizer as follows.
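A minimal sketch of this suggestion: register MiniCPM-o's extra tokens on an existing tokenizer instance instead of subclassing. The token strings come from the PR snippet above; the helper name and list constant are ours, not part of the PR.

```python
# Extra tokens taken from the PR snippet (im_start, im_end, ref_start);
# hypothetical helper, sketched under the assumption that a loaded
# Qwen2TokenizerFast instance is passed in.
MINICPM_O_SPECIAL_TOKENS = ["<image>", "</image>", "<ref>"]


def add_minicpm_o_special_tokens(tokenizer):
    """Register the extra tokens; returns how many were actually added."""
    return tokenizer.add_special_tokens(
        {"additional_special_tokens": MINICPM_O_SPECIAL_TOKENS}
    )
```

With a real tokenizer this would follow `Qwen2TokenizerFast.from_pretrained(...)`, and the model would then need `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix covers the new ids.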
src/transformers/models/minicpm_o_2_6/processing_minicpm_o_2_6.py
    return query.unsqueeze(1).repeat(1, N, 1)


class MultiheadAttention(nn.MultiheadAttention):
This needs to follow the transformers style for Attention; currently it is copied from the torch source code.
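For reference, a hedged sketch of the layout transformers attention modules typically use: explicit `q_proj`/`k_proj`/`v_proj`/`o_proj` linear layers on a plain `nn.Module`, rather than subclassing `nn.MultiheadAttention`. The class name, sizes, and the bare eager softmax-attention forward are illustrative assumptions, not the PR's actual code.

```python
import torch
import torch.nn as nn


class MiniCPMAttention(nn.Module):
    """Hypothetical sketch of a transformers-style attention block."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        # Explicit projections, as in transformers model implementations
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        self.k_proj = nn.Linear(hidden_size, hidden_size)
        self.v_proj = nn.Linear(hidden_size, hidden_size)
        self.o_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        bsz, seq_len, _ = hidden_states.shape
        # Project, then split into heads: (bsz, num_heads, seq_len, head_dim)
        shape = (bsz, seq_len, self.num_heads, self.head_dim)
        q = self.q_proj(hidden_states).view(shape).transpose(1, 2)
        k = self.k_proj(hidden_states).view(shape).transpose(1, 2)
        v = self.v_proj(hidden_states).view(shape).transpose(1, 2)
        # Plain eager scaled-dot-product attention (no mask, for brevity)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim**0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(bsz, seq_len, -1)
        return self.o_proj(out)
```

Keeping the projections as named submodules is what lets transformers map checkpoint weights and swap in sdpa/flash attention backends behind the same interface.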
@@ -0,0 +1,46 @@
from transformers import AutoModelForCausalLM, AutoModel, AutoTokenizer
We need test cases for modeling and processing; see the Qwen tests for an example.
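A minimal sketch of the shape such a processing test could take. The stub processor below is a stand-in so the sketch is self-contained; a real transformers test would instead load the actual processor (e.g. via `AutoProcessor.from_pretrained`) and check the returned keys and tensor shapes the same way.

```python
import unittest


class StubMiniCPMProcessor:
    """Hypothetical stand-in for the real MiniCPM-o processor."""

    def __call__(self, text=None, images=None):
        # Mimics the multimodal processor contract: keys appear only
        # for the modalities that were actually passed in.
        out = {}
        if text is not None:
            out["input_ids"] = [[1, 2, 3]]
        if images is not None:
            out["pixel_values"] = [[0.0, 0.0, 0.0, 0.0]]
        return out


class MiniCPMProcessorTest(unittest.TestCase):
    def setUp(self):
        self.processor = StubMiniCPMProcessor()

    def test_text_only(self):
        out = self.processor(text="hello")
        self.assertIn("input_ids", out)
        self.assertNotIn("pixel_values", out)

    def test_text_and_image(self):
        out = self.processor(text="hello", images=[object()])
        self.assertIn("input_ids", out)
        self.assertIn("pixel_values", out)
```

The modeling tests would follow the same pattern against tiny random-weight configs, as the existing Qwen test files do.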
Thank you very much for your review; we will revise and resubmit as soon as possible. This is the first time we have merged one of our multimodal models into transformers, and the omni model is a bit complicated. We now see that there are many shortcomings, and we will keep working to fix them.
…convert_omni_to_inputs()
…iniCPMVImageProcessor to image_processing_minicpm; remove token parameters in MiniCPMVImageProcessor
Hey @tc-mb, thanks so much for iterating on this!
Sure, very much needed! I didn't really review the audio-related modules and looked only at the vision part. I'll take one last look at the VLM part to see whether it follows the new standard format, since we've been changing a lot lately. Otherwise, feel free to take over 🤗