
Conversation

@sahil-kabir (Contributor) commented Sep 1, 2025

Main contributor: @sahil-kabir
Guided by: @PavloFesenko

What does this PR do?

Fixes #39549: Is there plan to integrate ColQwen2.5 into Transformers?

An earlier PR for this was closed because ColQwen2.5 can be integrated without adding a new class for it. This PR allows ColQwen2.5 weights to be loaded through the existing ColQwen2 class.

Edit:

Looks like passing in the ColQwen2.5 weights together with the Qwen2.5 config works on its own. All I'm adding now is an integration test and documentation.

I created an endpoint with ColQwen2.5 weights by merging the adapter weights from vidore/colqwen2.5-v0.2 into Qwen/Qwen2.5-VL-3B-Instruct (as per the documentation on the model card for vidore/colqwen2.5-v0.2) and pushing the merged weights to Sahil-Kabir/colqwen2.5-v0.2-hf. I also moved some extra config files from vidore/colqwen2.5-v0.2 over to Sahil-Kabir/colqwen2.5-v0.2.

from peft import PeftModel
from transformers import AutoModel
import torch

qwen_backbone = "Qwen/Qwen2.5-VL-3B-Instruct"

# Load the base Qwen2.5-VL model on CPU in bfloat16.
base_model = AutoModel.from_pretrained(
    qwen_backbone,
    device_map="cpu",
    dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Attach the ColQwen2.5 LoRA adapter, merge it into the base weights,
# and save the standalone merged checkpoint.
peft_model = PeftModel.from_pretrained(base_model, "vidore/colqwen2.5-v0.2")
merged = peft_model.merge_and_unload()
merged.save_pretrained("merged_model")

I've added an integration test that uses my endpoint and shows that the basic checks work. The test I wrote is passing.


Who can review?

Hey @yonigozlan, with minimal code changes it looks like ColQwen2.5 can be loaded using the ColQwen2 class. By running

python transformers/src/transformers/models/colqwen2/convert_colqwen2_weights_to_hf.py \
    --model_id cjkasbdkjnlakb/colqwen2.5-v0.2-merged \
    --original_vlm_name_or_path Qwen/Qwen2.5-VL-3B-Instruct \
    --output_dir ./colqwen2_5 \
    --using_qwen2_5

I'm able to get a model whose outputs closely match the 2.5 model in the colpali-engine repo. @PavloFesenko and I are working on figuring out where the ColQwen2.5 adapter weights should be imported from, plus tests and documentation for now.

@PavloFesenko (Contributor) left a comment

Nice job! 👍️ Please see my remarks below.

CI/CD tests are breaking because double quotes are used inside f-strings that are themselves delimited by double quotes. But that error will disappear anyway, because we should avoid using f-strings inside logger calls in general.
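To illustrate the point about f-strings in loggers, here is a small sketch (the logger name and message are made up for illustration): with lazy %-style formatting, the message is only interpolated when the record is actually emitted, and there is no quote-nesting problem inside the format string.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("colqwen_conversion")

model_id = "vidore/colqwen2.5-v0.2"

# Avoid: the f-string is built eagerly even when the level is disabled,
# and nesting double quotes inside a double-quoted f-string trips older
# parsers/linters:
#   logger.info(f"Converting {model_id}")

# Prefer: lazy %-style formatting, interpolated only if the record is emitted.
logger.info("Converting weights for %s", model_id)
```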

@Rocketknight1 (Member)

Seems like document retrieval / VLM so cc @zucchini-nlp and maybe @tomaarsen ?

@yonigozlan (Member) left a comment

Hello @sahil-kabir, thanks for contributing! Very nice if we can support colqwen2.5 without needing to add a whole new model 🤗.
Could you add a mention of colqwen2.5 support in the documentation as well? In transformers/docs/source/en/model_doc/colqwen2.md.
It would be great to add an integration test for colqwen2.5 like we have for colqwen2 in transformers/tests/models/colqwen2/test_modeling_colqwen2.py.
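For context on what such an integration test checks: ColQwen2-style models are late-interaction retrievers, where a query and a document each produce a bag of embedding vectors and relevance is the MaxSim score (sum over query vectors of the maximum dot product against any document vector). A pure-Python sketch of that scoring, useful for sanity-checking test expectations (the toy vectors are illustrative, not real model outputs):

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT/ColPali-style late-interaction score: for each query vector,
    take the max dot product with any doc vector, then sum over the query."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-D embeddings: doc_a aligns with the query, doc_b points the other way.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]
doc_b = [[-1.0, 0.0], [0.0, -1.0]]

print(maxsim_score(query, doc_a))  # → 2.0
print(maxsim_score(query, doc_b))  # → 0.0
```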

Comment on lines 371 to 372:

    dtype, device = self._get_dtype_device()
    pixel_values = pixel_values.to(dtype=dtype, device=device)

(Member) Not a big fan of this; it'd be great if we can avoid having use_qwen2_5 in the config. Let's just use the dtype and device of inputs_embeds.

Suggested change:

    pixel_values = pixel_values.to(inputs_embeds.device, inputs_embeds.dtype)

@zucchini-nlp (Member):

BTW, in the Qwen2 vision tower we cast pixels to the correct dtype manually, so this is not needed. Also, the LM and vision tower might be loaded with different dtypes and devices in specific cases :)

Reply (Contributor):
@zucchini-nlp Did I understand correctly that this line could be removed entirely and it should work anyway?

@sahil-kabir Maybe worth a quick try. 😉

Reply (Member):
yep, correct

Comment on lines 413 to 420:

    def _get_dtype_device(self) -> tuple[str, str]:
        if self.config.use_qwen2_5:
            parameters = next(self.vlm.visual.parameters())
        else:
            parameters = next(self.parameters())
        dtype, device = parameters.dtype, parameters.device
        return dtype, device

(Member) No need for that then.

Suggested change: remove _get_dtype_device entirely.

    config = ColQwen2Config(
        vlm_config=original_config,
        embedding_dim=128,  # hardcoded in the original model
        use_qwen2_5=use_qwen2_5,

(Member) Instantiate a qwen2_5 config instead.

@sahil-kabir (Contributor, Author):

Hey @yonigozlan, integration tests and documentation are updated. I have details on what I did to make this work in the PR description above. Going to undraft this now 👍

@sahil-kabir sahil-kabir marked this pull request as ready for review September 13, 2025 02:17
@yonigozlan (Member) left a comment
Nice!
Very last thing to do would be to add an integration test directly against the original implementation. Other than that, LGTM!
@tonywu71 Do you think we could add the converted checkpoint to the vidore org and here?
Now waiting on @ArthurZucker or @Cyrilvallez final approval to get this merged!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@PavloFesenko (Contributor)

@yonigozlan Should we also add the code and clear instructions for merging adapter weights, either to models/colqwen2/convert_colqwen2_weights_to_hf.py or to the model card? It could save a lot of time if the authors release a new version of ColQwen2.5, since they typically release only the adapter weights, which need to be merged into the base model before they can be used with Hugging Face.

@Cyrilvallez (Member)

run-slow: colqwen2

@github-actions (Contributor)

This comment contains run-slow, running the specified jobs:

models: ['models/colqwen2']
quantizations: [] ...

@Cyrilvallez (Member) left a comment
LGTM! Thanks a lot, no diff to the original model 🤗
@yonigozlan, feel free to merge once everyone is happy and the tests are green! 🤗

@yonigozlan (Member) commented Sep 17, 2025

@yonigozlan Should we also add the code and clear instructions for merging adapter weights, either to models/colqwen2/convert_colqwen2_weights_to_hf.py or to the model card? It could save a lot of time if the authors release a new version of ColQwen2.5, since they typically release only the adapter weights, which need to be merged into the base model before they can be used with Hugging Face.

Yes absolutely! Users should be able to convert the original colqwen2.5 weights to the Transformers weights using convert_colqwen2_weights_to_hf.py. We can maybe add an arg to argparse to specify which model we're converting, or deduce it from the original weights/config if possible.
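A rough sketch of what that argparse addition could look like. The flag name --original_model_type and the deduction heuristic are assumptions for illustration, not the actual interface of convert_colqwen2_weights_to_hf.py:

```python
import argparse

def infer_model_type(model_id: str) -> str:
    """Heuristic: guess the ColQwen variant from the checkpoint id."""
    return "colqwen2.5" if ("2.5" in model_id or "2_5" in model_id) else "colqwen2"

parser = argparse.ArgumentParser(description="Convert ColQwen weights (sketch)")
parser.add_argument("--model_id", required=True)
parser.add_argument(
    "--original_model_type",
    choices=["colqwen2", "colqwen2.5"],
    default=None,
    help="Which variant is being converted; deduced from --model_id if omitted.",
)

# Hardcoded args here so the sketch runs standalone.
args = parser.parse_args(["--model_id", "vidore/colqwen2.5-v0.2"])
model_type = args.original_model_type or infer_model_type(args.model_id)
print(model_type)  # → colqwen2.5
```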

@tonywu71 (Contributor)

Nice! Very last thing to do would be to add an integration test directly against the original implementation. Other than that, LGTM! @tonywu71 Do you think we could add the converted checkpoint to the vidore org and here? Now waiting on @ArthurZucker or @Cyrilvallez final approval to get this merged!

@antonioloison and @QuentinJGMace have admin rights on the repository and carried out the initial work on ColPali; can you check with them please? 🙏🏼

@sahil-kabir (Contributor, Author) commented Sep 27, 2025

@yonigozlan I confirmed the endpoint matches the original implementation, and the tests are added. I don't have access to a GPU with CUDA support (I have CPU and MPS), so I set device_map="cpu" in the test. Let me know if you need something different.

@yonigozlan (Member)

Hey @sahil-kabir ! Sorry for the delay, I just updated the integration tests with the values I get when running on our CI hardware.
Everything looks good to me, only thing missing is this now:

Yes absolutely! Users should be able to convert the original colqwen2.5 weights to the Transformers weights using convert_colqwen2_weights_to_hf.py. We can maybe add an arg to argparse to specify which model we're converting, or deduce it from the original weights/config if possible.

@yonigozlan (Member)

@bot /style

@github-actions (Contributor)

github-actions bot commented Nov 3, 2025

Style bot fixed some files and pushed the changes.

@github-actions (Contributor)

github-actions bot commented Nov 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: colqwen2

@yonigozlan (Member) left a comment

Thanks for adding the conversion script @sahil-kabir ! Waiting for the CI to pass and I'll merge :)

@yonigozlan yonigozlan merged commit cd30961 into huggingface:main Nov 3, 2025
17 checks passed
yonigozlan added a commit to yonigozlan/transformers that referenced this pull request Nov 7, 2025
* adding option for 2.5

* minor - arg in conversion script

* getting started on modelling.py

* minor - shouldve been using modular

* adressing comments + fixing datatype/device _get method

* minor

* commiting suggestion

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* docs + first test

* ruff fix

* minor fix

* ruff fix

* model fix

* model fix

* fine-grained check, with a hardcoded score from the original Hf implementation.

* minor ruff

* update tests values with CI hardware

* adding 2.5 to conversion script

* Apply style fixes

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Abdennacer-Badaoui pushed a commit to Abdennacer-Badaoui/transformers that referenced this pull request Nov 10, 2025