[misc] fix: Use `AutoModelForImageTextToText` instead of `AutoModelForVision2Seq` #2475

kaln27 · 2025-07-11T06:14:57Z

What does this PR do?

According to huggingface/transformers#38900 (Deprecate AutoModelForVision2Seq), `AutoModelForVision2Seq` will be deprecated soon, so verl needs to use `AutoModelForImageTextToText` instead.
I also found that some models do not ship a `generation_config.json` file, but the code assumes one exists, which causes an error. So this PR checks whether the file exists before loading it (already fixed by another PR).
I also opened a PR in transformers, Fix errors when use verl to train GLM4.1v model (huggingface/transformers#39199), to support GLM4.1v training in verl.
Added a GLM4.1v FLOPs counter function (already done by another PR).
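A minimal sketch of the class swap, assuming a codebase that may also run on older transformers releases that predate `AutoModelForImageTextToText`. The helper `resolve_vlm_auto_class` is hypothetical and belongs to neither verl nor transformers; judging by its commit title, the PR itself simply switches which class is used.

```python
from typing import Any

# Preferred name first; AutoModelForVision2Seq is the name being deprecated
# (see huggingface/transformers#38900).
_CANDIDATES = ("AutoModelForImageTextToText", "AutoModelForVision2Seq")


def resolve_vlm_auto_class(transformers_module: Any) -> Any:
    """Hypothetical helper: return the first auto class the given transformers
    module exposes, preferring AutoModelForImageTextToText."""
    for name in _CANDIDATES:
        auto_cls = getattr(transformers_module, name, None)
        if auto_cls is not None:
            return auto_cls
    raise ImportError(
        "transformers exposes neither AutoModelForImageTextToText "
        "nor AutoModelForVision2Seq"
    )
```

With transformers installed, usage would look like `auto_cls = resolve_vlm_auto_class(transformers)` followed by `auto_cls.from_pretrained(model_path)`. Passing the module in, rather than importing it inside the helper, keeps the lookup testable without downloading a model.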


gemini-code-assist

Code Review

This pull request updates the checkpoint saving logic to handle cases where generation_config.json is missing and adds support for glm4v in the FLOPs counter.
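For context, a FLOPs counter of this kind estimates achieved throughput from the model config and the batch's sequence lengths. The sketch below is not verl's `flops_counter.py`. It assumes a plain dense text decoder (gated MLP, grouped-query attention, LM head counted, embedding lookup, norms and any vision tower ignored). It uses the common approximation of 6 FLOPs per dense parameter per token for forward plus backward, plus an attention term of 12 · Σs² · head_dim · heads · layers. `DecoderConfig`, the registry, and the key `dense_decoder` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DecoderConfig:
    """Hypothetical subset of a Hugging Face text-model config."""
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    vocab_size: int


def estimate_dense_decoder_flops(cfg: DecoderConfig, batch_seqlens: List[int]) -> float:
    """Approximate training FLOPs (forward + backward) for one batch."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    # q and o projections: hidden x (heads * head_dim) each.
    q_o = 2 * cfg.hidden_size * cfg.num_attention_heads * head_dim
    # k and v projections shrink with grouped-query attention.
    k_v = 2 * cfg.hidden_size * cfg.num_key_value_heads * head_dim
    # Gated MLP: gate, up and down projections.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Parameters used in matmuls; LM head included, embedding lookup not.
    dense_params = cfg.num_hidden_layers * (q_o + k_v + mlp) + cfg.vocab_size * cfg.hidden_size
    tokens = sum(batch_seqlens)
    dense_flops = 6 * dense_params * tokens
    # QK^T plus attention-times-V: 4 * s^2 * head_dim per head per layer
    # in the forward pass, times 3 for forward + backward.
    seqlen_square_sum = sum(s * s for s in batch_seqlens)
    attn_flops = 12 * seqlen_square_sum * head_dim * cfg.num_attention_heads * cfg.num_hidden_layers
    return float(dense_flops + attn_flops)


# Hypothetical registry: supporting a new model type means adding a key here.
FLOPS_ESTIMATORS: Dict[str, Callable[[DecoderConfig, List[int]], float]] = {
    "dense_decoder": estimate_dense_decoder_flops,
}


def achieved_tflops(model_type: str, cfg: DecoderConfig,
                    batch_seqlens: List[int], delta_time: float) -> float:
    """Return achieved TFLOP/s; raises KeyError for unregistered model types."""
    estimator = FLOPS_ESTIMATORS[model_type]
    return estimator(cfg, batch_seqlens) / delta_time / 1e12
```

Keeping the per-architecture formulas behind a dictionary lookup means adding or removing a supported model, as the later commit on this PR does for `glm4v`, is a one-line registry change.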

My main feedback is on the fsdp_checkpoint_manager.py file. The current implementation for checking the existence of generation_config.json introduces a potential race condition. I've suggested a more robust try-except pattern to prevent potential crashes.
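The race the reviewer describes is a time-of-check to time-of-use gap: the file can vanish between an `os.path.exists` check and the subsequent read. A minimal stdlib sketch of the try-except alternative, assuming a local checkpoint directory. `load_generation_config_if_present` is a hypothetical name, and the real checkpoint manager may load the file through transformers rather than `json`.

```python
import json
import os
from typing import Any, Dict, Optional


def load_generation_config_if_present(model_dir: str) -> Optional[Dict[str, Any]]:
    """Return the parsed generation_config.json, or None if the model has none."""
    path = os.path.join(model_dir, "generation_config.json")
    try:
        # Open directly instead of calling os.path.exists() first, so there is
        # no window between the existence check and the read.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        # Some models ship without a generation config; treat that as "none".
        return None
```

With transformers, the analogous pattern would wrap `GenerationConfig.from_pretrained(model_dir)` in `try`/`except OSError`, assuming the installed version raises `OSError` (also spelled `EnvironmentError`) for a missing file; verify that against your version before relying on it.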

The changes in flops_counter.py look good.

I noticed the PR title mentions replacing AutoModelForVision2Seq with AutoModelFotImageTextToText, but this change doesn't seem to be included in the diffs. You might want to double-check if that was intended for this PR.

verl/utils/checkpoint/fsdp_checkpoint_manager.py

CLAassistant · 2025-07-11T06:34:34Z

All committers have signed the CLA.

kaln27 · 2025-07-28T01:55:42Z

@eric-haibin-lin Hi eric. Could you review this PR? Thanks a lot.


kaln27 and others added 2 commits July 11, 2025 13:55

Support glm4.1v model and check generation_config.json exist before load it

9afba57

Merge branch 'volcengine:main' into main

3f79f68

gemini-code-assist bot reviewed Jul 11, 2025

verl/utils/checkpoint/fsdp_checkpoint_manager.py (outdated)

kaln27 added 2 commits July 11, 2025 14:20

Use AutoModelForImageTextToText instead of AutoModelForVision2Seq

cd3d306

Fix code format

56b2980

kaln27 marked this pull request as ready for review July 11, 2025 06:36

kaln27 changed the title ~~[misc] fix: Use AutoModelFotImageTextToText instead of AutoModelForVision2Seq~~ [misc] fix: Use AutoModelForImageTextToText instead of AutoModelForVision2Seq Jul 11, 2025

Merge branch 'main' into main

cb782fe

kaln27 added 4 commits September 3, 2025 16:05

Merge branch 'main' into main

eda4a02

Merge branch 'main' into main

4da5446

Remove 'glm4v' from supported models list

2147dc0

Removed 'glm4v' from the list of supported models.

Merge branch 'main' into main

f30112a
