Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6 #1613
Are the multi-image understanding and in-context features from the official documentation supported in the swift API?
Which version of swift do I need to upgrade to?
It's still on the main branch.
Could you provide code for single-sample video inference?
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = '<video>描述这段视频'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')
# streaming
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: <video>描述这段视频
response: 这段视频展示了一个年幼的孩子,可能是一个蹒跚学步的幼儿,坐在床上专心阅读一本书。孩子戴着深色眼镜,穿着浅绿色无袖上衣和粉色裤子。床上铺着白色床单,背景中有一个木制婴儿床,暗示着一个家庭环境。房间光线充足,氛围温馨舒适。孩子专注的表情和姿势表明他们对书本内容很投入。
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写,它有着引人注目的面部特征。小猫的毛色主要是白色,带有灰色和黑色的条纹,特别是在眼睛周围和耳朵上。它的眼睛又大又圆,有着蓝色的虹膜,看起来非常好奇或专注。小猫的耳朵竖立着,内耳是粉红色的,与毛色形成对比。小猫的鼻子是粉红色的,有着小小的黑色鼻子,嘴巴微微张开,露出一点粉红色的舌头。小猫的胡须又长又白,从脸颊上伸出来。背景模糊,将焦点集中在小猫身上,暗示着一个室内环境,有自然光线,可能来自窗户。
""" |
Does swift support the official few-shot inference approach?
Is this included in the documentation somewhere?
Thank you for the excellent suggestions. We will update the document within this week. |
Using vLLM: pip install "vllm>=0.5.4"
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_vllm_engine, get_template, inference_vllm, ModelType,
get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
generation_info = {}
request_list = [{'query': query, 'images': images} for _ in range(100)]  # example of batch inference
resp_list = inference_vllm(vllm_engine, template, request_list, generation_info=generation_info, use_tqdm=True)
print(f'query: {query}')
print(f'response: {resp_list[0]["response"]}')
print(generation_info)
# streaming
generation_info = {}
gen = inference_stream_vllm(vllm_engine, template, request_list, generation_info=generation_info)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
# only show first
for resp_list in gen:
resp = resp_list[0]
if resp is None:
continue
response = resp['response']
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(generation_info)
"""
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 91.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00, 4.48it/s]
query: <image>描述这张图片
response: 这张图片展示了一只小猫咪的特写,可能是美国短毛猫品种,因为其花纹和毛发质地。猫咪有着引人注目的蓝色眼睛,这是其外貌中非常突出的特征。它皮毛上有着独特的黑色条纹,从面颊延伸至头顶,暗示着一种有条纹的花纹图案。它的耳朵小而尖,内侧是粉色的。猫咪的胡须细长而突出,围绕在它的下颌两侧和眼睛周围。猫咪坐着,用一种表达丰富的方式直视着,嘴巴微微张开,露出粉红色的内唇。背景模糊,柔和的光线增强了猫咪的特征。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14734, 'num_samples': 100, 'runtime': 23.53027338697575, 'samples/s': 4.249844375176322, 'tokens/s': 626.1720702384794}
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写,可能是一只幼年猫,在模糊的背景中,集中注意力在猫的表情上。这只猫长着一身白色与黑色条纹相间的毛皮,带有微妙的灰褐色。它的眼睛大而圆,具有高度的反光度,表明它们可能含有异色瞳,即一只眼睛是蓝色的,另一只是绿色的,但这只猫两只眼睛都是绿色的。睫毛清晰可见,增添了一种生动的表情。猫的耳朵竖立着,内部呈粉红色,边缘有浅色的阴影,显示出柔软的毛发。胡须又长又明显,突显了小猫的脸部形状。这个品种的猫看起来是一个常见品种,毛皮图案和眼睛颜色表明它可能是一只虎斑猫。光线柔和,产生一种天鹅绒般的效果,突出了猫绒毛的质感。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14986, 'num_samples': 100, 'runtime': 23.375922130944673, 'samples/s': 4.277906105257837, 'tokens/s': 641.0870089339394}
""" |
Fine-tuning minicpm-v-v2_6-chat throws an error; fine-tuning other models works fine. The fine-tuning command is: CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
Could you provide code for Async + vLLM inference with minicpmv2-6?
swift deploy uses Async + vLLM. For how to call it from the client, see the documentation here:
That documentation shows the OpenAI client invocation; isn't the OpenAI client synchronous? Wouldn't asynchronous calls need the asyncio package?
Server:
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192
Client:
import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)
query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
tasks = [inference_client_async(model_type, query, request_config=request_config) for _ in range(100)]
async def _batch_run(tasks):
return await asyncio.gather(*tasks)
resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')
query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']
async def _stream():
global query
request_config = XRequestConfig(seed=42, stream=True)
stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
async for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.
In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.
The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.
Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.
In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.
The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.
Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
""" |
Thank you very much, jintao-huang.
Looking forward to your reply, thank you!
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments
# same parameters as swift deploy
deploy_main(DeployArguments(...))
Just set seed to None (the default).
Yes.
Can I start the vLLM service via the get_vllm_engine interface instead? What is the difference from using deploy_main?
The issue where serving minicpmv2-6 with vLLM required installing flash-attn has been fixed.
Using deploy_main in the SDK with the same CLI parameters throws an error:
Does the vLLM + async client invocation support the official few-shot feature? The few-shot feature is as follows:
Yes, it is supported; few-shot is just multi-turn conversation.
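To sketch the idea of few-shot as multi-turn conversation: pre-fill the conversation history with example pairs before asking the real question. This assumes ms-swift's `inference` accepts a `history` argument shaped like the `(query, response)` pairs it returns in the examples above; verify the signature for your version.

```python
# Sketch: few-shot prompting expressed as pre-filled multi-turn history.
# Passing history= to inference() is an assumption based on the
# (response, history) pairs returned in the examples above.

def build_fewshot_history(examples):
    """Turn (query, response) example pairs into a swift-style history list."""
    return [[q, r] for q, r in examples]

examples = [
    ('<image>描述这张图片', '这是一只小猫的特写。'),
    ('<image>描述这张图片', '这是一个婴儿在床上看书。'),
]
history = build_fewshot_history(examples)
# The real call would then look like:
# response, history = inference(model, template, '<image>描述这张图片',
#                               history=history, images=images)
print(len(history))  # number of in-context examples: 2
```

Each in-context example thus becomes one prior conversation turn, which is exactly how the server-side template will render it.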
Thanks for your input @yingdachen. I am creating a UI using Flask but getting an error: NotImplementedError: Cannot copy out of meta tensor; no data! Any idea why?
How do I evaluate with a custom dataset (test video data)? It throws an error. I am also creating a UI using Flask and getting the error NotImplementedError: Cannot copy out of meta tensor; no data! @yingdachen, any input?
This error indicates insufficient GPU memory. |
@yingdachen for which error? I am getting two errors!
Fine-tuning MiniCPM-V2.6 with zero3 throws an error. I only changed the image fine-tuning command from zero2 to the default zero3, and the error appeared:
Try adjusting your deepspeed version.
Hi, since my GPU memory is insufficient I want to try fine-tuning the int4 model, but the official swift documentation does not list support for fine-tuning MiniCPM's int4 model: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B
What format is required for preference-data training? The format that works for other models throws an error when training this model.
After fine-tuning is complete, how do I obtain a gguf model for deployment?
Converting minicpm to gguf requires some customized steps; see the official minicpm documentation:
I have run into this problem too. Using --deepspeed zero3-offload requires a lot of host memory; when it hangs, it is usually because the machine's RAM is full.
Can deepspeed not use virtual memory?
It can use virtual memory; I did too, but it still hung. In my case it hung after training finished; adding a few more RAM sticks fixed it.
We use
A question: when doing video fine-tuning with swift, is it also implemented via frame extraction? What is the default frame sampling rate, and can it be changed? I couldn't find this parameter on the command line; I tried sample_n_frames and it says it is unsupported. Since I am fine-tuning with both images and videos, I need to compute the image/video ratio, so I need to know the frame sampling rate. Thanks.
Upgrade ms-swift.
ms-swift/swift/llm/utils/template.py Line 3346 in 6330c70
MAX_NUM_FRAMES |
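For reference, frame extraction for video inputs is commonly implemented as uniform sampling capped at a maximum frame count; the `MAX_NUM_FRAMES` constant referenced above in `template.py` governs that cap. The following is a simplified sketch of that pattern, not the exact ms-swift implementation, and the default value of 64 is an assumption — check `template.py` for the real logic and value.

```python
# Simplified sketch of uniform video frame sampling capped at MAX_NUM_FRAMES.
# The authoritative logic lives in swift/llm/utils/template.py (see the
# MAX_NUM_FRAMES reference above); 64 here is only an assumed default.
MAX_NUM_FRAMES = 64

def sample_frame_indices(total_frames, max_num_frames=MAX_NUM_FRAMES):
    """Pick up to max_num_frames frame indices evenly spaced across the video."""
    if total_frames <= max_num_frames:
        return list(range(total_frames))
    step = total_frames / max_num_frames
    return [int(i * step) for i in range(max_num_frames)]

print(len(sample_frame_indices(1000)))  # capped at 64
print(sample_frame_indices(5))          # short video: all frames, [0, 1, 2, 3, 4]
```

Under this scheme the effective "frame rate" depends on clip length: a long video always contributes at most `MAX_NUM_FRAMES` frames, which is the number you would use when balancing image vs. video samples.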
https://swift.readthedocs.io/en/latest/Multi-Modal/index.html |
Hi, I fine-tuned the MiniCPM-V 2.6 model using #1613 and deployed the merged model with CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged. When I call the POST API, it does not respond. INFO: 2024-09-30 08:25:57,729 deploy.py:157] {'request_id': 'chatcmpl-f515986bf3d24c9e9b66f6a83d48a0eb', 'model': 'minicpm-v-v2_6-chat', 'messages': [{'role': 'user', 'content': 'Describe this image.'}], 'generation_config': GenerationConfig({'bos_token_id': 151643, 'eos_token_id': 151645, 'max_new_tokens': 32410, 'pad_token_id': 151643, 'return_dict_in_generate': True}), 'seed': None, 'stop': [], 'stream': False} I can see the hit in the terminal logs, but no response arrives in Postman.
What is the format of the eval dataset? How do I validate the eval dataset, and what does the label key in the result dataset mean? Should the response key in the eval data be empty?
Same issue here. My inference dataset is in the same format as my fine-tuning dataset, but the inference CLI doesn't work.
Same question here.
How do I run inference with the model after SFT?
I referred to https://swift.readthedocs.io/en/latest/Multi-Modal/minicpm-v-best-practice.html; it turns out it is set up like this:
Model: https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
Usually, fine-tuning a multimodal LLM uses a custom dataset. Here we show a demo that can be run directly.
Before starting fine-tuning, make sure your environment is ready.
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
Model Inference
Image Fine-Tuning
We use the coco-en-mini dataset for fine-tuning; its task is describing the content of images. You can find the dataset on ModelScope: https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary
# By default, lora_target_modules is set to all linear layers of the llm and resampler
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2
To use a custom dataset, simply specify it as follows:
Custom datasets support json and jsonl formats; below is a sample custom dataset:
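As an illustration of what such a jsonl file might contain: the `query`/`response`/`images` keys below mirror the request format used in the inference examples earlier in this thread, but the exact schema is an assumption — verify it against the ms-swift custom-dataset documentation. The file paths are placeholders.

```python
# Hedged sketch of a custom jsonl dataset for image fine-tuning.
# Keys follow the request dicts used in the inference examples above;
# verify against the ms-swift custom-dataset docs. Paths are hypothetical.
import json

records = [
    {'query': '<image>描述这张图片',
     'response': '一只小猫的特写。',
     'images': ['/path/to/cat1.jpg']},
    {'query': '<image>图片里有什么?',
     'response': '一个在床上看书的婴儿。',
     'images': ['/path/to/baby.jpg']},
]

# jsonl = one JSON object per line
with open('custom_train.jsonl', 'w', encoding='utf-8') as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + '\n')

# Round-trip check that each line parses back to the original record
with open('custom_train.jsonl', encoding='utf-8') as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]['images'][0])
```

The resulting file would then be passed via --dataset (or the documented custom-dataset flag) in place of coco-en-mini.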
GPU memory usage:
The inference script after fine-tuning is as follows:
Example of the fine-tuned model's inference on the validation set (due to time constraints, only 300 steps were trained):
Video Fine-Tuning
We use the video-chatgpt dataset for fine-tuning; its task is describing the content of videos. You can find the dataset on ModelScope: https://modelscope.cn/datasets/swift/VideoChatGPT
Custom datasets support json and jsonl formats; below is a sample custom dataset:
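Analogously to the image case, a video sample would presumably use a `videos` key, mirroring the `videos=` argument in the inference examples earlier in the thread — again an assumption to verify against the ms-swift docs, with a placeholder path.

```python
# Hedged sketch of one custom jsonl record for video fine-tuning;
# the 'videos' key mirrors the videos= argument in the inference
# examples above. The file path is hypothetical.
import json

rec = {'query': '<video>描述这段视频',
       'response': '一个婴儿坐在床上看书。',
       'videos': ['/path/to/clip.mp4']}
line = json.dumps(rec, ensure_ascii=False)  # one jsonl line
print(json.loads(line)['videos'])  # ['/path/to/clip.mp4']
```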
GPU memory usage:
The inference script after fine-tuning is as follows:
Example of the fine-tuned model's inference on the validation set (due to time constraints, only 50 steps were trained):