Best Practices for Inference and Fine-Tuning with MiniCPM-V 2.6 #1613
Are the multi-image understanding and in-context features from the official documentation supported in the swift API?
Which version of swift do I need to upgrade to?
It's still on the main branch.
Could you provide code for single-sample video inference?
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_model_tokenizer, get_template, inference, ModelType,
get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
model, tokenizer = get_model_tokenizer(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = '<video>描述这段视频'
videos = ['https://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/baby.mp4']
response, history = inference(model, template, query, videos=videos)
print(f'query: {query}')
print(f'response: {response}')
# streaming
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
gen = inference_stream(model, template, query, images=images)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
"""
query: <video>描述这段视频
response: 这段视频展示了一个年幼的孩子,可能是一个蹒跚学步的幼儿,坐在床上专心阅读一本书。孩子戴着深色眼镜,穿着浅绿色无袖上衣和粉色裤子。床上铺着白色床单,背景中有一个木制婴儿床,暗示着一个家庭环境。房间光线充足,氛围温馨舒适。孩子专注的表情和姿势表明他们对书本内容很投入。
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写,它有着引人注目的面部特征。小猫的毛色主要是白色,带有灰色和黑色的条纹,特别是在眼睛周围和耳朵上。它的眼睛又大又圆,有着蓝色的虹膜,看起来非常好奇或专注。小猫的耳朵竖立着,内耳是粉红色的,与毛色形成对比。小猫的鼻子是粉红色的,有着小小的黑色鼻子,嘴巴微微张开,露出一点粉红色的舌头。小猫的胡须又长又白,从脸颊上伸出来。背景模糊,将焦点集中在小猫身上,暗示着一个室内环境,有自然光线,可能来自窗户。
""" |
Does swift support the official few-shot inference approach?
Is this included in the documentation somewhere?
Thank you for the excellent suggestions. We will update the document within this week. |
Using vLLM: pip install "vllm>=0.5.4"
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import (
get_vllm_engine, get_template, inference_vllm, ModelType,
get_default_template_type, inference_stream_vllm
)
from swift.utils import seed_everything
import torch
model_type = ModelType.minicpm_v_v2_6_chat
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')
vllm_engine = get_vllm_engine(model_type, torch.bfloat16, model_id_or_path=model_id_or_path,
max_model_len=8192)
tokenizer = vllm_engine.hf_tokenizer
vllm_engine.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)
query = '<image>描述这张图片'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
generation_info = {}
request_list = [{'query': query, 'images': images} for _ in range(100)]  # example of batch inference
resp_list = inference_vllm(vllm_engine, template, request_list, generation_info=generation_info, use_tqdm=True)
print(f'query: {query}')
print(f'response: {resp_list[0]["response"]}')
print(generation_info)
# streaming
generation_info = {}
gen = inference_stream_vllm(vllm_engine, template, request_list, generation_info=generation_info)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
# only show first
for resp_list in gen:
resp = resp_list[0]
if resp is None:
continue
response = resp['response']
delta = response[print_idx:]
print(delta, end='', flush=True)
print_idx = len(response)
print()
print(generation_info)
"""
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:01<00:00, 91.47it/s]
100%|██████████████████████████████████████████████████████████████████████████████| 100/100 [00:22<00:00, 4.48it/s]
query: <image>描述这张图片
response: 这张图片展示了一只小猫咪的特写,可能是美国短毛猫品种,因为其花纹和毛发质地。猫咪有着引人注目的蓝色眼睛,这是其外貌中非常突出的特征。它皮毛上有着独特的黑色条纹,从面颊延伸至头顶,暗示着一种有条纹的花纹图案。它的耳朵小而尖,内侧是粉色的。猫咪的胡须细长而突出,围绕在它的下颌两侧和眼睛周围。猫咪坐着,用一种表达丰富的方式直视着,嘴巴微微张开,露出粉红色的内唇。背景模糊,柔和的光线增强了猫咪的特征。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14734, 'num_samples': 100, 'runtime': 23.53027338697575, 'samples/s': 4.249844375176322, 'tokens/s': 626.1720702384794}
query: <image>描述这张图片
response: 这张图片展示了一只小猫的特写,可能是一只幼年猫,在模糊的背景中,集中注意力在猫的表情上。这只猫长着一身白色与黑色条纹相间的毛皮,带有微妙的灰褐色。它的眼睛大而圆,具有高度的反光度,表明它们可能含有异色瞳,即一只眼睛是蓝色的,另一只是绿色的,但这只猫两只眼睛都是绿色的。睫毛清晰可见,增添了一种生动的表情。猫的耳朵竖立着,内部呈粉红色,边缘有浅色的阴影,显示出柔软的毛发。胡须又长又明显,突显了小猫的脸部形状。这个品种的猫看起来是一个常见品种,毛皮图案和眼睛颜色表明它可能是一只虎斑猫。光线柔和,产生一种天鹅绒般的效果,突出了猫绒毛的质感。
{'num_prompt_tokens': 2700, 'num_generated_tokens': 14986, 'num_samples': 100, 'runtime': 23.375922130944673, 'samples/s': 4.277906105257837, 'tokens/s': 641.0870089339394}
""" |
Fine-tuning minicpm-v-v2_6-chat throws an error; fine-tuning other models works fine. The fine-tuning command is: CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft
Could you provide code for Async + vLLM inference with minicpmv2-6?
swift deploy uses Async + vLLM. For how to call it from the client, see the documentation here:
That documentation shows the OpenAI client invocation; isn't the OpenAI client synchronous? Wouldn't asynchronous calls need the asyncio package?
Server:
CUDA_VISIBLE_DEVICES=0 swift deploy --model_type minicpm-v-v2_6-chat --infer_backend vllm --max_model_len 8192
Client:
import asyncio
from swift.llm import get_model_list_client, XRequestConfig, inference_client_async
model_list = get_model_list_client()
model_type = model_list.data[0].id
print(f'model_type: {model_type}')
request_config = XRequestConfig(seed=42)
query = '<image>Describe this image.'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png']
tasks = [inference_client_async(model_type, query, request_config=request_config) for _ in range(100)]
async def _batch_run(tasks):
return await asyncio.gather(*tasks)
resp_list = asyncio.run(_batch_run(tasks))
print(f'query: {query}')
print(f'response0: {resp_list[0].choices[0].message.content}')
print(f'response1: {resp_list[1].choices[0].message.content}')
query = '<image>How many sheep are in the picture?'
images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/animal.png']
async def _stream():
global query
request_config = XRequestConfig(seed=42, stream=True)
stream_resp = await inference_client_async(model_type, query, images=images, request_config=request_config)
print(f'query: {query}')
print('response: ', end='')
async for chunk in stream_resp:
print(chunk.choices[0].delta.content, end='', flush=True)
print()
asyncio.run(_stream())
"""
query: <image>Describe this image.
response0: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.
In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.
The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.
Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
response1: The video showcases a serene and picturesque landscape. The scene is dominated by a vast expanse of lush greenery, with a dense forest stretching out into the distance. The trees, varying in shades of green, create a vibrant tapestry that fills the frame. The forest appears to be thriving, with the sunlight filtering through the leaves and casting dappled shadows on the forest floor.
In the foreground, a small clearing is visible, providing a glimpse of the open sky above. The sky is a clear blue, with a few wispy clouds scattered across it, adding depth to the scene. The overall atmosphere of the video is tranquil and peaceful, with the natural beauty of the landscape taking center stage.
The video is likely shot during the day, as the lighting is bright and natural. The camera angle is slightly elevated, offering a panoramic view of the forest and the surrounding area. The focus is sharp, allowing for the intricate details of the trees and the forest floor to be clearly visible.
Overall, the video captures the essence of a peaceful forest, with its lush greenery, clear blue sky, and tranquil ambiance. It's a beautiful representation of nature's beauty, inviting viewers to appreciate the serenity and majesty of the natural world.
query: <image>How many sheep are in the picture?
response: There are five sheep in the picture.
""" |
Thank you very much, jintao-huang.
Looking forward to your reply, thank you!
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
from swift.llm import deploy_main, DeployArguments
# same parameters as swift deploy
deploy_main(DeployArguments(...))
Just set seed to None (the default).
Yes.
Can I start the vLLM service via the get_vllm_engine interface instead? What is the difference from using deploy_main?
The issue where serving minicpmv2-6 with vLLM required installing flash-attn has been fixed.
Using deploy_main in the SDK with the same CLI parameters throws an error:
Does the vLLM + async client invocation support the official few-shot feature? The few-shot feature is as follows:
Yes, it is supported; few-shot is just multi-turn conversation.
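To sketch the idea of few-shot as multi-turn conversation: pre-fill the conversation history with example pairs before asking the real question. This assumes ms-swift's `inference` accepts a `history` argument shaped like the `(query, response)` pairs it returns in the examples above; verify the signature for your version.

```python
# Sketch: few-shot prompting expressed as pre-filled multi-turn history.
# Passing history= to inference() is an assumption based on the
# (response, history) pairs returned in the examples above.

def build_fewshot_history(examples):
    """Turn (query, response) example pairs into a swift-style history list."""
    return [[q, r] for q, r in examples]

examples = [
    ('<image>描述这张图片', '这是一只小猫的特写。'),
    ('<image>描述这张图片', '这是一个婴儿在床上看书。'),
]
history = build_fewshot_history(examples)
# The real call would then look like:
# response, history = inference(model, template, '<image>描述这张图片',
#                               history=history, images=images)
print(len(history))  # number of in-context examples: 2
```

Each in-context example thus becomes one prior conversation turn, which is exactly how the server-side template will render it.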
Thanks for your input @yingdachen. I am creating a UI using Flask but getting an error: NotImplementedError: Cannot copy out of meta tensor; no data! Any idea why?
How do I evaluate with a custom dataset (test video data)? It throws an error. I am also creating a UI using Flask and getting the error NotImplementedError: Cannot copy out of meta tensor; no data! @yingdachen, any input?
This error indicates insufficient GPU memory. |
@yingdachen for which error? I am getting two errors!
Fine-tuning MiniCPM-V2.6 with zero3 throws an error. I only changed the image fine-tuning command from zero2 to the default zero3, and the error appeared:
Try adjusting your deepspeed version.
Hi, since my GPU memory is insufficient I want to try fine-tuning the int4 model, but the official swift documentation does not list support for fine-tuning MiniCPM's int4 model: https://github.com/modelscope/ms-swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B
What format is required for preference-data training? The format that works for other models throws an error when training this model.
After fine-tuning is complete, how do I obtain a gguf model for deployment?
Converting minicpm to gguf requires some customized steps; see the official minicpm documentation:
I have run into this problem too. Using --deepspeed zero3-offload requires a lot of host memory; when it hangs, it is usually because the machine's RAM is full.
Can deepspeed not use virtual memory?
It can use virtual memory; I did too, but it still hung. In my case it hung after training finished; adding a few more RAM sticks fixed it.
We use
A question: when doing video fine-tuning with swift, is it also implemented via frame extraction? What is the default frame sampling rate, and can it be changed? I couldn't find this parameter on the command line; I tried sample_n_frames and it says it is unsupported. Since I am fine-tuning with both images and videos, I need to compute the image/video ratio, so I need to know the frame sampling rate. Thanks.
Upgrade ms-swift.
ms-swift/swift/llm/utils/template.py Line 3346 in 6330c70
MAX_NUM_FRAMES |
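For reference, frame extraction for video inputs is commonly implemented as uniform sampling capped at a maximum frame count; the `MAX_NUM_FRAMES` constant referenced above in `template.py` governs that cap. The following is a simplified sketch of that pattern, not the exact ms-swift implementation, and the default value of 64 is an assumption — check `template.py` for the real logic and value.

```python
# Simplified sketch of uniform video frame sampling capped at MAX_NUM_FRAMES.
# The authoritative logic lives in swift/llm/utils/template.py (see the
# MAX_NUM_FRAMES reference above); 64 here is only an assumed default.
MAX_NUM_FRAMES = 64

def sample_frame_indices(total_frames, max_num_frames=MAX_NUM_FRAMES):
    """Pick up to max_num_frames frame indices evenly spaced across the video."""
    if total_frames <= max_num_frames:
        return list(range(total_frames))
    step = total_frames / max_num_frames
    return [int(i * step) for i in range(max_num_frames)]

print(len(sample_frame_indices(1000)))  # capped at 64
print(sample_frame_indices(5))          # short video: all frames, [0, 1, 2, 3, 4]
```

Under this scheme the effective "frame rate" depends on clip length: a long video always contributes at most `MAX_NUM_FRAMES` frames, which is the number you would use when balancing image vs. video samples.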
https://swift.readthedocs.io/en/latest/Multi-Modal/index.html |
Hi, I fine-tuned the MiniCPM-V 2.6 model using #1613 and deployed the merged model with CUDA_VISIBLE_DEVICES=0 swift deploy --ckpt_dir output/minicpm-v-v2_5-chat/vx-xxx/checkpoint-xxx-merged. When I call the POST API, it does not respond. INFO: 2024-09-30 08:25:57,729 deploy.py:157] {'request_id': 'chatcmpl-f515986bf3d24c9e9b66f6a83d48a0eb', 'model': 'minicpm-v-v2_6-chat', 'messages': [{'role': 'user', 'content': 'Describe this image.'}], 'generation_config': GenerationConfig({'bos_token_id': 151643, 'eos_token_id': 151645, 'max_new_tokens': 32410, 'pad_token_id': 151643, 'return_dict_in_generate': True}), 'seed': None, 'stop': [], 'stream': False} I can see the hit in the terminal logs, but no response arrives in Postman.
What is the format of the eval dataset? How do I validate the eval dataset, and what does the label key in the result dataset mean? Should the response key in the eval data be empty?
Same issue here. My inference dataset is in the same format as my fine-tuning dataset, but the inference CLI doesn't work.
Same question here.
How do I run inference with the model after SFT?
I referred to https://swift.readthedocs.io/en/latest/Multi-Modal/minicpm-v-best-practice.html; it turns out it is set up like this:
Model: https://modelscope.cn/models/OpenBMB/MiniCPM-V-2_6
Usually, fine-tuning a multimodal LLM uses a custom dataset. Here we show a demo that can be run directly.
Before starting fine-tuning, make sure your environment is ready.
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e .[llm]
Model Inference
Image Fine-Tuning
We use the coco-en-mini dataset for fine-tuning; its task is describing the content of images. You can find the dataset on ModelScope: https://modelscope.cn/datasets/modelscope/coco_2014_caption/summary
# By default, lora_target_modules is set to all linear layers of the llm and resampler
CUDA_VISIBLE_DEVICES=0,1,2,3 NPROC_PER_NODE=4 swift sft \
  --model_type minicpm-v-v2_6-chat \
  --model_id_or_path OpenBMB/MiniCPM-V-2_6 \
  --sft_type lora \
  --dataset coco-en-mini#20000 \
  --deepspeed default-zero2
To use a custom dataset, simply specify it as follows:
Custom datasets support json and jsonl formats; below is a sample custom dataset:
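As an illustration of what such a jsonl file might contain: the `query`/`response`/`images` keys below mirror the request format used in the inference examples earlier in this thread, but the exact schema is an assumption — verify it against the ms-swift custom-dataset documentation. The file paths are placeholders.

```python
# Hedged sketch of a custom jsonl dataset for image fine-tuning.
# Keys follow the request dicts used in the inference examples above;
# verify against the ms-swift custom-dataset docs. Paths are hypothetical.
import json

records = [
    {'query': '<image>描述这张图片',
     'response': '一只小猫的特写。',
     'images': ['/path/to/cat1.jpg']},
    {'query': '<image>图片里有什么?',
     'response': '一个在床上看书的婴儿。',
     'images': ['/path/to/baby.jpg']},
]

# jsonl = one JSON object per line
with open('custom_train.jsonl', 'w', encoding='utf-8') as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + '\n')

# Round-trip check that each line parses back to the original record
with open('custom_train.jsonl', encoding='utf-8') as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded), loaded[0]['images'][0])
```

The resulting file would then be passed via --dataset (or the documented custom-dataset flag) in place of coco-en-mini.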
GPU memory usage:
The inference script after fine-tuning is as follows:
Example of the fine-tuned model's inference on the validation set (due to time constraints, only 300 steps were trained):
Video Fine-Tuning
We use the video-chatgpt dataset for fine-tuning; its task is describing the content of videos. You can find the dataset on ModelScope: https://modelscope.cn/datasets/swift/VideoChatGPT
Custom datasets support json and jsonl formats; below is a sample custom dataset:
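Analogously to the image case, a video sample would presumably use a `videos` key, mirroring the `videos=` argument in the inference examples earlier in the thread — again an assumption to verify against the ms-swift docs, with a placeholder path.

```python
# Hedged sketch of one custom jsonl record for video fine-tuning;
# the 'videos' key mirrors the videos= argument in the inference
# examples above. The file path is hypothetical.
import json

rec = {'query': '<video>描述这段视频',
       'response': '一个婴儿坐在床上看书。',
       'videos': ['/path/to/clip.mp4']}
line = json.dumps(rec, ensure_ascii=False)  # one jsonl line
print(json.loads(line)['videos'])  # ['/path/to/clip.mp4']
```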
GPU memory usage:
The inference script after fine-tuning is as follows:
Example of the fine-tuned model's inference on the validation set (due to time constraints, only 50 steps were trained):