
Best practice for Qwen2-Audio #1653

Open
@Jintao-Huang


Environment Preparation

# Install ms-swift (the URL is quoted so that shells such as zsh do not expand the square brackets)
pip install 'git+https://github.com/modelscope/swift.git#egg=ms-swift[llm]'

# Install the latest transformers
pip install git+https://github.com/huggingface/transformers.git

pip install librosa
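
After installation, a quick check that the three packages are actually present can save a failed model load later. The following is a minimal sketch, not part of ms-swift: `installed_versions` is a hypothetical helper name, and it relies only on the standard-library `importlib.metadata`. The distribution names are taken from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # Distribution names as used in the pip commands above.
    report = installed_versions(["ms-swift", "transformers", "librosa"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

A `None` entry means the corresponding `pip install` step did not take effect in the active environment.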

Inference

Instruct Model:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen2-audio-7b-instruct
# If the model is stored at a local path
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model_type qwen2-audio-7b-instruct \
    --model_id_or_path '<local_path>'
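
When the CLI has to be launched from a script, the two invocations above can be assembled programmatically. This is only a sketch: `build_infer_command` and `run_infer` are hypothetical helper names, and the only flags used are the two shown above (`--model_type` and `--model_id_or_path`).

```python
import os
import shlex
import subprocess


def build_infer_command(model_type, model_path=None):
    """Return the `swift infer` argument list for a model type and an optional local path."""
    cmd = ["swift", "infer", "--model_type", model_type]
    if model_path is not None:
        cmd += ["--model_id_or_path", model_path]
    return cmd


def run_infer(model_type, model_path=None, gpu="0"):
    """Launch the interactive CLI on one GPU (requires ms-swift to be installed)."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(build_infer_command(model_type, model_path), env=env, check=True)


if __name__ == "__main__":
    # Print the command rather than running it; '<local_path>' is the placeholder used above.
    print(shlex.join(build_infer_command("qwen2-audio-7b-instruct", "<local_path>")))
```

Passing the arguments as a list avoids shell quoting problems when the local path contains spaces.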

Inference result:

<<< <audio>
Input an audio path or URL <<< https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/guess_age_gender.wav
Yes, I can guess that you are a female in your twenties.
--------------------------------------------------
<<< <audio>
Input an audio path or URL <<< https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/translate_to_chinese.wav
每个人都希望被欣赏,所以如果你欣赏某人,不要把它保密。
--------------------------------------------------
<<< clear
<<< 你是谁
我是来自达摩院的语言模型,我叫通义千问。
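
In the session above, each `<audio>` placeholder is followed by a prompt for one audio path or URL. When building inputs by hand, it can help to check that the placeholders and the audio list line up before calling the model. The sketch below assumes that every `<audio>` tag corresponds to exactly one audio input, counted across the whole conversation; this matches the examples in this post but is an assumption, not a documented rule. `count_audio_tags` and `check_audio_inputs` are hypothetical helpers.

```python
AUDIO_TAG = "<audio>"


def count_audio_tags(query, history=None):
    """Count `<audio>` placeholders in the new query and in earlier user turns."""
    turns = [user_turn for user_turn, _ in (history or [])] + [query]
    return sum(turn.count(AUDIO_TAG) for turn in turns)


def check_audio_inputs(query, audios, history=None):
    """Raise ValueError if the placeholder count differs from the number of audio inputs."""
    expected = count_audio_tags(query, history)
    if expected != len(audios):
        raise ValueError(
            f"{expected} {AUDIO_TAG} tag(s) found but {len(audios)} audio input(s) given"
        )


if __name__ == "__main__":
    # One tag, one audio input: passes silently.
    check_audio_inputs("<audio>What does this audio say?", ["weather.wav"])
    print("inputs are consistent")
```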

Using Python:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import (
    get_model_tokenizer, get_template, inference, ModelType,
    get_default_template_type, inference_stream
)
from swift.utils import seed_everything
import torch

model_type = ModelType.qwen2_audio_7b_instruct
model_id_or_path = None
template_type = get_default_template_type(model_type)
print(f'template_type: {template_type}')

model, tokenizer = get_model_tokenizer(model_type, torch.float16, model_id_or_path=model_id_or_path,
                                       model_kwargs={'device_map': 'auto'})
model.generation_config.max_new_tokens = 256
template = get_template(template_type, tokenizer)
seed_everything(42)

# Query (Chinese): "What does this audio say?"
query = '<audio>这段语音说了什么'
audios = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/weather.wav']
response, history = inference(model, template, query, audios=audios)
print(f'query: {query}')
print(f'response: {response}')

# Streaming
# Follow-up query (Chinese): "Is the speaker in this audio male or female?"
query = '这段语音是男生还是女生'
gen = inference_stream(model, template, query, history, audios=audios)
print_idx = 0
print(f'query: {query}\nresponse: ', end='')
for response, history in gen:
    delta = response[print_idx:]
    print(delta, end='', flush=True)
    print_idx = len(response)
print()
print(f'history: {history}')
"""
query: <audio>这段语音说了什么
response: 这段语音说的是:'今天天气真好呀'
query: 这段语音是男生还是女生
response: 男声。
history: [['<audio>这段语音说了什么', "这段语音说的是:'今天天气真好呀'"], ['这段语音是男生还是女生', '男声。']]
"""

Memory usage:
[Screenshot: 2024-08-09 15:22:09]
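
To measure memory usage on your own hardware, PyTorch's allocator statistics can be queried after running inference in the same process. This is a minimal sketch: `peak_gpu_memory_gib` is a hypothetical helper, and it reports only memory allocated by PyTorch tensors in the current process, which is typically lower than the figure shown by `nvidia-smi`.

```python
def peak_gpu_memory_gib(device=0):
    """Return peak allocated CUDA memory in GiB, or None if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.max_memory_allocated(device) / 1024 ** 3


if __name__ == "__main__":
    # Call this after the inference calls so the peak reflects model load and generation.
    peak = peak_gpu_memory_gib()
    print("no CUDA device" if peak is None else f"peak allocated: {peak:.2f} GiB")
```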

Base Model:

CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen2-audio-7b

Inference result:

<<< <audio>Generate the caption in English:
Input an audio path or URL <<< https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Audio/glass-breaking-151256.mp3
Glass is breaking.
