feat: add qwen mllm #251

Merged 2 commits on Apr 2, 2025
15 changes: 11 additions & 4 deletions README.md
@@ -15,6 +15,7 @@
<img src="assets/openai.svg" alt="OpenAI whisper" width="60" height="60" />
<img src="assets/zhipu-color.svg" alt="Zhipu GLM-4V-PLUS" width="60" height="60" />
<img src="assets/gemini-brand-color.svg" alt="Google Gemini 1.5 Pro" width="60" height="60" />
<img src="assets/qwen-color.svg" alt="Qwen-2.5-72B-Instruct" width="60" height="60" />

</div>

@@ -34,8 +35,10 @@
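- **Automatic danmaku rendering**: automatically converts XML danmaku into ASS files (the conversion library is open source: [DanmakuConvert](https://github.com/timerring/DanmakuConvert)), renders them into the video to produce a **video with danmaku**, and uploads it automatically.
- **Very low hardware requirements**: no GPU is needed. A basic single-core CPU with minimal memory can handle the whole pipeline of recording, danmaku rendering, and uploading. There is no minimum configuration, so a ten-year-old computer or server still works!
- **( :tada: NEW) Automatic subtitle rendering** (this feature requires an Nvidia GPU): uses OpenAI's open-source model [`whisper`](https://github.com/openai/whisper) to recognize speech in the video and render it into the video as subtitles.
- **( :tada: NEW) Automatic slicing and upload**: finds highlight segments based on danmaku density and slices them (the slicing library is open source: [auto-slice-video](https://github.com/timerring/auto-slice-video)),
then uses the multimodal video-understanding model [`GLM-4V-PLUS`](https://bigmodel.cn/dev/api/normal-model/glm-4) or [`Gemini-2.0-flash`](https://deepmind.google/technologies/gemini/flash/) to generate interesting slice titles and content, and uploads them automatically.
- **( :tada: NEW) Automatic slicing and upload**: finds highlight segments based on danmaku density and slices them (the slicing library is open source: [auto-slice-video](https://github.com/timerring/auto-slice-video)), then uses a multimodal video-understanding model to generate interesting slice titles and content, and uploads them automatically. Currently supported models:
    - `GLM-4V-PLUS`
    - `Gemini-2.0-flash`
    - `Qwen-2.5-72B-Instruct`
- **( :tada: NEW) Persistent login / video download / video upload (multi-part submissions supported)**: [bilitool](https://github.com/timerring/bilitool) is open source. It provides persistent login, video and danmaku download (including multi-part videos), video upload (with multi-part submission), submission status queries, detailed information queries, and more. It installs with a single pip command and can be used from the command line or called as an API.
- **( :tada: NEW) Automatic multi-platform loop live streaming**: the open-source tool [looplive](https://github.com/timerring/looplive) is a fully automatic 24/7 tool for **looping live streams pushed to multiple platforms simultaneously**.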

@@ -70,8 +73,6 @@ graph TD

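## 3. Test hardware
+ OS: Ubuntu 22.04.4 LTS

> Prefer Ubuntu 22.04 or later. The gcc shipped with earlier Ubuntu releases cannot be updated to the version biliup-rs requires. If you use an earlier release, see [a simple and effective fix for "version GLIBC_2.34 not found"](https://blog.csdn.net/huazhang_001/article/details/128828999).
+ CPU: 2 cores, Intel(R) Xeon(R) Platinum 85
+ GPU: none
+ Memory: 2G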
@@ -168,6 +169,12 @@ The MLLM model is mainly used to generate slice titles after automatic slicing; this feature is off by default

The project's automatic slicing feature uses the Gemini-2.0-flash model. [Register an account](https://aistudio.google.com/app/apikey) and apply for an API key yourself, then enter it as `GEMINI_API_KEY` in `src/config.py`.

##### 3.2.3 Qwen model

> To use the Qwen-2.5-72B-Instruct model, set the `MLLM_MODEL` parameter in `src/config.py` to `qwen`.

The project's automatic slicing feature uses the Qwen-2.5-72B-Instruct model. [Register an account](https://bailian.console.aliyun.com/?apiKey=1) and apply for an API key yourself, then enter it as `QWEN_API_KEY` in `src/config.py`.
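
Taken together, the two Qwen settings in `src/config.py` would look roughly like the excerpt below. This is only a sketch: the key value is a placeholder, not a real key format, and the other settings in the file are left out.

```python
# src/config.py (excerpt): select Qwen as the title-generation model
MLLM_MODEL = "qwen"
# Placeholder value (assumption); paste the key obtained from the Bailian console
QWEN_API_KEY = "your-api-key"
```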

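#### 4. bilitool login

> Ordinary logs generally cannot render the QR code (it is unclear whether docker logs can print it; this will be revised when a new image is released. For the docker version, refer to the [bilive](https://bilive.timerring.com) documentation first; this README covers source deployment only), so for this step you need to install [bilitool](https://github.com/timerring/bilitool) on the machine beforehand: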
1 change: 1 addition & 0 deletions assets/qwen-color.svg
39 changes: 39 additions & 0 deletions src/autoslice/mllm_sdk/qwen_sdk.py
@@ -0,0 +1,39 @@
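import base64

from openai import OpenAI

from src.config import QWEN_API_KEY
from src.log.logger import scan_log


def encode_video(video_path):
    """Read the video file and return its contents as a base64 string."""
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


def qwen_generate_title(video_path, artist):
    """Ask the Qwen model for a title for the slice at video_path."""
    # DashScope exposes an OpenAI-compatible endpoint, so the OpenAI client is reused.
    client = OpenAI(
        api_key=QWEN_API_KEY,
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )

    # Prompt, kept in Chinese as in the original: "The video is a live-stream slice of
    # {artist}. Based on the video content and danmaku, give it a playful, eye-catching
    # title with no emoji; internet slang and buzzwords are welcome."
    prompt = f"视频是{artist}的直播切片,请根据该视频中的内容及弹幕信息,为这段视频起一个调皮并且吸引眼球的标题,标题中不要表情符号,可以适当使用网络热词或流行语"

    base64_video = encode_video(video_path)
    completion = client.chat.completions.create(
        model="qwen2.5-vl-72b-instruct",
        messages=[
            {
                "role": "system",
                # System prompt: "You are a video slicer."
                "content": [{"type": "text", "text": "你是一个视频切片员"}],
            },
            {
                "role": "user",
                "content": [
                    {
                        "type": "video_url",
                        "video_url": {"url": f"data:video/mp4;base64,{base64_video}"},
                    },
                    {"type": "text", "text": prompt},
                ],
            },
        ],
    )
    # Strip double quotes the model may wrap around the title. Computing the title
    # outside the f-string avoids nested quotes, a SyntaxError before Python 3.12.
    title = completion.choices[0].message.content.strip('"')
    scan_log.info("Using Qwen-2.5-72B-Instruct to generate the slice title")
    scan_log.info(f"Prompt: {prompt}")
    scan_log.info(f"Generated slice title: {title}")
    return title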
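
Because `encode_video` reads the whole file and inlines it as a base64 data URL, the request body is roughly a third larger than the video itself. A minimal sketch of a pre-flight size check follows. `base64_size`, `fits_in_request`, and `MAX_PAYLOAD_BYTES` are hypothetical names that are not part of this PR, and the 10 MB limit is an illustrative assumption rather than a documented DashScope value.

```python
import os

# Hypothetical limit for illustration only; check the provider's documentation.
MAX_PAYLOAD_BYTES = 10 * 1024 * 1024


def base64_size(n_bytes: int) -> int:
    """Length of the padded base64 encoding of n_bytes of input."""
    # Every 3 input bytes (rounded up) become 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


def fits_in_request(video_path: str, limit: int = MAX_PAYLOAD_BYTES) -> bool:
    """Return True if the base64-encoded video stays within the limit."""
    return base64_size(os.path.getsize(video_path)) <= limit
```

A caller could run `fits_in_request(video_path)` before `qwen_generate_title` and skip or re-encode slices that are too large.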
3 changes: 3 additions & 0 deletions src/autoslice/title_generator.py
@@ -17,6 +17,9 @@ def wrapper(video_path, artist):
        elif model_type == "gemini":
            from .mllm_sdk.gemini_sdk import gemini_generate_title
            return gemini_generate_title(video_path, artist)
        elif model_type == "qwen":
            from .mllm_sdk.qwen_sdk import qwen_generate_title
            return qwen_generate_title(video_path, artist)
        else:
            scan_log.error(f"Unsupported model type: {model_type}")
            return None
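
As more models are added, the `if`/`elif` chain above could alternatively be written as a lookup table. The sketch below is not the project's code: the three generator functions are stand-ins that return fixed strings, `GENERATORS` and `generate_title` are hypothetical names, and `print` stands in for `scan_log.error`.

```python
from typing import Callable, Dict, Optional


# Stand-in generators (assumptions); the real ones call the model SDKs.
def zhipu_stub(video_path: str, artist: str) -> str:
    return f"zhipu title for {artist}"


def gemini_stub(video_path: str, artist: str) -> str:
    return f"gemini title for {artist}"


def qwen_stub(video_path: str, artist: str) -> str:
    return f"qwen title for {artist}"


# Map each MLLM_MODEL value to its title generator.
GENERATORS: Dict[str, Callable[[str, str], str]] = {
    "zhipu": zhipu_stub,
    "gemini": gemini_stub,
    "qwen": qwen_stub,
}


def generate_title(model_type: str, video_path: str, artist: str) -> Optional[str]:
    """Dispatch to the generator registered for model_type, or return None."""
    generator = GENERATORS.get(model_type)
    if generator is None:
        print(f"Unsupported model type: {model_type}")
        return None
    return generator(video_path, artist)
```

One trade-off: the in-branch imports in the original load only the SDK that is actually selected. A table of direct function references loses that property unless it stores module paths and imports them lazily.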
4 changes: 3 additions & 1 deletion src/config.py
@@ -27,12 +27,14 @@
SLICE_STEP = 1
# The minimum video size to be sliced (MB)
MIN_VIDEO_SIZE = 200
# the multi-model LLMs, can be "gemini" or "zhipu"
# the multi-model LLMs, can be "gemini" or "zhipu" or "qwen"
MLLM_MODEL = "gemini" # Please make sure you have the right API key for the LLM you choose
# Apply for your own GLM-4v-Plus API key at https://www.bigmodel.cn/invite?icode=shBtZUfNE6FfdMH1R6NybGczbXFgPRGIalpycrEwJ28%3D
ZHIPU_API_KEY = ""
# Apply for your own Gemini API key at https://aistudio.google.com/app/apikey
GEMINI_API_KEY = ""
# Apply for your own Qwen API key at https://bailian.console.aliyun.com/?apiKey=1
QWEN_API_KEY = ""
# ============================ Basic configuration ============================
SRC_DIR = str(Path(os.path.abspath(__file__)).parent)
BILIVE_DIR = str(Path(SRC_DIR).parent)
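
Pasting a real key into `src/config.py` makes it easy to commit by accident. One possible alternative, shown as a sketch rather than the project's behavior, is to read the key from the environment. Using an environment variable named `QWEN_API_KEY` is an assumption here; the PR itself only defines the constant.

```python
import os

# Fall back to an empty string, matching the default in config.py
QWEN_API_KEY = os.environ.get("QWEN_API_KEY", "")
```

With this in place the key could be supplied at launch time, for example by exporting `QWEN_API_KEY` in the shell before starting the program.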