Open
Description
System Info
- OS: Windows 10
- Python: 3.14.0
- Model: GLM-4.6v (via API)
- Device: Samsung Android (ADB connected)
Who can help?
No response
Information
- The official example scripts
- My own modified scripts and tasks
Reproduction
Description
The parse_action() function in phone_agent/actions/handler.py fails to parse output from the new GLM-4.6v model because the model wraps its responses in special tokens (<|begin_of_box|> and <|end_of_box|>).
Steps to Reproduce
- Clone the repository and install dependencies.
- Install ADB Keyboard on the device.
- Run the agent using the GLM-4.6v model endpoint:
python main.py --base-url https://open.bigmodel.cn/api/paas/v4/ --apikey YOUR_API_KEY --model glm-4.6v --lang en "Open Settings"
Expected behavior
The parser should strip the <|begin_of_box|> and <|end_of_box|> tokens before attempting to parse the action.
It should also extract the command starting at "do(" or "finish(", so that any "thinking process" text GLM-4.6v emits before the command is ignored.
I have worked around this locally by updating phone_agent/actions/handler.py to strip these tokens and search for the command substring.
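For reference, the cleanup described above might look roughly like the sketch below. It is not the code in phone_agent/actions/handler.py: the helper name extract_action_text, the regular expression, and the sample action string are assumptions made for illustration. The idea is to run something like this on the raw model output before handing it to parse_action().

```python
import re

# Box tokens named in this issue; GLM-4.6v wraps its answer in them.
BOX_TOKENS = ("<|begin_of_box|>", "<|end_of_box|>")

# Matches "do(" or "finish(" when it starts at a word boundary,
# so a word such as "undo(" is not mistaken for a command.
COMMAND_START = re.compile(r"\b(?:do|finish)\(")


def extract_action_text(raw: str) -> str:
    """Hypothetical pre-processing step, not the repository's parse_action().

    Removes the box tokens, then returns the text from the first
    "do(" or "finish(" onward, dropping any reasoning text before it.
    """
    cleaned = raw
    for token in BOX_TOKENS:
        cleaned = cleaned.replace(token, "")

    match = COMMAND_START.search(cleaned)
    if match is None:
        raise ValueError(f"No do(...) or finish(...) command found in: {raw!r}")

    return cleaned[match.start():].strip()


if __name__ == "__main__":
    # Sample output shape is an assumption based on the behavior described above.
    sample = (
        "I need to open the Settings app first.\n"
        '<|begin_of_box|>do(action="Launch", app="Settings")<|end_of_box|>'
    )
    print(extract_action_text(sample))
```

One limitation of this sketch: it takes the first "do(" or "finish(" it finds, so reasoning text that itself contains such a substring would be picked up instead of the real command. Searching only inside the box tokens when they are present would be a stricter alternative.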