-
Notifications
You must be signed in to change notification settings - Fork 743
refactor(1.0): expose action space into mcp #1481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| async function runServer() { | ||
| server.server.registerCapabilities({ | ||
| resources: {}, | ||
| logging: {}, | ||
| }); | ||
|
|
||
| // Server capabilities are now properly registered | ||
| midsceneManager = new MidsceneManager(); | ||
|
|
||
| // Register resource handlers BEFORE connecting | ||
| server.server.setRequestHandler( | ||
| ListResourcesRequestSchema, | ||
| handleListResources, | ||
| ); | ||
| server.server.setRequestHandler( | ||
| ReadResourceRequestSchema, | ||
| handleReadResource, | ||
| ); | ||
| // Initialize tools asynchronously (independent of server) | ||
| await midsceneManager.initTools(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-register MCP resource handlers
In runServer (lines 20‑24) the server now only creates a MidsceneManager, calls initTools, and connects transport, but it never calls server.server.setRequestHandler for ListResourcesRequestSchema, ReadResourceRequestSchema, or SetLevelRequestSchema anymore. As a result, MCP clients can no longer call resources/list, resources/read, or update logging level—the SDK will reject those requests with “method not found,” breaking previously working features such as fetching console logs or saved screenshots. Please wire handleListResources, handleReadResource, and the logging handler back up before calling server.connect.
Useful? React with 👍 / 👎.
| // Prepare dynamic action space tools | ||
| const agent = await this.ensureAgent(); | ||
| const actionSpace = await agent.getActionSpace(); | ||
| const actionTools = this.prepareActionSpaceToolDefinitions(actionSpace); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid requiring a live browser during startup
initTools() now unconditionally calls this.ensureAgent()/agent.getActionSpace() (lines 422‑425) before the MCP server even starts listening. ensureAgent immediately tries to connect to Chrome/ADB; when the browser extension or device is not yet available—which is a common deployment scenario—the promise rejects with “Unable to establish Bridge mode connection…” and the whole server exits before it can accept requests. Previously the agent was created lazily on the first tool invocation so the server could start independently. Please defer agent creation (and action space discovery) until a tool actually runs, or handle connection failures so the server can continue to start.
Useful? React with 👍 / 👎.
| const tools = actionSpace.map((action) => ({ | ||
| name: action.name, | ||
| description: `Ask Midscene (a helper that can understand natural language and perform actions) to perform the action "${action.name}", this action is defined as follows: ${action.description || 'No description provided'}.`, | ||
| schema: { | ||
| instruction: z |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preserve documented MCP tool names
Dynamic tool registration now uses the raw action.name from the action space as the MCP tool name (lines 372‑376). For the default web/Android devices those names are simple verbs like Tap, Scroll, Input, etc., which no longer match the canonical midscene_aiTap, midscene_aiScroll, … names still exported from packages/mcp/src/tools.ts and referenced in docs/tests. As soon as this change ships, existing clients calling midscene_aiTap, midscene_aiScroll, etc. will receive “Tool not found.” Either keep the original names when exposing the action space or update the contract/documents/tests accordingly; otherwise the MCP API is silently broken.
Useful? React with 👍 / 👎.
✅ Deploy Preview for midscene ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const legacyModelFamily = useLegacyLogic | ||
| ? legacyConfigToModelFamily(provider) | ||
| : undefined; | ||
|
|
||
| const modelFamilyRaw = provider[keys.modelFamily] || legacyModelFamily; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Planning model ignores VL family when override is set
When a dedicated planning model is provided (e.g., via MIDSCENE_PLANNING_MODEL_NAME or an isolated modelConfig map), the new parser only looks up keys.modelFamily, which for planning is the literal string "THERE_IS_NO_MODEL_FAMILY_FOR_PLANNING". As a result vlMode is always undefined for planning overrides even if MIDSCENE_MODEL_FAMILY is set, so VL-specific preprocessing (padding and bbox filling in plan() at packages/core/src/ai-model/llm-planning.ts:40-59) is skipped for Qwen/UI‑TARS models. This regresses earlier behavior where planning configs inherited the global model family and can degrade element localization for dedicated planning models.
Useful? React with 👍 / 👎.
No description provided.