Skip to content

Conversation

@yuyutaotao
Copy link
Collaborator

No description provided.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 20 to 24
async function runServer() {
server.server.registerCapabilities({
resources: {},
logging: {},
});

// Server capabilities are now properly registered
midsceneManager = new MidsceneManager();

// Register resource handlers BEFORE connecting
server.server.setRequestHandler(
ListResourcesRequestSchema,
handleListResources,
);
server.server.setRequestHandler(
ReadResourceRequestSchema,
handleReadResource,
);
// Initialize tools asynchronously (independent of server)
await midsceneManager.initTools();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Re-register MCP resource handlers

In runServer (lines 20‑24) the server now only creates a MidsceneManager, calls initTools, and connects transport, but it never calls server.server.setRequestHandler for ListResourcesRequestSchema, ReadResourceRequestSchema, or SetLevelRequestSchema anymore. As a result, MCP clients can no longer call resources/list, resources/read, or update logging level—the SDK will reject those requests with “method not found,” breaking previously working features such as fetching console logs or saved screenshots. Please wire handleListResources, handleReadResource, and the logging handler back up before calling server.connect.

Useful? React with 👍 / 👎.

Comment on lines +422 to +425
// Prepare dynamic action space tools
const agent = await this.ensureAgent();
const actionSpace = await agent.getActionSpace();
const actionTools = this.prepareActionSpaceToolDefinitions(actionSpace);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid requiring a live browser during startup

initTools() now unconditionally calls this.ensureAgent()/agent.getActionSpace() (lines 422‑425) before the MCP server even starts listening. ensureAgent immediately tries to connect to Chrome/ADB; when the browser extension or device is not yet available—which is a common deployment scenario—the promise rejects with “Unable to establish Bridge mode connection…” and the whole server exits before it can accept requests. Previously the agent was created lazily on the first tool invocation so the server could start independently. Please defer agent creation (and action space discovery) until a tool actually runs, or handle connection failures so the server can continue to start.

Useful? React with 👍 / 👎.

Comment on lines +372 to +376
const tools = actionSpace.map((action) => ({
name: action.name,
description: `Ask Midscene (a helper that can understand natural language and perform actions) to perform the action "${action.name}", this action is defined as follows: ${action.description || 'No description provided'}.`,
schema: {
instruction: z

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve documented MCP tool names

Dynamic tool registration now uses the raw action.name from the action space as the MCP tool name (lines 372‑376). For the default web/Android devices those names are simple verbs like Tap, Scroll, Input, etc., which no longer match the canonical midscene_aiTap, midscene_aiScroll, … names still exported from packages/mcp/src/tools.ts and referenced in docs/tests. As soon as this change ships, existing clients calling midscene_aiTap, midscene_aiScroll, etc. will receive “Tool not found.” Either keep the original names when exposing the action space or update the contract/documents/tests accordingly; otherwise the MCP API is silently broken.

Useful? React with 👍 / 👎.

@yuyutaotao yuyutaotao marked this pull request as draft November 20, 2025 06:26
@netlify
Copy link

netlify bot commented Nov 21, 2025

Deploy Preview for midscene ready!

Name Link
🔨 Latest commit 703569e
🔍 Latest deploy log https://app.netlify.com/projects/midscene/deploys/6923c8157e4bed0008387cc7
😎 Deploy Preview https://deploy-preview-1481--midscene.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@yuyutaotao yuyutaotao marked this pull request as ready for review November 23, 2025 16:07
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +162 to +166
const legacyModelFamily = useLegacyLogic
? legacyConfigToModelFamily(provider)
: undefined;

const modelFamilyRaw = provider[keys.modelFamily] || legacyModelFamily;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Planning model ignores VL family when override is set

When a dedicated planning model is provided (e.g., via MIDSCENE_PLANNING_MODEL_NAME or an isolated modelConfig map), the new parser only looks up keys.modelFamily, which for planning is the literal string "THERE_IS_NO_MODEL_FAMILY_FOR_PLANNING". As a result vlMode is always undefined for planning overrides even if MIDSCENE_MODEL_FAMILY is set, so VL-specific preprocessing (padding and bbox filling in plan() at packages/core/src/ai-model/llm-planning.ts:40-59) is skipped for Qwen/UI‑TARS models. This regresses earlier behavior where planning configs inherited the global model family and can degrade element localization for dedicated planning models.

Useful? React with 👍 / 👎.

@yuyutaotao yuyutaotao merged commit ab07218 into 1.0 Nov 24, 2025
10 checks passed
@yuyutaotao yuyutaotao deleted the feat/mcp-1.0 branch November 24, 2025 03:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants