feat(core,shared): enforce VL mode requirement for Planning #1332

quanru · 2025-10-17T08:51:05Z

Summary

This PR enforces that Planning functionality only supports vision language models (VL mode)
and removes DOM-based planning support. This change ensures better consistency and performance
by requiring visual models for planning operations.

Changes

Core Changes

ModelConfigManager Validation (packages/shared/src/env/model-config-manager.ts)
- Added validation in getModelConfig() to enforce VL mode for Planning intent
- Provides clear error message with all supported VL modes and configuration examples
- References documentation link for users to learn more
Planning Implementation Simplification (packages/core/src/ai-model/llm-planning.ts)
- Removed DOM mode support (describeUserPage, markupImageForLLM)
- Simplified image processing logic to only support VL mode paths
- Added assertion to ensure VL mode is configured
- Removed conditional DOM description in message construction
Type Documentation (packages/shared/src/env/types.ts)
- Added comprehensive JSDoc comments to IModelConfigForPlanning
- Clearly documents VL mode requirement and lists all supported modes

Test Coverage

Added 6 new unit tests for Planning VL mode validation:
1. Isolated mode - should throw error when planning has no VL mode
2. Isolated mode - should succeed when planning has valid VL mode
3. Normal mode - should throw error when planning has no VL mode
4. Normal mode - should succeed when planning has valid VL mode
5. Planning validation failure doesn't affect other intents
6. Should accept all valid VL modes (tests all 7 VL mode values)
Fixed existing tests to provide VL mode for Planning intent

Supported VL Modes

qwen-vl
qwen3-vl
gemini
doubao-vision
vlm-ui-tars
vlm-ui-tars-doubao
vlm-ui-tars-doubao-1.5

Breaking Change ⚠️

Planning without VL mode configured will now throw an error.

Before this change, Planning could work with DOM-based mode. After this change,
users must configure a VL mode for Planning intent.

Error message example:

Planning requires a vision language model (VL model). DOM-based planning is not supported.

Please configure one of the following VL modes:
  - doubao-vision
  - gemini
  - qwen-vl
  - qwen3-vl
  - vlm-ui-tars
  - vlm-ui-tars-doubao
  - vlm-ui-tars-doubao-1.5
  
Configuration examples:
  - Environment variable: MIDSCENE_PLANNING_VL_MODE=qwen-vl
  - Or use modelConfig function with planning intent
  
Learn more: https://midscenejs.com/choose-a-model

Migration Guide

Users need to configure VL mode for Planning in one of two ways:

Option 1: Environment Variable

export MIDSCENE_PLANNING_VL_MODE=qwen-vl
export MIDSCENE_PLANNING_MODEL_NAME=qwen-vl-plus
export MIDSCENE_PLANNING_OPENAI_API_KEY=your-api-key

Option 2: modelConfig Function

const agent = new Agent({
  modelConfig: ({ intent }) => {
    if (intent === 'planning') {
      return {
        MIDSCENE_PLANNING_MODEL_NAME: 'qwen-vl-plus',
        MIDSCENE_PLANNING_OPENAI_API_KEY: 'your-api-key',
        MIDSCENE_PLANNING_VL_MODE: 'qwen-vl',
      };
    }
    // ... other intents
  },
});

Test Results

✅ All tests passing (17/17)
✅ Build successful
✅ TypeScript compilation clean

Impact Analysis

VQA not affected: VQA's domIncluded parameter functionality remains unchanged
Other intents not affected: Planning validation only applies to Planning intent
Configuration-level validation: Fails fast with clear error messages at initialization

🤖 Generated with Claude Code

This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support. Changes: - Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent - Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM) - Simplify image processing to only support VL mode paths - Add comprehensive JSDoc documentation for Planning VL mode requirement - Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes - Fix existing tests to provide VL mode for Planning intent Breaking Change: - Planning without VL mode configured will now throw an error with clear instructions - Error message includes all supported VL modes and configuration examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Copilot

Pull Request Overview

Enforce that Planning requires a VL mode and remove DOM-based planning paths to ensure consistency and performance.

Add validation in ModelConfigManager to require VL mode for planning and surface a detailed error message
Simplify planning implementation to pure vision pathways and remove DOM markup/description logic
Document VL mode requirement for planning in types and extend unit tests to cover validation and accepted modes

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

File	Description
packages/shared/src/env/model-config-manager.ts	Adds VL mode validation for planning with detailed guidance in the thrown error
packages/core/src/ai-model/llm-planning.ts	Removes DOM-based planning support, enforces VL mode, and simplifies image processing paths
packages/shared/src/env/types.ts	Documents VL mode requirement and supported values for planning intent
packages/shared/tests/unit-test/env/modle-config-manager.test.ts	Updates existing tests and adds new tests to validate VL mode enforcement and accepted modes

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-17T09:24:31Z

packages/shared/tests/unit-test/env/modle-config-manager.test.ts

+      const manager = new ModelConfigManager(modelConfigFn);
+
+      expect(() => manager.getModelConfig('planning')).toThrow(
+        'Planning requires a vision language model (VL model). DOM-based planning is not supported.',


This assertion is brittle against the multi-line error message thrown by ModelConfigManager. Prefer a regex to assert the key part of the message, e.g. toThrow(/Planning requires a vision[- ]?language (model|mode)/), so formatting or additional guidance lines won't break the test.

Suggested change

'Planning requires a vision language model (VL model). DOM-based planning is not supported.',

/Planning requires a vision[- ]?language (model|mode)/,

Copilot · 2025-10-17T09:24:31Z

packages/shared/tests/unit-test/env/modle-config-manager.test.ts

+
+      // Planning should fail
+      expect(() => manager.getModelConfig('planning')).toThrow(
+        'Planning requires a vision language model',


Use a regex for this partial-match assertion to avoid accidental false negatives if the error message format changes (e.g. toThrow(/Planning requires a vision[- ]?language/)).

Suggested change

'Planning requires a vision language model',

/Planning requires a vision[- ]?language/i,

Copilot · 2025-10-17T09:24:32Z

packages/shared/src/env/model-config-manager.ts

+    // Validate Planning must use VL mode
+    if (intent === 'planning' && !config.vlMode) {
+      throw new Error(
+        `Planning requires a vision language model (VL model). DOM-based planning is not supported.
+
+Please configure one of the following VL modes:
+  ${VL_MODES.map((mode) => `- ${mode}`).join('\n  ')}
+  
+Configuration examples:
+  - Environment variable: MIDSCENE_PLANNING_VL_MODE=qwen-vl
+  - Or use modelConfig function with planning intent
+  
+Learn more: https://midscenejs.com/choose-a-model`,


The first sentence mixes 'VL model' with 'VL mode', while the rest of the message instructs users to configure VL modes. For consistency and clarity, consider: 'Planning requires a vision-language mode (VL mode). DOM-based planning is not supported.'

Copilot · 2025-10-17T09:24:32Z

packages/core/src/ai-model/llm-planning.ts

-    { vlMode },
-  );
+  // Planning requires VL mode (validated by ModelConfigManager.getModelConfig)
+  assert(vlMode, 'Planning requires vlMode to be configured.');


Align this assertion message with the shared error wording used in ModelConfigManager for consistency (e.g. 'Planning requires a VL mode to be configured.').

Suggested change

assert(vlMode, 'Planning requires vlMode to be configured.');

assert(vlMode, 'Planning requires a VL mode to be configured.');

Copilot · 2025-10-17T09:24:32Z

packages/core/src/ai-model/llm-planning.ts

+    // Reserved for qwen3-vl specific processing
    // const paddedResult = await paddingToMatchBlockByBase64(imagePayload, 32);
    // imageWidth = paddedResult.width;
    // imageHeight = paddedResult.height;
    // imagePayload = paddedResult.imageBase64;


Avoid keeping commented-out code; either remove it or replace with a TODO reference that links to a task/issue. If you plan to implement qwen3-vl specific processing, leave a concise TODO comment and remove the inactive code.

Suggested change

// Reserved for qwen3-vl specific processing

// const paddedResult = await paddingToMatchBlockByBase64(imagePayload, 32);

// imageWidth = paddedResult.width;

// imageHeight = paddedResult.height;

// imagePayload = paddedResult.imageBase64;

// TODO: Implement qwen3-vl specific image processing. See issue #1234.

This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support. Changes: - Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent - Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM) - Simplify image processing to only support VL mode paths - Add comprehensive JSDoc documentation for Planning VL mode requirement - Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes - Fix existing tests to provide VL mode for Planning intent Breaking Change: - Planning without VL mode configured will now throw an error with clear instructions - Error message includes all supported VL modes and configuration examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>

github-actions bot added the change: feat label Oct 17, 2025

quanru requested a review from Copilot October 17, 2025 09:21

Copilot AI reviewed Oct 17, 2025

View reviewed changes

yuyutaotao approved these changes Oct 17, 2025

View reviewed changes

quanru merged commit 80a2c97 into 1.0 Oct 17, 2025
2 checks passed

quanru deleted the feat/planning-vl-mode-only branch October 17, 2025 09:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

feat(core,shared): enforce VL mode requirement for Planning #1332

feat(core,shared): enforce VL mode requirement for Planning #1332

Uh oh!

quanru commented Oct 17, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	'Planning requires a vision language model (VL model). DOM-based planning is not supported.',
	/Planning requires a vision[- ]?language (model\|mode)/,

	assert(vlMode, 'Planning requires vlMode to be configured.');
	assert(vlMode, 'Planning requires a VL mode to be configured.');

Uh oh!

feat(core,shared): enforce VL mode requirement for Planning #1332

feat(core,shared): enforce VL mode requirement for Planning #1332

Uh oh!

Conversation

quanru commented Oct 17, 2025

Summary

Changes

Core Changes

Test Coverage

Supported VL Modes

Breaking Change ⚠️

Migration Guide

Test Results

Impact Analysis

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants