-
Couldn't load subscription status.
- Fork 716
feat(core,shared): enforce VL mode requirement for Planning #1332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support. Changes: - Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent - Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM) - Simplify image processing to only support VL mode paths - Add comprehensive JSDoc documentation for Planning VL mode requirement - Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes - Fix existing tests to provide VL mode for Planning intent Breaking Change: - Planning without VL mode configured will now throw an error with clear instructions - Error message includes all supported VL modes and configuration examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Enforce that Planning requires a VL mode and remove DOM-based planning paths to ensure consistency and performance.
- Add validation in ModelConfigManager to require VL mode for planning and surface a detailed error message
- Simplify planning implementation to pure vision pathways and remove DOM markup/description logic
- Document VL mode requirement for planning in types and extend unit tests to cover validation and accepted modes
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| packages/shared/src/env/model-config-manager.ts | Adds VL mode validation for planning with detailed guidance in the thrown error |
| packages/core/src/ai-model/llm-planning.ts | Removes DOM-based planning support, enforces VL mode, and simplifies image processing paths |
| packages/shared/src/env/types.ts | Documents VL mode requirement and supported values for planning intent |
| packages/shared/tests/unit-test/env/modle-config-manager.test.ts | Updates existing tests and adds new tests to validate VL mode enforcement and accepted modes |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| const manager = new ModelConfigManager(modelConfigFn); | ||
|
|
||
| expect(() => manager.getModelConfig('planning')).toThrow( | ||
| 'Planning requires a vision language model (VL model). DOM-based planning is not supported.', |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assertion is brittle against the multi-line error message thrown by ModelConfigManager. Prefer a regex to assert the key part of the message, e.g. toThrow(/Planning requires a vision[- ]?language (model|mode)/), so formatting or additional guidance lines won't break the test.
| 'Planning requires a vision language model (VL model). DOM-based planning is not supported.', | |
| /Planning requires a vision[- ]?language (model|mode)/, |
|
|
||
| // Planning should fail | ||
| expect(() => manager.getModelConfig('planning')).toThrow( | ||
| 'Planning requires a vision language model', |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a regex for this partial-match assertion to avoid accidental false negatives if the error message format changes (e.g. toThrow(/Planning requires a vision[- ]?language/)).
| 'Planning requires a vision language model', | |
| /Planning requires a vision[- ]?language/i, |
| // Validate Planning must use VL mode | ||
| if (intent === 'planning' && !config.vlMode) { | ||
| throw new Error( | ||
| `Planning requires a vision language model (VL model). DOM-based planning is not supported. | ||
| Please configure one of the following VL modes: | ||
| ${VL_MODES.map((mode) => `- ${mode}`).join('\n ')} | ||
| Configuration examples: | ||
| - Environment variable: MIDSCENE_PLANNING_VL_MODE=qwen-vl | ||
| - Or use modelConfig function with planning intent | ||
| Learn more: https://midscenejs.com/choose-a-model`, |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first sentence mixes 'VL model' with 'VL mode', while the rest of the message instructs users to configure VL modes. For consistency and clarity, consider: 'Planning requires a vision-language mode (VL mode). DOM-based planning is not supported.'
| { vlMode }, | ||
| ); | ||
| // Planning requires VL mode (validated by ModelConfigManager.getModelConfig) | ||
| assert(vlMode, 'Planning requires vlMode to be configured.'); |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Align this assertion message with the shared error wording used in ModelConfigManager for consistency (e.g. 'Planning requires a VL mode to be configured.').
| assert(vlMode, 'Planning requires vlMode to be configured.'); | |
| assert(vlMode, 'Planning requires a VL mode to be configured.'); |
| // Reserved for qwen3-vl specific processing | ||
| // const paddedResult = await paddingToMatchBlockByBase64(imagePayload, 32); | ||
| // imageWidth = paddedResult.width; | ||
| // imageHeight = paddedResult.height; | ||
| // imagePayload = paddedResult.imageBase64; |
Copilot
AI
Oct 17, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid keeping commented-out code; either remove it or replace with a TODO reference that links to a task/issue. If you plan to implement qwen3-vl specific processing, leave a concise TODO comment and remove the inactive code.
| // Reserved for qwen3-vl specific processing | |
| // const paddedResult = await paddingToMatchBlockByBase64(imagePayload, 32); | |
| // imageWidth = paddedResult.width; | |
| // imageHeight = paddedResult.height; | |
| // imagePayload = paddedResult.imageBase64; | |
| // TODO: Implement qwen3-vl specific image processing. See issue #1234. |
This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support. Changes: - Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent - Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM) - Simplify image processing to only support VL mode paths - Add comprehensive JSDoc documentation for Planning VL mode requirement - Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes - Fix existing tests to provide VL mode for Planning intent Breaking Change: - Planning without VL mode configured will now throw an error with clear instructions - Error message includes all supported VL modes and configuration examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>
This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support. Changes: - Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent - Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM) - Simplify image processing to only support VL mode paths - Add comprehensive JSDoc documentation for Planning VL mode requirement - Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes - Fix existing tests to provide VL mode for Planning intent Breaking Change: - Planning without VL mode configured will now throw an error with clear instructions - Error message includes all supported VL modes and configuration examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <noreply@anthropic.com>
Summary
This PR enforces that Planning functionality only supports vision language models (VL mode)
and removes DOM-based planning support. This change ensures better consistency and performance
by requiring visual models for planning operations.
Changes
Core Changes
ModelConfigManager Validation (
packages/shared/src/env/model-config-manager.ts)getModelConfig()to enforce VL mode for Planning intentPlanning Implementation Simplification (
packages/core/src/ai-model/llm-planning.ts)describeUserPage,markupImageForLLM)Type Documentation (
packages/shared/src/env/types.ts)IModelConfigForPlanningTest Coverage
Added 6 new unit tests for Planning VL mode validation:
Fixed existing tests to provide VL mode for Planning intent
Supported VL Modes
qwen-vlqwen3-vlgeminidoubao-visionvlm-ui-tarsvlm-ui-tars-doubaovlm-ui-tars-doubao-1.5Breaking Change⚠️
Planning without VL mode configured will now throw an error.
Before this change, Planning could work with DOM-based mode. After this change,
users must configure a VL mode for Planning intent.
Error message example:
Migration Guide
Users need to configure VL mode for Planning in one of two ways:
Option 1: Environment Variable
Option 2: modelConfig Function
Test Results
✅ All tests passing (17/17)
✅ Build successful
✅ TypeScript compilation clean
Impact Analysis
domIncludedparameter functionality remains unchanged🤖 Generated with Claude Code