feat(tui): vision-aware image paste with non-vision fallback #1513
Open
wqymi wants to merge 11 commits into
…e with mimo-auto override
…ead of image bytes
…t to actor models
Summary
Makes TUI image paste vision-aware, with an end-to-end guarantee that non-vision models never receive image bytes (which cause hallucination), plus a discovery path so the agent knows which vision models it can dispatch to.
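The guarantee stated above can be illustrated with a minimal sketch. It is not the PR's actual code: the `ModelInfo` and `PastePart` shapes, the `partForPastedImage` function, and the injected `spill` callback are hypothetical, and treating the `mimo-auto` override as "image-capable" is an assumption. Only `capabilities.input.image`, the `mimo-auto` override, and the file-reference fallback come from the description.

```typescript
// Hypothetical model shape; only `capabilities.input.image` is named in the PR.
interface ModelInfo {
  id: string;
  capabilities?: { input?: { image?: boolean } };
}

// Hypothetical part shapes: inline image data vs. a reference to a file on disk.
type PastePart =
  | { type: "image"; mime: string; base64: string }
  | { type: "file"; path: string };

interface PastedImage {
  mime: string;
  base64: string;
}

// Assumption: the explicit override marks `mimo-auto` as image-capable.
const VISION_OVERRIDES = new Set<string>(["mimo-auto"]);

function supportsImageInput(model: ModelInfo | undefined): boolean {
  // An unresolvable model degrades to non-vision instead of throwing.
  if (model === undefined) return false;
  if (VISION_OVERRIDES.has(model.id)) return true;
  return model.capabilities?.input?.image === true;
}

// `spill` stands in for a helper that writes the image to disk and returns its path.
function partForPastedImage(
  model: ModelInfo | undefined,
  image: PastedImage,
  spill: (image: PastedImage) => string,
): PastePart {
  if (supportsImageInput(model)) {
    return { type: "image", mime: image.mime, base64: image.base64 };
  }
  // Non-vision path: the model only ever sees a file reference, never image bytes.
  return { type: "file", path: spill(image) };
}
```

Keeping the capability check in one function means the paste path, the system prompt, and the Read-tool warning can all agree on what counts as a vision model.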
Vision-aware paste + fallback
- … `supportsImageInput` special-casing in `transform.ts`; `capabilities.input.image` is now the single source of truth (models.dev already marks mimo-v2.5 / mimo-v2-omni as image-capable). Kept one explicit `mimo-auto` override (free-tier routing alias absent from models.dev).
- … `@file` reference (matching autocomplete's file-part shape), with a toast. Same fallback when pasting an image-file path. PDF/SVG handling unchanged. New `Clipboard.spillImage` helper. i18n key across all 7 locales.
- `system.ts` injects a `<vision-capability>` block for non-vision models only: it tells them they can't see images and how to proceed.
- … `Effect.catchDefect` so an unresolvable model degrades to non-vision rather than crashing.
Model discovery (so `--model` is usable)
- `actor models` verb: new read-only subcommand listing available models (`actor models`), optionally filtered to vision-capable (`actor models --vision`), count-capped. Mirrors the `subagent_type` discoverability fix.
- `--model` description now points at `actor models` for valid values.
- … `Provider.list()`, deterministic) + point to `actor models --vision` for the rest, in both the system prompt and the Read-tool warning. Zero configured vision models → suggests configuring one or using OCR. The discovery loop closes: read image → warning names a model → `actor models --vision` → `actor run ... --model <real vision model>`.
Test Plan
- `bun typecheck` clean (all 12 turbo tasks pass on pre-push)
- `bun test test/tool/actor-models.test.ts test/tool/read-vision-harness.test.ts test/tool/read.test.ts test/cli/tui/clipboard-spill.test.ts` → 44 pass, 0 fail
- … `@file` reference + toast; vision model → base64 attach unchanged
- … `read` image → warning naming a real vision model + `actor models --vision`; vision model → attachment
- `actor models` lists all; `actor models --vision` filters to image-capable; `--limit` caps
- … `<vision-capability>` with real model names; vision model does not
Plans: `docs/compose/plans/2026-07-01-tui-paste-image-vision-fallback.md`, `docs/compose/plans/2026-07-01-actor-models-discovery.md`
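The model-discovery behaviour described above (list all models, filter to vision-capable, cap the count, and name a few models in the warning) might look roughly like the sketch below. The `ModelInfo` shape, the `listModels` and `visionHint` functions, the sort used for determinism, and the message wording are all hypothetical; the PR text only states that the listing is deterministic, count-capped, and falls back to suggesting configuration or OCR when no vision model exists.

```typescript
// Hypothetical model shape; only `capabilities.input.image` is named in the PR.
interface ModelInfo {
  id: string;
  capabilities?: { input?: { image?: boolean } };
}

interface ListOptions {
  vision?: boolean; // corresponds to the `--vision` flag
  limit?: number;   // corresponds to the `--limit` flag
}

function listModels(models: readonly ModelInfo[], opts: ListOptions = {}): string[] {
  const filtered = opts.vision
    ? models.filter((m) => m.capabilities?.input?.image === true)
    : [...models];
  // Assumption: sorting by id is what makes the output deterministic.
  const ids = filtered.map((m) => m.id).sort();
  return opts.limit === undefined ? ids : ids.slice(0, Math.max(0, opts.limit));
}

// Builds the hint shown to a non-vision model; the wording is illustrative only.
function visionHint(models: readonly ModelInfo[], maxNamed = 3): string {
  const names = listModels(models, { vision: true, limit: maxNamed });
  if (names.length === 0) {
    return "No vision-capable model is configured; configure one or use OCR.";
  }
  return (
    "Vision-capable models include: " +
    names.join(", ") +
    ". Run 'actor models --vision' for the rest."
  );
}
```

Capping the named models keeps the prompt short, while the pointer to `actor models --vision` still lets the agent reach the full list.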