feat(tui): vision-aware image paste with non-vision fallback by wqymi · Pull Request #1513 · XiaomiMiMo/MiMo-Code

wqymi · 2026-07-01T14:29:22Z

Summary

Makes TUI image paste vision-aware, with an end-to-end guarantee that non-vision models never receive image bytes (which cause hallucination), plus a discovery path so the agent knows which vision models it can dispatch to.

Vision-aware paste + fallback

Unify vision capability: removed the redundant hardcoded supportsImageInput special-casing in transform.ts; capabilities.input.image is now the single source of truth (models.dev already marks mimo-v2.5 / mimo-v2-omni as image-capable). Kept one explicit mimo-auto override (free-tier routing alias absent from models.dev).
TUI paste routing: vision model → existing base64 attach; non-vision model → spill clipboard image to a temp file and insert an @file reference (matching autocomplete's file-part shape), with a toast. Same fallback when pasting an image-file path. PDF/SVG handling unchanged. New Clipboard.spillImage helper. New i18n key added across all 7 locales.
Build-time notice: system.ts injects a <vision-capability> block for non-vision models only — tells them they can't see images and how to proceed.
Read tool harness: a non-vision model reading an image gets a warning (no base64 attachment); vision models and PDF reads unchanged. Uses Effect.catchDefect so an unresolvable model degrades to non-vision rather than crashing.
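
The Read-tool behavior described in the last item can be sketched roughly as follows. This is not the PR's code: `ReadResult`, `readImageFor`, and the warning text are hypothetical, and a plain `try`/`catch` stands in for the `Effect.catchDefect` handling the PR mentions. The point is only that a model lookup that throws is treated the same as a non-vision model.

```typescript
// Sketch only: ReadResult, readImageFor and the warning wording are hypothetical.
type ReadResult =
  | { kind: "attachment"; path: string; mime: string; base64: string }
  | { kind: "warning"; path: string; message: string };

function readImageFor(
  file: { path: string; mime: string; base64: string },
  modelSeesImages: () => boolean, // may throw if the model cannot be resolved
): ReadResult {
  let seesImages: boolean;
  try {
    seesImages = modelSeesImages();
  } catch {
    // Stand-in for the defect handler: an unresolvable model degrades to non-vision.
    seesImages = false;
  }
  if (seesImages) {
    return { kind: "attachment", ...file };
  }
  // Non-vision path: no image bytes are returned, only a textual warning.
  return {
    kind: "warning",
    path: file.path,
    message: `Cannot view ${file.path}: the active model has no image input.`,
  };
}
```

The design choice illustrated here is to fail closed: when capability is unknown, the image bytes are withheld.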

Model discovery (so `--model` is usable)

actor models verb: new read-only subcommand listing available models (actor models), optionally filtered to vision-capable (actor models --vision), count-capped. Mirrors the subagent_type discoverability fix.
--model description now points at actor models for valid values.
Non-vision hints name up to 3 real vision models (from Provider.list(), deterministic) + point to actor models --vision for the rest — in both the system prompt and the Read-tool warning. Zero configured vision models → suggests configuring one or using OCR. The discovery loop closes: read image → warning names a model → actor models --vision → actor run ... --model <real vision model>.
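
As a rough illustration of the discovery items above, the sketch below filters a model list to vision-capable entries, caps the count, and builds a hint naming at most three models. The names `ListedModel`, `listModels`, and `visionHint` are hypothetical, the hint wording is invented, and sorting by id is only one assumed way to make the selection deterministic; the PR does not say how `Provider.list()` results are ordered.

```typescript
// Sketch only: these types and functions are illustrative, not the PR's API.
interface ListedModel {
  id: string;
  image: boolean; // mirrors capabilities.input.image
}

function listModels(
  all: ListedModel[],
  opts: { vision?: boolean; limit?: number } = {},
): ListedModel[] {
  // Copy before sorting so the caller's array is left untouched.
  const picked = opts.vision ? all.filter((m) => m.image) : all.slice();
  // Assumption: sort by id so repeated calls give the same order.
  picked.sort((a, b) => (a.id < b.id ? -1 : a.id > b.id ? 1 : 0));
  return opts.limit === undefined ? picked : picked.slice(0, opts.limit);
}

function visionHint(all: ListedModel[]): string {
  const names = listModels(all, { vision: true, limit: 3 }).map((m) => m.id);
  if (names.length === 0) {
    return "No vision-capable model is configured; configure one or use OCR.";
  }
  return (
    `Vision-capable models include: ${names.join(", ")}. ` +
    "Run `actor models --vision` for the full list."
  );
}
```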

Test Plan

bun typecheck clean (all 12 turbo tasks pass on pre-push)
bun test test/tool/actor-models.test.ts test/tool/read-vision-harness.test.ts test/tool/read.test.ts test/cli/tui/clipboard-spill.test.ts → 44 pass, 0 fail
Non-vision model + paste image → temp file + @file reference + toast; vision model → base64 attach unchanged
Non-vision model + read image → warning naming a real vision model + actor models --vision; vision model → attachment
Unresolvable/missing model → degrades to non-vision (no crash), covered by tests
actor models lists all; actor models --vision filters to image-capable; --limit caps
Non-vision system prompt contains <vision-capability> with real model names; vision model does not

Plans: docs/compose/plans/2026-07-01-tui-paste-image-vision-fallback.md, docs/compose/plans/2026-07-01-actor-models-discovery.md


danhduykhan-code · 2026-07-01T14:34:57Z

Summary

Makes TUI image paste vision-aware. When the active model supports vision, images attach as before. When it does not, images degrade to a file-path reference instead of ever putting base64 bytes into a non-vision model's context (which causes hallucination). The guarantee is enforced end-to-end: at the paste boundary, at system-prompt build time, and at the Read tool.

Unify vision capability (A): removed the redundant hardcoded supportsImageInput special-casing in transform.ts; capabilities.input.image is now the single source of truth (models.dev already marks mimo-v2.5 / mimo-v2-omni as image-capable). Kept one explicit mimo-auto override (free-tier routing alias absent from models.dev).
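
A minimal sketch of what a single source of truth with one override might look like. Only `capabilities.input.image` and the `mimo-auto` alias come from the description above; `ModelInfo`, `canAcceptImages`, and the override table are hypothetical, and the sketch assumes the override marks `mimo-auto` as image-capable, which the description does not state outright.

```typescript
// Sketch only: shapes and names below are illustrative.
interface ModelInfo {
  id: string;
  capabilities?: { input?: { image?: boolean } };
}

// Assumption: the alias is missing from the catalog and is treated as image-capable.
const IMAGE_OVERRIDES = new Map<string, boolean>([["mimo-auto", true]]);

function canAcceptImages(model: ModelInfo | undefined): boolean {
  // An unknown model is treated as non-vision.
  if (!model) return false;
  const override = IMAGE_OVERRIDES.get(model.id);
  if (override !== undefined) return override;
  // Otherwise the catalog flag is the only input to the decision.
  return model.capabilities?.input?.image === true;
}
```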

TUI paste routing (B): vision model → existing base64 attach; non-vision model → spill clipboard image to a temp file and insert an @file reference (matching autocomplete's file-part shape), with a toast. Same fallback when pasting an image-file path. PDF/SVG handling unchanged. New Clipboard.spillImage helper. i18n key added across all 7 locales.
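
The routing decision in (B) could be sketched as below. The `spill` parameter is a hypothetical stand-in for the PR's `Clipboard.spillImage` helper (assumed here to write the image to a temp file and return its path); `PastePart`, `routePastedImage`, and the toast text are likewise illustrative.

```typescript
// Sketch only: types, function name and toast wording are hypothetical.
interface PastedImage {
  mime: string;
  base64: string;
}

type PastePart =
  | { kind: "attachment"; mime: string; base64: string }
  | { kind: "file-reference"; path: string; text: string; toast: string };

function routePastedImage(
  image: PastedImage,
  modelSeesImages: boolean,
  spill: (image: PastedImage) => string, // returns the temp-file path
): PastePart {
  if (modelSeesImages) {
    // Vision model: attach the image bytes as before.
    return { kind: "attachment", mime: image.mime, base64: image.base64 };
  }
  // Non-vision model: write the image out and reference it by path instead.
  const path = spill(image);
  return {
    kind: "file-reference",
    path,
    text: `@${path}`,
    toast: "Image saved to a file; the current model cannot view images.",
  };
}
```

Injecting `spill` keeps the routing itself free of file-system access, so it can be exercised without touching the disk.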

Build-time notice (C.1): system.ts injects a <vision-capability> block for non-vision models only, telling them they can't see images — dispatch a vision subagent (actor run ... --model <vision model>) with the image path, or use hexdump for raw binary needs.
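
For illustration, a notice builder along the lines of (C.1) might look like this. The function name `visionCapabilityNotice` and the exact wording inside the block are assumptions; only the `<vision-capability>` tag, the `actor run ... --model <vision model>` suggestion, and the hexdump fallback come from the description.

```typescript
// Sketch only: the function name and the notice wording are hypothetical.
function visionCapabilityNotice(modelSeesImages: boolean): string | undefined {
  // Vision models get no notice at all.
  if (modelSeesImages) return undefined;
  return [
    "<vision-capability>",
    "You cannot see images.",
    "To inspect one, dispatch a vision subagent with the image path:",
    "  actor run ... --model <vision model>",
    "For raw binary needs, use hexdump.",
    "</vision-capability>",
  ].join("\n");
}
```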

Read tool harness (C.2): a non-vision model reading an image gets a warning (no base64 attachment); vision models and PDF reads are unchanged. Uses Effect.catchDefect so an unresolvable model degrades to non-vision rather than crashing.

Test Plan

bun typecheck clean (all 12 turbo tasks pass on pre-push)

bun test test/cli/tui/clipboard-spill.test.ts test/tool/read-vision-harness.test.ts test/tool/read.test.ts → 41 pass, 0 fail

Non-vision model + paste clipboard image → temp file written, @<file> inserted, toast shown

Vision model + paste clipboard image → base64 attachment unchanged

Non-vision model + read on an image → warning, no attachment; vision model → attachment present

Unresolvable / missing model → degrades to non-vision (no crash), covered by tests

Non-vision model system prompt contains <vision-capability>; vision model does not

Plan: docs/compose/plans/2026-07-01-tui-paste-image-vision-fallback.md


wqymi added 7 commits July 1, 2026 22:23

refactor(provider): unify image capability on capabilities.input.image with mimo-auto override

3e3d91c

feat(tui): add spillImage helper to write clipboard image to temp file

125969b

feat(tui): degrade pasted images to file-path reference for non-vision models

335b340

feat(session): inject vision-capability notice for non-vision models

c57dcba

feat(tool): harness read tool so non-vision models get a warning instead of image bytes

a5cdc60

fix(tool): catch getModel defect so unresolvable model degrades to non-vision

4929030

docs(compose): add implementation plan for TUI paste image vision fallback

9adae59

wqymi added 4 commits July 2, 2026 00:48

feat(actor): add models verb to list available (optionally vision) models

649b566

docs(actor): point --model description at the models verb for discovery

28e80ad

feat(session,tool): name real vision models in non-vision hints, point to actor models

e4ca392

docs(compose): add implementation plan for actor models discovery

7859223

wqymi wants to merge 11 commits into main from vb/6a3b-tui-paste

2 participants
