feat(tui): vision-aware image paste with non-vision fallback #1513

Open
wqymi wants to merge 11 commits into main from vb/6a3b-tui-paste

wqymi commented Jul 1, 2026

Collaborator

Summary

Makes TUI image paste vision-aware, with an end-to-end guarantee that non-vision models never receive image bytes (which cause hallucination), plus a discovery path so the agent knows which vision models it can dispatch to.

Vision-aware paste + fallback

  • Unify vision capability: removed the redundant hardcoded supportsImageInput special-casing in transform.ts; capabilities.input.image is now the single source of truth (models.dev already marks mimo-v2.5 / mimo-v2-omni as image-capable). Kept one explicit mimo-auto override (free-tier routing alias absent from models.dev).
  • TUI paste routing: vision model → existing base64 attach; non-vision model → spill clipboard image to a temp file and insert an @file reference (matching autocomplete's file-part shape), with a toast. Same fallback when pasting an image-file path. PDF/SVG handling unchanged. New Clipboard.spillImage helper. i18n key added across all 7 locales.
  • Build-time notice: system.ts injects a <vision-capability> block for non-vision models only — tells them they can't see images and how to proceed.
  • Read tool harness: a non-vision model reading an image gets a warning (no base64 attachment); vision models and PDF reads unchanged. Uses Effect.catchDefect so an unresolvable model degrades to non-vision rather than crashing.
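To make the paste routing concrete, here is a minimal sketch of the two branches. The names `routeImagePaste`, `spillImage`, `PastePart`, and the temp-file naming are illustrative assumptions, not the code in this PR; the real `Clipboard.spillImage` helper and file-part shape may differ.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical shapes for illustration only.
type Capabilities = { input: { image: boolean } };
type PastePart =
  | { type: "image"; mime: string; base64: string }
  | { type: "file"; text: string; path: string };

// Stand-in for a spill helper: writes the clipboard bytes into a fresh temp directory.
function spillImage(bytes: Uint8Array, ext = "png"): string {
  const dir = mkdtempSync(join(tmpdir(), "paste-"));
  const path = join(dir, `clipboard.${ext}`);
  writeFileSync(path, bytes);
  return path;
}

function routeImagePaste(caps: Capabilities, bytes: Uint8Array, mime = "image/png"): PastePart {
  if (caps.input.image) {
    // Vision model: attach the image bytes as base64.
    return { type: "image", mime, base64: Buffer.from(bytes).toString("base64") };
  }
  // Non-vision model: no image bytes enter the context, only an @file reference.
  const path = spillImage(bytes);
  return { type: "file", text: `@${path}`, path };
}
```

The point of the sketch is that the branch is decided once, at the paste boundary, from `capabilities.input.image` alone.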

Model discovery (so --model is usable)

  • actor models verb: new read-only subcommand listing available models (actor models), optionally filtered to vision-capable (actor models --vision), count-capped. Mirrors the subagent_type discoverability fix.
  • --model description now points at actor models for valid values.
  • Non-vision hints name up to 3 real vision models (from Provider.list(), deterministic) + point to actor models --vision for the rest, in both the system prompt and the Read-tool warning. Zero configured vision models → suggests configuring one or using OCR. The discovery loop closes: read image → warning names a model → actor models --vision → actor run ... --model <real vision model>.
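A rough sketch of the discovery side, assuming a flat model list with the same `capabilities.input.image` flag. `ModelInfo`, `visionModels`, `nonVisionHint`, and the hint wording are hypothetical; the sketch sorts by model id to get a deterministic order, which is one possible choice and not necessarily what `Provider.list()` does.

```typescript
// Hypothetical model shape for illustration.
type ModelInfo = { id: string; capabilities: { input: { image: boolean } } };

// Vision-capable model ids in a stable (sorted) order, optionally capped.
function visionModels(models: ModelInfo[], limit?: number): string[] {
  const ids = models
    .filter((m) => m.capabilities.input.image)
    .map((m) => m.id)
    .sort();
  return limit === undefined ? ids : ids.slice(0, limit);
}

// Hint text for a non-vision model: name up to 3 vision models, then point at the listing command.
function nonVisionHint(models: ModelInfo[]): string {
  const named = visionModels(models, 3);
  if (named.length === 0) {
    return "This model cannot see images. No vision model is configured; configure one or use OCR.";
  }
  return (
    `This model cannot see images. Vision models include: ${named.join(", ")}. ` +
    "Run `actor models --vision` for the full list."
  );
}
```

Capping the named models keeps the hint short while the listing command covers the rest.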

Test Plan

  • bun typecheck clean (all 12 turbo tasks pass on pre-push)
  • bun test test/tool/actor-models.test.ts test/tool/read-vision-harness.test.ts test/tool/read.test.ts test/cli/tui/clipboard-spill.test.ts → 44 pass, 0 fail
  • Non-vision model + paste image → temp file + @file reference + toast; vision model → base64 attach unchanged
  • Non-vision model + read image → warning naming a real vision model + actor models --vision; vision model → attachment
  • Unresolvable/missing model → degrades to non-vision (no crash), covered by tests
  • actor models lists all; actor models --vision filters to image-capable; --limit caps
  • Non-vision system prompt contains <vision-capability> with real model names; vision model does not
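The "unresolvable model degrades to non-vision" case in the test plan can be pictured as follows. This sketch uses a plain try/catch in place of the Effect-based `Effect.catchDefect` handling the PR describes, and `ReadResult`, `supportsVision`, `readImage`, and the warning text are assumed names for illustration.

```typescript
// Hypothetical shapes for illustration.
type Resolved = { capabilities: { input: { image: boolean } } };
type ReadResult =
  | { kind: "attachment"; mime: string; base64: string }
  | { kind: "warning"; message: string };

// Any failure to resolve the model is treated as "no vision" instead of crashing.
function supportsVision(resolve: () => Resolved): boolean {
  try {
    return resolve().capabilities.input.image;
  } catch {
    return false;
  }
}

function readImage(resolve: () => Resolved, path: string, bytes: Uint8Array): ReadResult {
  if (supportsVision(resolve)) {
    return { kind: "attachment", mime: "image/png", base64: Buffer.from(bytes).toString("base64") };
  }
  // Non-vision or unresolvable model: return a warning and never attach image bytes.
  return {
    kind: "warning",
    message: `Cannot view ${path}: the active model has no image input. Run \`actor models --vision\` to find one that does.`,
  };
}
```

Failing closed here means a lookup error can only ever withhold an attachment, never leak one.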

Plans: docs/compose/plans/2026-07-01-tui-paste-image-vision-fallback.md, docs/compose/plans/2026-07-01-actor-models-discovery.md

@danhduykhan-code

Summary

Makes TUI image paste vision-aware. When the active model supports vision, images attach as before. When it does not, images degrade to a file-path reference instead of ever putting base64 bytes into a non-vision model's context (which causes hallucination). The guarantee is enforced end-to-end: at the paste boundary, at system-prompt build time, and at the Read tool.

  • Unify vision capability (A): removed the redundant hardcoded supportsImageInput special-casing in transform.ts; capabilities.input.image is now the single source of truth (models.dev already marks mimo-v2.5 / mimo-v2-omni as image-capable). Kept one explicit mimo-auto override (free-tier routing alias absent from models.dev).
  • TUI paste routing (B): vision model → existing base64 attach; non-vision model → spill clipboard image to a temp file and insert an @file reference (matching autocomplete's file-part shape), with a toast. Same fallback when pasting an image-file path. PDF/SVG handling unchanged. New Clipboard.spillImage helper. i18n key added across all 7 locales.
  • Build-time notice (C.1): system.ts injects a <vision-capability> block for non-vision models only, telling them they can't see images and should instead dispatch a vision subagent (actor run ... --model <vision model>) with the image path, or use hexdump for raw binary needs.
  • Read tool harness (C.2): a non-vision model reading an image gets a warning (no base64 attachment); vision models and PDF reads are unchanged. Uses Effect.catchDefect so an unresolvable model degrades to non-vision rather than crashing.
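For item (A), a capability check with a single source of truth plus one explicit override might look like the sketch below. `ModelEntry`, `IMAGE_OVERRIDES`, and `supportsImageInput` are illustrative names, and the sketch assumes the mimo-auto override marks the alias as image-capable, which the description above does not spell out.

```typescript
// Hypothetical catalog entry; fields may be missing for aliases the catalog does not know.
type ModelEntry = { id: string; capabilities?: { input?: { image?: boolean } } };

// Assumption: aliases absent from the catalog that should still count as image-capable.
const IMAGE_OVERRIDES = new Set<string>(["mimo-auto"]);

function supportsImageInput(model: ModelEntry): boolean {
  if (IMAGE_OVERRIDES.has(model.id)) return true;
  // Everything else defers to the catalog flag; a missing flag means no image input.
  return model.capabilities?.input?.image === true;
}
```

Keeping the override in one small set makes it easy to delete once the alias appears in the catalog.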

Test Plan

  • bun typecheck clean (all 12 turbo tasks pass on pre-push)
  • bun test test/cli/tui/clipboard-spill.test.ts test/tool/read-vision-harness.test.ts test/tool/read.test.ts → 41 pass, 0 fail
  • Non-vision model + paste clipboard image → temp file written, @<file> inserted, toast shown
  • Vision model + paste clipboard image → base64 attachment unchanged
  • Non-vision model + read on an image → warning, no attachment; vision model → attachment present
  • Unresolvable / missing model → degrades to non-vision (no crash), covered by tests
  • Non-vision model system prompt contains <vision-capability>; vision model does not

Plan: docs/compose/plans/2026-07-01-tui-paste-image-vision-fallback.md
