pdf-server: annotations, interact tool, page extraction & prompt engineering #506

Open
ochafik wants to merge 12 commits into main from ochafik/pdf-interact

ochafik commented Feb 26, 2026

Summary

Adds full annotation, interaction, and page extraction capabilities to the PDF server:

  • Interact tool with command queue pattern (server enqueues → client polls → processes):
    • Navigation: navigate, search, find, search_navigate, zoom
    • Annotations: add_annotations (7 types: highlight, underline, strikethrough, note, rectangle, freetext, stamp), update_annotations, remove_annotations
    • Text highlighting: highlight_text — auto-find and highlight text by query
    • Page extraction: get_pages — batch text and/or screenshot extraction from page ranges without visual navigation (offscreen rendering)
    • Form filling: fill_form — fill PDF form fields
  • Annotated PDF download via pdf-lib (client-side) + app.downloadFile() SDK support
  • Annotation persistence in localStorage keyed by toolInfo.id
  • viewUUID validation — interact returns clear error if UUID doesn't match an active viewer
  • Prompt engineering — display_pdf result enumerates all interact actions; interact description leads with annotation capabilities; schema simplified from 7,802 → 2,239 chars (dropped 14-variant anyOf union)
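
The command queue pattern from the first bullet (server enqueues, client polls, client processes) can be sketched roughly as follows. This is a minimal in-memory illustration, not the PR's code: `ViewerCommand`, `enqueueCommand`, and `drainCommands` are hypothetical names, and it assumes one FIFO queue per viewUUID that a client poll empties.

```typescript
// Hypothetical command shape; the PR's real schema covers more actions and fields.
type ViewerCommand =
  | { action: "navigate"; page: number }
  | { action: "zoom"; scale: number }
  | { action: "highlight_text"; query: string };

// One FIFO queue per viewer, keyed by viewUUID (assumed layout).
const commandQueues = new Map<string, ViewerCommand[]>();

// Server side: the interact tool appends a command for a given viewer.
function enqueueCommand(viewUUID: string, command: ViewerCommand): void {
  const queue = commandQueues.get(viewUUID) ?? [];
  queue.push(command);
  commandQueues.set(viewUUID, queue);
}

// Called when the client polls: returns pending commands in order and clears them.
function drainCommands(viewUUID: string): ViewerCommand[] {
  const pending = commandQueues.get(viewUUID) ?? [];
  commandQueues.delete(viewUUID);
  return pending;
}
```

Because the queue is keyed by viewUUID, a command enqueued under the wrong key is never delivered, which is why the viewUUID validation bullet matters.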

New dependency

  • pdf-lib (^1.17.1) — client-side PDF modification for annotated download

Files changed

| File | Changes |
| --- | --- |
| examples/pdf-server/server.ts | Interact tool, annotation Zod schemas, get_pages request-response bridge, submit_page_data, viewUUID validation |
| examples/pdf-server/src/mcp-app.ts | Annotation rendering (DOM overlays), download logic, highlight_text, get_pages offscreen rendering, persistence |
| examples/pdf-server/mcp-app.html | Annotation layer div, download button |
| examples/pdf-server/src/mcp-app.css | Annotation styles (per-type + dark mode) |
| examples/pdf-server/README.md | Example prompts, testing docs, updated tools table |
| tests/e2e/pdf-annotations.spec.ts | 6 Playwright E2E tests (annotation rendering, removal, highlight_text) |
| tests/e2e/pdf-annotations-api.spec.ts | 3 Claude API prompt discovery tests (disabled by default, needs ANTHROPIC_API_KEY) |

Test plan

  • npx playwright test tests/e2e/pdf-annotations.spec.ts — 6 tests pass (annotation CRUD, highlight_text)
  • npx playwright test -g "PDF Server" — existing screenshot tests pass
  • ANTHROPIC_API_KEY=... npx playwright test tests/e2e/pdf-annotations-api.spec.ts — 3/3 pass (model discovers annotations)
  • npm run --workspace examples/pdf-server build — compiles cleanly
  • Manual: display PDF in Claude, use interact to annotate, click download

🤖 Generated with Claude Code

pkg-pr-new bot commented Feb 26, 2026

@modelcontextprotocol/ext-apps

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/ext-apps@506

@modelcontextprotocol/server-basic-preact

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-preact@506

@modelcontextprotocol/server-basic-react

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-react@506

@modelcontextprotocol/server-basic-solid

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-solid@506

@modelcontextprotocol/server-basic-svelte

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-svelte@506

@modelcontextprotocol/server-basic-vanillajs

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-vanillajs@506

@modelcontextprotocol/server-basic-vue

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-basic-vue@506

@modelcontextprotocol/server-budget-allocator

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-budget-allocator@506

@modelcontextprotocol/server-cohort-heatmap

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-cohort-heatmap@506

@modelcontextprotocol/server-customer-segmentation

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-customer-segmentation@506

@modelcontextprotocol/server-debug

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-debug@506

@modelcontextprotocol/server-map

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-map@506

@modelcontextprotocol/server-pdf

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-pdf@506

@modelcontextprotocol/server-scenario-modeler

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-scenario-modeler@506

@modelcontextprotocol/server-shadertoy

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-shadertoy@506

@modelcontextprotocol/server-sheet-music

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-sheet-music@506

@modelcontextprotocol/server-system-monitor

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-system-monitor@506

@modelcontextprotocol/server-threejs

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-threejs@506

@modelcontextprotocol/server-transcript

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-transcript@506

@modelcontextprotocol/server-video-resource

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-video-resource@506

@modelcontextprotocol/server-wiki-explorer

npm i https://pkg.pr.new/modelcontextprotocol/ext-apps/@modelcontextprotocol/server-wiki-explorer@506

commit: c0be514

ochafik and others added 11 commits February 26, 2026 06:11
Add PDF annotation system with 7 annotation types (highlight, underline,
strikethrough, note, rectangle, freetext, stamp), text-based highlighting,
form filling, and annotated PDF download using pdf-lib.

- Server: annotation Zod schemas, extended interact tool with add/update/remove
  annotations, highlight_text, and fill_form actions
- Client: annotation layer rendering with PDF coordinate conversion, persistence
  via localStorage (using toolInfo.id key), pdf-lib-based download with embedded
  annotations and form fills, uses app.downloadFile() SDK with <a> fallback
- Model context includes annotation summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
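
The "PDF coordinate conversion" mentioned for the annotation layer might look something like the sketch below. It assumes an unrotated page, a rectangle given by its lower-left corner in PDF points (PDF user space has its origin at the bottom-left with y increasing upward), and a `scale` in CSS pixels per point. `PdfRect`, `CssBox`, and `pdfRectToCssBox` are hypothetical names, not taken from the PR.

```typescript
// Rectangle in PDF user space: (x, y) is the lower-left corner, units are points.
interface PdfRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Absolutely positioned overlay box: origin at the page's top-left, units are CSS px.
interface CssBox {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Flip the y axis and apply the viewer scale (assumes no page rotation or crop offset).
function pdfRectToCssBox(rect: PdfRect, pageHeightPt: number, scale: number): CssBox {
  return {
    left: rect.x * scale,
    top: (pageHeightPt - rect.y - rect.height) * scale,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}
```

For example, on a 792 pt tall page rendered at scale 2, a 100 x 20 pt rectangle at (72, 700) maps to a 200 x 40 px box at left 144, top 144.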
New tool `get_pages` lets the model get text and/or screenshots from
arbitrary page ranges without navigating the visible viewer.

- Server: `get_pages` tool with interval-based page ranges (optional
  start/end, open ranges supported), `getText`/`getScreenshots` flags,
  request-response bridge via `submit_page_data` app-only tool
- Client: offscreen rendering (hidden canvas, no visual interference),
  text from cache or on-demand extraction, screenshots scaled to 768px
  max dimension, results submitted back to server
- Max 20 pages per request, 60s timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
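
The interval-based page ranges and the screenshot size limit described in this commit could be resolved along these lines. This is a sketch under stated assumptions: ranges are 1-based and inclusive, a missing `start` means page 1, a missing `end` means the last page, requests over the cap are truncated rather than rejected, and small pages are not upscaled. `resolvePageRanges` and `screenshotScale` are hypothetical names.

```typescript
// Open-ended interval: either bound may be omitted.
interface PageRange {
  start?: number;
  end?: number;
}

const MAX_PAGES = 20; // per-request cap stated in the commit message
const MAX_DIMENSION_PX = 768; // screenshot max dimension stated in the commit message

// Expand ranges into a sorted, de-duplicated page list, clamped to the document.
function resolvePageRanges(ranges: PageRange[], pageCount: number): number[] {
  const pages = new Set<number>();
  for (const { start = 1, end = pageCount } of ranges) {
    const first = Math.max(1, start);
    const last = Math.min(pageCount, end);
    for (let page = first; page <= last; page++) pages.add(page);
  }
  // Assumption: truncate to the cap instead of returning an error.
  return Array.from(pages).sort((a, b) => a - b).slice(0, MAX_PAGES);
}

// Scale factor that keeps the longer side within MAX_DIMENSION_PX (never upscales).
function screenshotScale(widthPx: number, heightPx: number): number {
  return Math.min(1, MAX_DIMENSION_PX / Math.max(widthPx, heightPx));
}
```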
Fold get_pages into the interact tool to minimize tools requiring
approval. Now accessed via `interact(action: "get_pages", ...)`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add concrete per-type schema docs with field names in tool description
- Add JSON example showing add_annotations with highlight + stamp
- Replace opaque z.record(z.string(), z.unknown()) with typed union
  of all annotation schemas (full + partial forms) so the model sees
  exact field names and types
- Remove redundant manual safeParse since Zod inputSchema validates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- display_pdf result text now explicitly lists annotation capabilities
  (highlights, stamps, notes, etc.) instead of vague "navigate, search, zoom, etc."
- Restructured interact tool description: annotations promoted to top,
  with clear type reference, JSON example, and bold section headers
- Added pdf-annotations.spec.ts with 6 E2E tests covering:
  - Result text mentions annotation capabilities
  - interact tool available in dropdown
  - add_annotations renders highlight
  - Multiple annotation types render (highlight, note, stamp, freetext, rectangle)
  - remove_annotations removes from DOM
  - highlight_text finds and highlights text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that Claude can discover and use PDF annotation capabilities
by calling the Anthropic Messages API with the tool schemas and
simulated display_pdf result.

Disabled by default — skipped unless ANTHROPIC_API_KEY is set:
  ANTHROPIC_API_KEY=sk-... npx playwright test tests/e2e/pdf-annotations-api.spec.ts

3 scenarios tested:
- Model uses highlight_text when asked to highlight the title
- Model discovers annotation capabilities when asked "can you annotate?"
- Model uses interact (add_annotations or get_pages) when asked to add notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e to README

- Example prompts for annotations, navigation, page extraction, stamps, forms
- Documents how to run E2E tests and API prompt discovery tests
- Updated tools table to include interact tool
- Updated key patterns table with annotations, command queue, file download
- Added pdf-lib to dependencies list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The typed Zod union (14 anyOf variants: 7 full + 7 partial annotation
types) produced a 5,817-char JSON schema for the annotations field alone.
This bloated the interact tool schema to 7,802 chars, which may cause
the model to struggle with or skip the tool.

Replace with z.record(z.string(), z.any()) — annotation types are
already fully documented in the tool description. Schema drops to
2,239 chars (71% reduction), annotations field to 254 chars (96% reduction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
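
The reduction figures quoted above can be checked with a few lines of arithmetic. `reductionPercent` is a hypothetical helper written for this check; the character counts are the ones given in the commit message.

```typescript
// Percentage reduction from `before` to `after`, rounded to the nearest integer.
function reductionPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Whole interact tool schema: 7,802 -> 2,239 chars.
const toolSchemaReduction = reductionPercent(7802, 2239); // 71

// annotations field alone: 5,817 -> 254 chars.
const annotationsFieldReduction = reductionPercent(5817, 254); // 96
```

Both figures correspond to the same absolute saving of 5,563 characters, which is consistent with the entire reduction coming from the annotations field.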
The display_pdf result text now lists every action by name (navigate,
search, find, search_navigate, zoom, add_annotations, update_annotations,
remove_annotations, highlight_text, fill_form, get_pages) so the model
knows exactly what commands are available without needing to inspect the
interact tool schema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model was passing "pdf-viewer" instead of the actual UUID, causing
get_pages to time out (commands were queued under the wrong key, so the
client never picked them up).

- Add activeViewUUIDs set tracking UUIDs issued by display_pdf
- Validate viewUUID at the top of interact handler with clear error
- Add "IMPORTANT: viewUUID must be the exact UUID returned by display_pdf"
  to the interact tool description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
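
A rough sketch of the validation described in this commit is shown below. Only the `activeViewUUIDs` name comes from the commit message; `registerView`, `validateViewUUID`, the result shape, and the error wording are assumptions made for illustration.

```typescript
// UUIDs issued by display_pdf (name taken from the commit message).
const activeViewUUIDs = new Set<string>();

// Hypothetical hook: display_pdf would call this when it creates a viewer.
function registerView(viewUUID: string): void {
  activeViewUUIDs.add(viewUUID);
}

interface ViewValidation {
  ok: boolean;
  error?: string;
}

// Run at the top of the interact handler, before any command is queued.
function validateViewUUID(viewUUID: string): ViewValidation {
  if (activeViewUUIDs.has(viewUUID)) return { ok: true };
  return {
    ok: false,
    // Assumed wording; the PR only says the error is "clear".
    error: `Unknown viewUUID "${viewUUID}". Use the exact UUID returned by display_pdf.`,
  };
}
```

Failing fast here means a bad identifier such as "pdf-viewer" produces an immediate error instead of a command that sits in an unread queue until the timeout.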
ochafik changed the title from "pdf-server: add interact tool with command queue" to "pdf-server: annotations, interact tool, page extraction & prompt engineering" on Feb 26, 2026