pdf-server: annotations, interact tool, page extraction & prompt engineering#506
Open
Add PDF annotation system with 7 annotation types (highlight, underline, strikethrough, note, rectangle, freetext, stamp), text-based highlighting, form filling, and annotated PDF download using pdf-lib.

- Server: annotation Zod schemas, extended interact tool with add/update/remove annotations, highlight_text, and fill_form actions
- Client: annotation layer rendering with PDF coordinate conversion, persistence via localStorage (using toolInfo.id key), pdf-lib-based download with embedded annotations and form fills, uses app.downloadFile() SDK with <a> fallback
- Model context includes annotation summary

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
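The "PDF coordinate conversion" mentioned for the client can be illustrated with a minimal sketch. It assumes an unrotated page, annotation rectangles stored in PDF points with the origin at the bottom-left corner, and a viewer that positions overlays in CSS pixels from the top-left. The `PdfRect` and `CssRect` shapes and the `pdfRectToCss` name are hypothetical and are not taken from the PR.

```typescript
// Hypothetical shapes for illustration; the PR's actual annotation fields may differ.
interface PdfRect { x: number; y: number; width: number; height: number } // PDF points, origin bottom-left
interface CssRect { left: number; top: number; width: number; height: number } // CSS pixels, origin top-left

/**
 * Convert a rectangle from PDF user space to CSS pixels for an unrotated page.
 * pageHeightPt is the page height in PDF points; scale is CSS pixels per point.
 */
function pdfRectToCss(rect: PdfRect, pageHeightPt: number, scale: number): CssRect {
  return {
    left: rect.x * scale,
    // Flip the y axis: the top edge of the rectangle in PDF space is y + height.
    top: (pageHeightPt - (rect.y + rect.height)) * scale,
    width: rect.width * scale,
    height: rect.height * scale,
  };
}

// A 200x20 pt highlight near the top of a US Letter page (792 pt tall) at 150% zoom.
const css = pdfRectToCss({ x: 72, y: 700, width: 200, height: 20 }, 792, 1.5);
console.log(css); // { left: 108, top: 108, width: 300, height: 30 }
```

Keeping annotations in PDF units and converting only at render time would let the same stored data drive both the on-screen layer and the pdf-lib export, though whether the PR does exactly this is an assumption.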
New tool `get_pages` lets the model get text and/or screenshots from arbitrary page ranges without navigating the visible viewer.

- Server: `get_pages` tool with interval-based page ranges (optional start/end, open ranges supported), `getText`/`getScreenshots` flags, request-response bridge via `submit_page_data` app-only tool
- Client: offscreen rendering (hidden canvas, no visual interference), text from cache or on-demand extraction, screenshots scaled to 768px max dimension, results submitted back to server
- Max 20 pages per request, 60s timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
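As an illustration of interval-based page ranges with optional bounds, the sketch below expands a list of intervals into concrete page numbers and enforces the 20-page cap. The `PageInterval` shape, the `resolvePages` name, the 1-based inclusive convention, and the choice to throw (rather than truncate) when the cap is exceeded are all assumptions, not details confirmed by the commit.

```typescript
// Hypothetical interval shape: 1-based, inclusive, both ends optional.
interface PageInterval { start?: number; end?: number }

const MAX_PAGES = 20; // cap stated in the commit message

/** Expand intervals into a sorted, de-duplicated list of page numbers. */
function resolvePages(intervals: PageInterval[], pageCount: number): number[] {
  const pages = new Set<number>();
  for (const { start = 1, end = pageCount } of intervals) {
    // Clamp to the document so open or oversized ranges stay valid.
    const lo = Math.max(1, start);
    const hi = Math.min(pageCount, end);
    for (let p = lo; p <= hi; p++) pages.add(p);
  }
  const sorted = Array.from(pages).sort((a, b) => a - b);
  if (sorted.length > MAX_PAGES) {
    // Assumption: over-limit requests are rejected rather than silently trimmed.
    throw new Error(`Requested ${sorted.length} pages; the limit is ${MAX_PAGES} per request`);
  }
  return sorted;
}

console.log(resolvePages([{ start: 3, end: 5 }, { start: 9 }], 10)); // [ 3, 4, 5, 9, 10 ]
console.log(resolvePages([{ end: 2 }], 10)); // [ 1, 2 ]
```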
Fold get_pages into the interact tool to minimize tools requiring approval. Now accessed via `interact(action: "get_pages", ...)`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
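Because `get_pages` needs data that only the client can produce, the server has to wait for a reply delivered through `submit_page_data`. A minimal sketch of such a request-response bridge with a timeout follows. The `pending` map, the counter-based request IDs, the `PageData` shape, and both function names are hypothetical; only the 60-second default mirrors the commit messages.

```typescript
// Hypothetical payload shape; the real tool's fields are not shown in the PR text.
type PageData = { page: number; text?: string; screenshot?: string };

interface Pending {
  resolve: (data: PageData[]) => void;
  timer: ReturnType<typeof setTimeout>;
}

const pending = new Map<string, Pending>();
let nextRequestId = 0; // simple counter; a real server would likely use UUIDs

/** Server side: register a request and return a promise for the client's reply. */
function waitForPageData(timeoutMs = 60_000): { requestId: string; result: Promise<PageData[]> } {
  const requestId = `req-${++nextRequestId}`;
  const result = new Promise<PageData[]>((resolve, reject) => {
    const timer = setTimeout(() => {
      pending.delete(requestId);
      reject(new Error(`get_pages timed out after ${timeoutMs} ms`));
    }, timeoutMs);
    pending.set(requestId, { resolve, timer });
  });
  return { requestId, result };
}

/** Called from the handler of the app-only submit tool when the client replies. */
function submitPageData(requestId: string, data: PageData[]): boolean {
  const entry = pending.get(requestId);
  if (!entry) return false; // unknown request, or it already timed out
  clearTimeout(entry.timer);
  pending.delete(requestId);
  entry.resolve(data);
  return true;
}
```

In this sketch the `interact` handler would await `result` after queuing the command for the viewer, so the model sees a single tool call even though two tools cooperate behind it.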
- Add concrete per-type schema docs with field names in tool description
- Add JSON example showing add_annotations with highlight + stamp
- Replace opaque z.record(z.string(), z.unknown()) with typed union of all annotation schemas (full + partial forms) so the model sees exact field names and types
- Remove redundant manual safeParse since Zod inputSchema validates

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- display_pdf result text now explicitly lists annotation capabilities (highlights, stamps, notes, etc.) instead of vague "navigate, search, zoom, etc."
- Restructured interact tool description: annotations promoted to top, with clear type reference, JSON example, and bold section headers
- Added pdf-annotations.spec.ts with 6 E2E tests covering:
  - Result text mentions annotation capabilities
  - interact tool available in dropdown
  - add_annotations renders highlight
  - Multiple annotation types render (highlight, note, stamp, freetext, rectangle)
  - remove_annotations removes from DOM
  - highlight_text finds and highlights text

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests that Claude can discover and use PDF annotation capabilities by calling the Anthropic Messages API with the tool schemas and simulated display_pdf result.

Disabled by default, skipped unless ANTHROPIC_API_KEY is set:

  ANTHROPIC_API_KEY=sk-... npx playwright test tests/e2e/pdf-annotations-api.spec.ts

3 scenarios tested:

- Model uses highlight_text when asked to highlight the title
- Model discovers annotation capabilities when asked "can you annotate?"
- Model uses interact (add_annotations or get_pages) when asked to add notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…e to README

- Example prompts for annotations, navigation, page extraction, stamps, forms
- Documents how to run E2E tests and API prompt discovery tests
- Updated tools table to include interact tool
- Updated key patterns table with annotations, command queue, file download
- Added pdf-lib to dependencies list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The typed Zod union (14 anyOf variants: 7 full + 7 partial annotation types) produced a 5,817-char JSON schema for the annotations field alone. This bloated the interact tool schema to 7,802 chars, which may cause the model to struggle with or skip the tool.

Replace with z.record(z.string(), z.any()), since annotation types are already fully documented in the tool description. Schema drops to 2,239 chars (71% reduction), annotations field to 254 chars (96% reduction).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
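The reduction figures can be checked directly from the character counts quoted above. The sketch below also shows one plausible way to measure a schema's size, by serializing it with `JSON.stringify`; that this is how the counts were obtained is an assumption, and both helper names are hypothetical.

```typescript
/** Size of a JSON schema in characters once serialized (assumed measurement method). */
function schemaChars(schema: unknown): number {
  return JSON.stringify(schema).length;
}

/** Percentage reduction from `before` to `after`, rounded to the nearest integer. */
function reductionPercent(before: number, after: number): number {
  return Math.round(((before - after) / before) * 100);
}

// Character counts taken from the commit message.
console.log(reductionPercent(7802, 2239)); // 71  (whole interact schema)
console.log(reductionPercent(5817, 254));  // 96  (annotations field only)
```

Both drops remove the same 5,563 characters, which is consistent with the annotations field being the only part of the schema that changed.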
The display_pdf result text now lists every action by name (navigate, search, find, search_navigate, zoom, add_annotations, update_annotations, remove_annotations, highlight_text, fill_form, get_pages) so the model knows exactly what commands are available without needing to inspect the interact tool schema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The model was passing "pdf-viewer" instead of the actual UUID, causing get_pages to time out (commands were queued under the wrong key, so the client never picked them up).

- Add activeViewUUIDs set tracking UUIDs issued by display_pdf
- Validate viewUUID at the top of interact handler with clear error
- Add "IMPORTANT: viewUUID must be the exact UUID returned by display_pdf" to the interact tool description

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
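The validation step described above could look roughly like the following. Only the `activeViewUUIDs` name comes from the commit message; `registerView`, `validateViewUUID`, and the error wording are hypothetical.

```typescript
// Set of UUIDs handed out by display_pdf (name taken from the commit message).
const activeViewUUIDs = new Set<string>();

/** Hypothetical hook: called when display_pdf issues a new viewer UUID. */
function registerView(viewUUID: string): void {
  activeViewUUIDs.add(viewUUID);
}

/**
 * Hypothetical guard for the top of the interact handler.
 * Returns null when the UUID is known, otherwise an error message for the model.
 */
function validateViewUUID(viewUUID: string): string | null {
  if (activeViewUUIDs.has(viewUUID)) return null;
  return `Unknown viewUUID "${viewUUID}". viewUUID must be the exact UUID returned by display_pdf.`;
}

registerView("123e4567-e89b-12d3-a456-426614174000");
console.log(validateViewUUID("123e4567-e89b-12d3-a456-426614174000")); // null
console.log(validateViewUUID("pdf-viewer")); // error message naming the bad value
```

Failing fast here turns a silent 60-second timeout into an immediate, actionable error that the model can correct on its next call.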
Summary
Adds full annotation, interaction, and page extraction capabilities to the PDF server:
- `navigate`, `search`, `find`, `search_navigate`, `zoom`
- `add_annotations` (7 types: highlight, underline, strikethrough, note, rectangle, freetext, stamp), `update_annotations`, `remove_annotations`
- `highlight_text`: auto-find and highlight text by query
- `get_pages`: batch text and/or screenshot extraction from page ranges without visual navigation (offscreen rendering)
- `fill_form`: fill PDF form fields
- `pdf-lib` (client-side) + `app.downloadFile()` SDK support
- localStorage persistence keyed by `toolInfo.id`

New dependency

- `pdf-lib` (^1.17.1): client-side PDF modification for annotated download

Files changed

- `examples/pdf-server/server.ts`
- `examples/pdf-server/src/mcp-app.ts`
- `examples/pdf-server/mcp-app.html`
- `examples/pdf-server/src/mcp-app.css`
- `examples/pdf-server/README.md`
- `tests/e2e/pdf-annotations.spec.ts`
- `tests/e2e/pdf-annotations-api.spec.ts` (skipped unless `ANTHROPIC_API_KEY` is set)

Test plan

- `npx playwright test tests/e2e/pdf-annotations.spec.ts`: 6 tests pass (annotation CRUD, highlight_text)
- `npx playwright test -g "PDF Server"`: existing screenshot tests pass
- `ANTHROPIC_API_KEY=... npx playwright test tests/e2e/pdf-annotations-api.spec.ts`: 3/3 pass (model discovers annotations)
- `npm run --workspace examples/pdf-server build`: compiles cleanly

🤖 Generated with Claude Code