fix: stage media bytes across turns and coalesce approval prompts #950
Merged
…ompts

After #948 moved file persistence to the agent loop for the ask/deny permission paths, upload_to_storage could fail with "no file content available" whenever the agent gathered conversational context before calling the tool -- because pending_media was scoped to the current inbound message, and any prior turn's bytes were gone by the time the tool actually fired.

Separately, chained or retried ASK tool calls prompted the user multiple times for what was semantically one intent (save this photo for David Graham), even though #943 told the agent not to pre-check conversationally.

- Add backend/app/agent/media_staging.py: in-memory, per-user, TTL-bounded cache keyed by original_url. Populated at download time regardless of permission level; evicted on successful upload / organize / auto-save.
- _file_factory merges staged bytes into pending_media so upload_to_storage works on turns that have no attachments.
- Add resource_extractor=client_name to upload_to_storage and organize_file so the approval system can key by intent.
- core.py: per-run _approval_cache keyed by (tool_name, resource). Repeat ASK calls with the same resource reuse the first APPROVED decision instead of prompting again.
- instructions.md File uploads: forbid conversational pre-checks explicitly; tell the agent to ask one question OR call the tool, not both.
- tests/test_media_staging.py: 10 tests covering staging lifecycle, cross-turn recovery, cache eviction, and approval coalescing (three chained fake_upload calls -> one prompt).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
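A minimal sketch of the staging idea in this commit, not the actual contents of media_staging.py. The names MediaStaging, stage, get_all, evict, and merge_pending are hypothetical, and letting the current turn's attachments win over staged bytes on a URL clash is an assumption. The one-hour TTL matches the value given in the PR description.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


@dataclass
class StagedMedia:
    content: bytes
    expires_at: float  # monotonic timestamp after which the entry is stale


@dataclass
class MediaStaging:
    """Hypothetical in-memory cache: user_id -> original_url -> staged bytes."""

    ttl_seconds: float = 3600.0  # 1h, as stated in the PR description
    _store: dict[str, dict[str, StagedMedia]] = field(default_factory=dict)

    def stage(self, user_id: str, original_url: str, content: bytes,
              now: float | None = None) -> None:
        """Remember downloaded bytes so a later turn can still upload them."""
        now = time.monotonic() if now is None else now
        per_user = self._store.setdefault(user_id, {})
        per_user[original_url] = StagedMedia(content, now + self.ttl_seconds)

    def get_all(self, user_id: str, now: float | None = None) -> dict[str, bytes]:
        """Return unexpired bytes for one user, purging stale entries on access."""
        now = time.monotonic() if now is None else now
        per_user = self._store.get(user_id, {})
        for url in [u for u, e in per_user.items() if e.expires_at <= now]:
            del per_user[url]
        return {url: entry.content for url, entry in per_user.items()}

    def evict(self, user_id: str, original_url: str) -> None:
        """Drop an entry after a successful upload, organize, or auto-save."""
        self._store.get(user_id, {}).pop(original_url, None)


def merge_pending(pending_media: dict[str, bytes],
                  staged: dict[str, bytes]) -> dict[str, bytes]:
    """Combine staged bytes with the current turn's attachments.

    Assumption: on a URL clash the current turn's bytes take precedence.
    """
    merged = dict(staged)
    merged.update(pending_media)
    return merged
```

With a cache like this, a turn that arrives with no attachments still sees the earlier photo: merge_pending({}, staging.get_all(user_id)) returns the bytes staged when the photo was first downloaded.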
Review follow-ups on #950:

- Drop the unused logger import from media_staging; nothing was being logged from this module.
- Add get_mime_type() accessor and use it in upload_to_storage so the authoritative mime type from the download layer overrides whatever the LLM guessed in the tool arguments (prevents PDFs from being saved with .jpg extensions when the agent forgets to pass mime_type).
- Refresh the upload_to_storage tool description so it reflects the staging cache -- "only works with media in the current message" was no longer accurate.
- Regression tests for get_mime_type and the staged-mime-over-argument precedence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
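The staged-mime-over-argument precedence can be illustrated roughly as follows. This is a sketch under assumptions: pick_mime_type and filename_for are hypothetical helpers, not the real get_mime_type() accessor or the upload_to_storage code, and the fallback to application/octet-stream is an assumption.

```python
from __future__ import annotations

import mimetypes


def pick_mime_type(staged_mime: str | None, argument_mime: str | None,
                   default: str = "application/octet-stream") -> str:
    """Prefer the mime type recorded at download time over the tool argument."""
    return staged_mime or argument_mime or default


def filename_for(stem: str, mime_type: str) -> str:
    """Derive a file name whose extension follows the resolved mime type."""
    extension = mimetypes.guess_extension(mime_type) or ""
    return f"{stem}{extension}"


# The download layer saw a PDF, but the tool call guessed image/jpeg.
resolved = pick_mime_type("application/pdf", "image/jpeg")
print(filename_for("invoice", resolved))
```

Because the staged value is consulted first, a wrong or missing mime_type argument no longer decides the saved file's extension.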
Description
Two connected bugs observed in a live BlueBubbles conversation where a photo was sent, the agent clarified "which client?", then asked for upload approval several turns later.
- With upload_to_storage permission levels ask/deny, the agent depends on pending_media to carry byte content. But pending_media is scoped to the current inbound message; if the agent takes a clarifying turn first, the bytes are garbage-collected and the tool fails with "No file content available to upload".
- Chained or retried ASK tool calls (upload_to_storage → fails → retry) produced 2-3 prompts for one user intent.

Type
Changes
- backend/app/agent/media_staging.py (new) -- in-memory, per-user, TTL-bounded (1h) cache keyed by original_url. Purges on access.
- router.py -- prepare_media and prepare_media_step stash downloaded bytes into staging regardless of permission level. Guarded with if ctx.user is not None to respect existing test fixtures.
- file_tools.py
  - _file_factory merges staged bytes into pending_media so upload_to_storage works on turns that have no attachments of their own.
  - upload_to_storage, organize_file, and auto_save_media evict after success.
  - upload_to_storage and organize_file set resource_extractor = client_name so the approval system can key by intent.
- core.py -- per-run _approval_cache keyed by (tool_name, resource). Repeat ASK calls with the same resource reuse the first APPROVED decision instead of re-prompting. ALWAYS_ALLOW is still persisted to PERMISSIONS.json and bypasses the cache naturally.
- instructions.md File uploads section -- explicitly forbids conversational pre-checks; tells the agent to ask one clarifying question OR call the tool, not both.

Tests
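A minimal sketch of the approval coalescing described under Changes, written in the shape of the chained-call test listed below. ApprovalCoalescer, Decision, and fake_gate are hypothetical names, not the code in core.py. Caching only APPROVED decisions follows the description; prompting again after a denial is an assumption.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"


class ApprovalCoalescer:
    """Hypothetical per-run cache keyed by (tool_name, resource)."""

    def __init__(self, request_approval: Callable[[str, str], Decision]) -> None:
        self._request_approval = request_approval
        self._approval_cache: dict[tuple[str, str], Decision] = {}

    def check(self, tool_name: str, resource: str) -> Decision:
        key = (tool_name, resource)
        # Reuse an earlier APPROVED decision instead of prompting again.
        if self._approval_cache.get(key) is Decision.APPROVED:
            return Decision.APPROVED
        decision = self._request_approval(tool_name, resource)
        # Assumption: denials are not cached, so a later call prompts again.
        if decision is Decision.APPROVED:
            self._approval_cache[key] = decision
        return decision


prompts: list[tuple[str, str]] = []


def fake_gate(tool_name: str, resource: str) -> Decision:
    """Stand-in for the real approval prompt; records each prompt it receives."""
    prompts.append((tool_name, resource))
    return Decision.APPROVED


coalescer = ApprovalCoalescer(fake_gate)
for _ in range(3):
    coalescer.check("upload_to_storage", "David Graham")
print(len(prompts))  # 1: three chained calls, one prompt
```

Keying on the resource (here the client name) rather than on the full argument set is what lets a retried call with slightly different arguments still count as the same intent.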
tests/test_media_staging.py (10 tests):

- _file_factory cross-turn recovery: staged bytes appear in pending_media when the current turn has no attachments.
- Approval coalescing: three chained fake_upload(client_name="David Graham") calls in one agent run produce one gate.request_approval call, not three.

Checklist
- uv run pytest
- ruff check backend/ tests/ + ruff format --check
- ty check

AI Usage
🤖 Generated with Claude Code