Skip to content

Add move_document agent tool for intra-corpus folder moves#1196

Open
JSv4 wants to merge 12 commits intomainfrom
claude/document-move-tool-fKiJU
Open

Add move_document agent tool for intra-corpus folder moves#1196
JSv4 wants to merge 12 commits intomainfrom
claude/document-move-tool-fKiJU

Conversation

@JSv4
Copy link
Copy Markdown
Collaborator

@JSv4 JSv4 commented Apr 4, 2026

Summary

  • Adds move_document / amove_document agent tool for moving documents between folders within the same corpus
  • Delegates to DocumentFolderService.move_document_to_folder for the actual move, inheriting its permission checks
  • Pass target_folder_id to specify destination folder, or omit for corpus root
  • Uses visible_to_user() for document and corpus lookups to prevent IDOR enumeration

Design decisions

  • corpus_id and author_id are auto-injected from agent context
  • target_folder_id is LLM-provided (optional, defaults to corpus root)
  • document_id is context-dependent: auto-injected in document agents, LLM-provided in corpus agents
  • requires_approval=True since folder reorganization is a structural change
  • Categorized as ToolCategory.CORPUS because this manipulates corpus folder hierarchy, not document content

Test plan

  • Unit tests for move to folder, move to root, nonexistent document/folder, wrong-corpus folder, non-member document
  • Unit test for write permission denied (read-only user cannot move documents)

claude added 2 commits April 4, 2026 17:39
Implements a new agent tool that moves a document from one corpus to
another by creating an isolated copy in the target corpus and
soft-deleting from the source. Delegates to DocumentFolderService for
both operations, inheriting its permission checks (write on both
corpuses required).

Key design decisions:
- corpus_id (source) and author_id are auto-injected from agent context
- target_corpus_id is LLM-provided since the agent needs to choose it
- document_id is context-dependent: auto-injected in document agents,
  LLM-provided in corpus agents
- requires_approval=True since removal from source is destructive
- Validates folder belongs to target corpus before proceeding

https://claude.ai/code/session_01Mp2Y7i6fSwfiioAwU6TmFE
Replaces the cross-corpus move_document_to_corpus tool with
move_document, which moves a document between folders within the same
corpus using DocumentFolderService.move_document_to_folder.

The tool accepts document_id and corpus_id (auto-injected from agent
context), author_id (auto-injected from user context), and an optional
target_folder_id (LLM-provided; None moves to corpus root). Requires
corpus write permission; requires_approval=True since it changes
document organization.

https://claude.ai/code/session_01Mp2Y7i6fSwfiioAwU6TmFE
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

PR Review: move_document Agent Tool

Overall this is a clean, well-structured addition that follows existing patterns correctly. A few items worth addressing before merge:


Bug / Discrepancy: PR Title and Description Don't Match the Code

The PR title says "Add move_document_to_corpus agent tool" and the description says "creates an isolated copy in the target corpus and soft-deleting from the source" — but the code actually implements an intra-corpus folder-move (moving a document between folders within the same corpus). The title and description appear to describe a cross-corpus migration tool that was never implemented.

Please update the PR title and description to accurately describe what this does: "Move a document between folders within the current corpus".


Category: ToolCategory.CORPUS vs ToolCategory.DOCUMENT

The tool is categorized as CORPUS, but it operates on a single document (folder-placement is a document-level property). All existing single-document write tools (update_document_summary, update_document_description, etc.) use ToolCategory.DOCUMENT. This tool accepts document_id as an LLM-provided parameter and operates on that document — more consistent to use ToolCategory.DOCUMENT.

That said, if the intent is that this tool is only available to corpus agents (where the LLM picks which document to move), CORPUS is a defensible choice. Either way, a brief comment explaining the decision would help future contributors.


Missing CHANGELOG.md Update

Per CLAUDE.md, significant changes must be documented in CHANGELOG.md. A new agent tool with a new test file qualifies.


Test Concern: test_move_no_write_permission_raises conflates two failure modes

self.corpus_doc lives in self.corpus, not private_corpus. The service fails at the write-permission check first (before the corpus-membership check), so the test passes — but it conflates two different failure reasons. The separate test_move_document_not_in_corpus_raises test already covers the membership case. Consider a test where the user genuinely has write on a corpus that contains the document but some other permission boundary is being tested, or at minimum update the docstring to note that the failure here is also "document not in that corpus".


Minor: Adjacent f-strings

In core_tools.py, the return dict has two adjacent f-strings that can be collapsed:

# Before
"message": (
    f"Document {document_id} moved to {destination} " f"in corpus {corpus_id}."
),

# After
"message": f"Document {document_id} moved to {destination} in corpus {corpus_id}.",

Looks Good

  • Async/sync split (move_document / amove_document via _db_sync_to_async) matches the existing pattern correctly.
  • Only amove_document is registered in the tool registry — consistent with the "sync exists for tests only" rule in CLAUDE.md.
  • Permission delegation to DocumentFolderService.move_document_to_folder is the right design; no permission logic leaks into the tool layer.
  • requires_approval=True is appropriate since the document is removed from its previous folder location.
  • Folder-belongs-to-corpus validation is handled by the service, keeping responsibilities clean.
  • Test coverage is thorough: 7 cases covering happy paths, not-found, wrong-corpus folder, permission denied, and document-not-in-corpus.

@JSv4 JSv4 changed the title Add move_document_to_corpus agent tool Add move_document agent tool for intra-corpus folder moves Apr 4, 2026
- Fix CorpusFolder.objects.create calls: use `name=` (not `title=`)
  to match model field after main merge
- Fix target_folder.title -> target_folder.name in success message
- Collapse adjacent f-strings into single f-string
- Add comment explaining ToolCategory.CORPUS choice
- Rename test_move_no_write_permission_raises to accurately describe
  what it tests (membership check, not permission boundary)
@claude
Copy link
Copy Markdown

claude bot commented Apr 4, 2026

Code Review: move_document Agent Tool - Overall this is a clean, well-scoped addition. The delegation to DocumentFolderService.move_document_to_folder is the right call. Issues: (1) Missing CHANGELOG.md entry. (2) setUp calls set_permissions_for_obj_to_user on the document but the permission check in move_document_to_folder is corpus-level - tests pass because self.user is the corpus creator, making the document permission grant misleading. (3) No test for the guardian UPDATE permission path (non-creator with explicit permission). (4) test_move_to_corpus_where_doc_not_member_raises and test_move_document_not_in_corpus_raises cover distinct failure modes but docstrings do not make this clear. (5) Two adjacent f-strings in the return message can be merged. (6) author_id: int vs int | None = None diverges from convention - worth a comment.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

…anup

- Add CHANGELOG.md entry for move_document tool (PR #1196)
- Expand ToolCategory.CORPUS comment explaining corpus-agent-only scope
- Add comments explaining author_id: int type divergence from convention
- Fix unused variables in test setUp (use *_ unpacking)
- Add clarifying comment that test permission grant is document-level but
  the actual check is corpus-level (corpus creator status)
@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

PR Review: move_document agent tool

Overall this is a clean, well-structured addition that follows established patterns. A few things worth addressing before merge:


Potential Bug: self.corpus_doc identity in tests

In test_move_document_tool.py (line 44–46):

self.corpus_doc, *_ = self.corpus.add_document(
    document=self.original_doc,
    user=self.user,
    folder=self.folder_a,
)

The tests later use document_id=self.corpus_doc.id in the tool call and also DocumentPath.objects.get(document=self.corpus_doc, ...). For this to be correct, add_document must return a Document as its first value. If it returns a DocumentPath or a tuple where the first element isn't the Document, both the tool call and the path assertions would silently test the wrong thing (passing a DocumentPath.pk as document_id, which would fail to find the intended document).

Please verify what Corpus.add_document() returns and rename corpus_doc to document (or doc) to make it unambiguous that it's a Document instance.


Missing test: permission denied

The test suite covers the happy path, nonexistent entities, wrong-corpus folder, and non-membership, but there's no test for the case where the user lacks write permission on the corpus. Given requires_write_permission=True, verifying that DocumentFolderService.move_document_to_folder rejects an unauthorized user would close the security boundary:

def test_move_without_write_permission_raises(self):
    """A user without corpus write permission cannot move documents."""
    with self.assertRaises(ValueError) as ctx:
        move_document(
            document_id=self.corpus_doc.id,
            corpus_id=self.corpus.id,
            author_id=self.other_user.id,  # other_user has no corpus perms
            target_folder_id=self.folder_b.id,
        )
    self.assertIn("Move failed", str(ctx.exception))

Minor security note: document existence probing

core_tools.py (new move_document):

try:
    document = Document.objects.get(pk=document_id)
except Document.DoesNotExist:
    raise ValueError(f"Document with id={document_id} does not exist.")

This fetches by pk alone, without filtering to visible_to_user(user) first. The downstream DocumentFolderService.check_document_in_corpus prevents an unauthorized move, but the error message ("Document with id=X does not exist" vs "Move failed: ...") leaks whether the document exists at all — which conflicts with the IDOR prevention guidance in CLAUDE.md ("Return same error message whether object doesn't exist or belongs to another user").

Consider either:

  • Using Document.objects.visible_to_user(user).get(pk=document_id) for the pre-check, or
  • Letting the service handle all validation and not doing a separate pre-fetch in the tool

Nit: misleading PR description comment about document agents

The PR description says:

document_id is context-dependent: auto-injected in document agents, LLM-provided in corpus agents

But since the tool is ToolCategory.CORPUS, it isn't available to document-scoped agents at all — so the "auto-injected in document agents" case never fires. The comment in tool_registry.py already explains the design rationale correctly; the PR description note is just slightly misleading.


What's done well

  • Follows the sync+async (move_document / amove_document) pattern correctly, including _db_sync_to_async.
  • Properly delegates all permission enforcement to DocumentFolderService.move_document_to_folder rather than reimplementing it.
  • requires_approval=True is exactly right for a structural/irreversible-feeling change.
  • All imports deferred inside the function body to avoid circular imports.
  • ToolCategory.CORPUS assignment is well-reasoned and documented.
  • CHANGELOG entry is present and detailed.
  • Alphabetical ordering in _populate is preserved.

- Use visible_to_user() for document and corpus lookups to prevent
  enumeration of object IDs by unauthorized users. Error messages
  now say "does not exist or is not accessible" regardless of whether
  the object exists but is invisible or truly missing.
- Add test_move_no_write_permission_raises: grants other_user read-only
  access to corpus and document, then asserts the move is rejected at
  the service layer with "Permission denied".
- Update existing test assertions and docstrings to reflect the new
  IDOR-safe error messages.
- Update CHANGELOG entry to note the IDOR prevention.
@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

Code Review

Overall this is clean, well-structured work. The design choices (delegating to DocumentFolderService, requires_approval=True, injecting corpus_id/author_id from context) all look right. A few issues worth addressing before merge:


Security — Folder IDOR Enumeration (core_tools.py)

The folder lookup uses a direct CorpusFolder.objects.get() with no user-visibility scope:

target_folder = CorpusFolder.objects.get(pk=target_folder_id)

This leaks whether a folder ID exists in a private corpus via error message differentiation:

  • Folder exists (wrong corpus): "Move failed: Target folder does not belong to this corpus"
  • Folder doesn't exist: "Folder with id=X does not exist"

DocumentFolderService.get_folder_by_id(user, folder_id) already exists for exactly this case — it returns None for both not-found and permission-denied, preventing enumeration:

# Suggested fix
target_folder = None
if target_folder_id is not None:
    target_folder = DocumentFolderService.get_folder_by_id(user, target_folder_id)
    if target_folder is None:
        raise ValueError(
            f"Folder with id={target_folder_id} does not exist or is not accessible."
        )

This mirrors how the corpus/document lookups already handle IDOR prevention. The corresponding test test_move_folder_wrong_corpus_raises assertion ("Move failed") would need to be updated to reflect the earlier error.


Minor — Document Lookup Scope

The document lookup uses Document.objects.visible_to_user(user).get(pk=document_id) which is correct for visibility. However, the check that the document actually belongs to corpus happens downstream in DocumentFolderService.move_document_to_folder. If a user can see a document from another corpus, they'll get a "Move failed: Document does not belong to this corpus" error rather than a "does not exist or is not accessible" error, which is accurate but slightly inconsistent with the IDOR-prevention pattern used elsewhere. Low severity since the move doesn't succeed, but worth noting.


Test Coverage — Good

The test matrix covers all the main paths:

  • Move to folder, move to root
  • Nonexistent document/folder
  • Wrong-corpus folder
  • Document not in corpus
  • Write permission denied

One gap: there's no test for amove_document (the async wrapper). Since it delegates entirely to move_document via _db_sync_to_async, that's a reasonable omission — just calling out that it's intentional.


Minor — Registry Placement

amove_document is inserted alphabetically in the all-tools list (aget_page_image, aload_document_md_summary, ..., amove_document, asearch_document_notes) — this is correct. The _sync_tool_map entry "move_document": (amove_document, ()) is also correct.


Summary

Issue Severity File Line
Folder IDOR enumeration (use get_folder_by_id) Medium core_tools.py ~2753
Document-not-in-corpus message slightly inconsistent with IDOR pattern Low core_tools.py ~2748

The IDOR folder issue is the only substantive change needed. Everything else looks solid.

Replace CorpusFolder.objects.get() with DocumentFolderService.get_folder_by_id()
which returns None for both not-found and permission-denied cases, preventing
folder existence enumeration. Add test_move_inaccessible_folder_raises to verify
that folders in other users' private corpuses produce the same error as
nonexistent folders.
@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

PR Review: move_document agent tool

The current implementation is clean and well-structured. Several issues raised in earlier reviews are already addressed in the diff (IDOR-safe folder lookup via get_folder_by_id, visible_to_user on document/corpus, CHANGELOG entry). A few remaining items:


Codecov: 3 uncovered lines

The most likely uncovered path is the User.DoesNotExist branch (no test passes an invalid author_id). Since author_id is always injected from agent context this is low-risk in production, but worth a quick test:

def test_invalid_author_id_raises(self):
    with self.assertRaises(ValueError) as ctx:
        move_document(
            document_id=self.corpus_doc.id,
            corpus_id=self.corpus.id,
            author_id=999999,
            target_folder_id=None,
        )
    self.assertIn("does not exist", str(ctx.exception))

Minor: self.corpus_doc naming

Corpus.add_document() returns (new_corpus_isolated_document, status, document_path), so self.corpus_doc is the corpus-isolated Document instance — the name implies it might be a path or membership record. Renaming to self.doc or similar would clarify intent. The test logic itself is correct.


What's done right

  • amove_document registered in the tool registry; sync move_document kept for tests only — matches CLAUDE.md.
  • DocumentFolderService.get_folder_by_id(user, folder_id) used for folder resolution (returns None for both not-found and inaccessible) — IDOR prevention correct.
  • All permission enforcement delegated to DocumentFolderService.move_document_to_folder; no permission logic leaks into the tool layer.
  • requires_approval=True is right for a structural change.
  • Test assertions match actual service error messages (verified against folder_service.py:879-890).
  • 7-case test matrix covers all meaningful code paths.

The coverage gap is the only substantive item left.

Address review comments: add test_invalid_author_id_raises covering the
User.DoesNotExist branch, and rename self.corpus_doc to self.doc for
clarity since it refers to the corpus-isolated Document instance.
@claude
Copy link
Copy Markdown

claude bot commented Apr 5, 2026

PR Review: Add move_document agent tool

Overall this is a clean, well-structured addition that fits naturally into the existing tool architecture. A few observations below.

Strengths

  • Pattern consistency: Sync + async pair via _db_sync_to_async, local imports inside the sync function, visible_to_user() scoping — all match existing tool conventions exactly.
  • IDOR prevention: Both corpus and document are fetched through visible_to_user(), and get_folder_by_id returns None for both not-found and permission-denied cases. Error messages are deliberately generic.
  • Correct fault surface: requires_approval=True and requires_write_permission=True are both set. Permission enforcement is correctly delegated to DocumentFolderService.move_document_to_folder which gates on corpus write access.
  • Test coverage: All the important edge cases are covered — move to folder, move to root, nonexistent IDs, wrong-corpus folder, inaccessible folder, non-member document, invalid author, and read-only user. The tests use the sync function directly, which is the right approach for Django TestCase.

Minor Issues

1. Subtle error-message inconsistency (folder cross-corpus)

DocumentFolderService.get_folder_by_id checks read permission on the folder's corpus. So if a caller passes a target_folder_id that lives in a different corpus they happen to have read access to, get_folder_by_id returns the folder successfully, and the rejection only happens later inside move_document_to_folder — producing "Move failed: Target folder does not belong to this corpus".

This means the error message leaks whether the caller has read access to the corpus that owns the folder: inaccessible folder → "does not exist or is not accessible", accessible-but-wrong-corpus folder → "Move failed: …". Since the caller must already have read access to know that folder ID exists, this is a very low-risk information leak, but worth noting.

A simple fix would be to also validate that the retrieved folder belongs to the target corpus before delegating to the service:

target_folder = DocumentFolderService.get_folder_by_id(user, target_folder_id)
if target_folder is None:
    raise ValueError(
        f"Folder with id={target_folder_id} does not exist or is not accessible."
    )
# Early check to keep error surface consistent
if target_folder.corpus_id != corpus.id:
    raise ValueError(
        f"Folder with id={target_folder_id} does not exist or is not accessible."
    )

2. No async test

The test file tests only move_document (sync). The amove_document wrapper is trivial (_db_sync_to_async), but a minimal smoke test (via async_to_sync) would give confidence that the async plumbing and registry entry work end-to-end.

3. ToolCategory.DOCUMENT agents cannot move documents

The decision to categorize this under ToolCategory.CORPUS is well-documented. Just worth confirming that document-scoped agents moving their own document (e.g. "move the document I'm currently chatting about to folder X") is intentionally out of scope — if it's a valid future use case, adding ToolCategory.DOCUMENT support now (with document_id auto-injected from context) would be cheaper than a second PR.

Nits

  • The docstring in amove_document is a copy of the one in move_document. The async version could briefly note "Async wrapper around move_document" to save future readers from diffing them.
  • corpus_id and author_id are injected from agent context — this implicit contract could use a brief comment in _populate next to (amove_document, ()), consistent with how other injected-param tools are handled.

Summary

No blockers. The IDOR note (#1) is worth a decision but not blocking since the information leak is negligible given the read-access prerequisite. Tests are solid, changelog entry is clear and correctly placed.

JSv4 added 2 commits April 5, 2026 21:04
Add an early check in move_document that the retrieved target folder
belongs to the same corpus, ensuring a user with read access to another
corpus's folder gets the same "does not exist or is not accessible"
error as one without access (IDOR prevention). Also simplify the
amove_document docstring to note it is an async wrapper.
@claude
Copy link
Copy Markdown

claude bot commented Apr 6, 2026

Overall this is a clean, well-structured addition. The IDOR prevention, delegation to DocumentFolderService, and permission model are all handled correctly. A few items to consider:

Clarity: test_move_document_not_in_corpus_raises tests the wrong scenario description

The test logic is sound, but the docstring says the failure comes from "a document that isn't in the corpus" when more precisely the document IS accessible to the user — it just has no DocumentPath in other_corpus. The Document.objects.visible_to_user(user).get(pk=document_id) lookup succeeds; it's DocumentFolderService.move_document_to_folder's check_document_in_corpus that fails. Minor wording nit.

Redundant cross-corpus folder validation

In move_document (core_tools.py), there's an explicit check: target_folder.corpus_id != corpus.id. But DocumentFolderService.move_document_to_folder also checks folder.corpus_id != corpus.id internally. The tool-level check is fine as an early-exit guard, but a brief comment noting the service also validates this would prevent future maintainers from removing one check thinking the other covers it.

No test for the async entry point

Tests only invoke move_document (sync). In production, amove_document is the one registered in FUNCTION_MAP and actually called. Since it's a thin _db_sync_to_async wrapper the risk is low, but a single async smoke test (e.g. using async_to_sync or asyncio.run) would confirm the thread-pool wrapping works as expected.

get_folder_by_id permission gate is read-level only

When resolving target_folder, DocumentFolderService.get_folder_by_id checks corpus read permission only. A read-only user successfully looks up the folder here — the write-permission gate is deferred to move_document_to_folder. This is architecturally correct (defense-in-depth: the service always enforces write permission), but a comment clarifying this intentional design would make the security model explicit.

Nit: Confirm corpus agents can discover document_id values

move_document requires the LLM to provide document_id as a required parameter, which is unusual for ToolCategory.CORPUS tools. Worth confirming that corpus agents have access to a document-listing or search tool so the LLM can retrieve valid document_id values before invoking this one — otherwise it would be impractical to use.

Changelog

The CHANGELOG.md entry is present and well-written. No action needed.

Summary: No blocking issues. Permission model is solid, IDOR prevention is correct, and test coverage is thorough for the sync path. Most actionable items: (1) add a brief async smoke test for amove_document, (2) add a comment clarifying the intentional redundancy in the cross-corpus folder check.

JSv4 added 2 commits April 7, 2026 02:22
- Add async smoke test for amove_document using TransactionTestCase
  with proper signal disconnection to prevent celery task failures
- Add explanatory comment on the intentional redundancy between the
  tool-level cross-corpus folder check and the service-level check,
  clarifying the IDOR-prevention rationale
@claude
Copy link
Copy Markdown

claude bot commented Apr 7, 2026

PR Review: move_document agent tool (current diff)

Previous reviews raised a number of issues; most have been resolved in this revision.

What has been addressed

  • IDOR-safe folder lookup: DocumentFolderService.get_folder_by_id(user, target_folder_id) plus early cross-corpus check ensures inaccessible and wrong-corpus folders produce the same generic error message
  • CHANGELOG.md: Present and detailed
  • Missing tests: test_invalid_author_id_raises and TestMoveDocumentAsync have been added
  • Variable naming: self.corpus_doc renamed to self.doc
  • requires_approval=True: Correct for a structural change
  • Permission delegation: All enforcement delegated to DocumentFolderService.move_document_to_folder; no permission logic leaks into the tool layer

Remaining observation: Codecov 96.42% gap likely resolved

The Codecov bot flagged 1 uncovered line (96.42%) in an earlier run. The most likely missing branch was the User.DoesNotExist path (lines 2736-2737), which test_invalid_author_id_raises now covers. The async path is covered by TestMoveDocumentAsync. Codecov should pass on a re-run.

Minor: move_document_to_folder folder-corpus check is unreachable from this tool

At folder_service.py:889, the check if folder is not None and folder.corpus_id != corpus.id can never be triggered when called from move_document, because the tool's early guard at line 2762 raises before delegating to the service whenever the corpus does not match. This is fine as defense-in-depth (the service is called from other paths too), but worth noting so future maintainers do not wonder why service-level coverage for that branch is hard to achieve through this tool's tests alone.

Minor: TestMoveDocumentAsync.setUp signal disconnect placement

post_save.disconnect(process_doc_on_create_atomic, sender=Document, dispatch_uid=DOC_CREATE_UID)
try:
    ...
finally:
    post_save.connect(...)

The disconnect is outside the try block. If it were to raise (practically impossible, but possible if the signal is not connected), finally would not run. Moving it inside try makes the safety guarantee explicit. Low severity, not blocking.

Summary

No blockers. Implementation is correct, IDOR prevention is solid, and test coverage is thorough. The two observations above are minor cleanup items. Ready to merge once Codecov confirms the coverage gap is closed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants