Implement nested approval gates for sub-agent tool calls#881
Conversation
When a corpus agent delegates to a document sub-agent via ask_document, the sub-agent may encounter tools that require user approval (e.g., update_document_summary). Previously, the ApprovalNeededEvent from the sub-agent was silently ignored, causing the tool to fail without user interaction. Changes: Backend (pydantic_ai_agents.py): - Detect ApprovalNeededEvent in ask_document_tool's sub-agent stream - Raise ToolConfirmationRequired with the sub-agent's tool details, preserving _sub_tool_name and _sub_tool_arguments for UI display - Re-raise ToolConfirmationRequired from CallToolsNode exception handler instead of swallowing it with the generic Exception catch - Add skip_approval parameter to ask_document_tool so resumed calls create sub-agents with skip_approval_gate=True - Strip internal metadata keys (prefixed with _) in resume_with_approval before passing args to the tool function Frontend (CorpusChat.tsx, ChatTray.tsx, useAgentChat.ts): - Add ASYNC_APPROVAL_RESULT and ASYNC_RESUME message types - Handle ASYNC_APPROVAL_RESULT to clear pending approval state - Handle ASYNC_RESUME to maintain processing indicator - For sub-agent approvals (ask_document), unwrap _sub_tool_name and _sub_tool_arguments so the approval modal shows the actual tool being approved rather than the coordinator's ask_document wrapper https://claude.ai/code/session_01KKkXwMVumAZTNqw3QFWBSH
Pull Request Review: Nested Approval Gates for Sub-Agent Tool CallsOverviewThis PR implements a sophisticated nested approval gate system allowing document-specific agents to propagate tool approval requests up to their parent corpus agent for user confirmation. The implementation touches both backend (pydantic_ai_agents.py) and frontend (React components) to create a seamless approval workflow. ✅ Strengths1. Clean Exception Propagation PatternThe addition of the 2. Metadata Stripping LogicThe metadata key filtering in tool_args = {k: v for k, v in tool_args.items() if not k.startswith("_")}This cleanly separates UI-only metadata ( 3. Frontend Unwrapping LogicThe frontend correctly unwraps nested tool calls to show users the actual sub-agent tool being approved rather than the generic 4. Consistent Implementation Across ComponentsThe approval handling logic is consistently applied across all three frontend components (CorpusChat, ChatTray, useAgentChat), which reduces the risk of edge cases. 🔍 Potential Issues1. Missing Test Coverage (CRITICAL)This PR adds 125 lines of code with zero test coverage. According to CLAUDE.md:
The nested approval gate functionality needs comprehensive tests covering:
Recommendation: Add a test file like Files: 2. Potential Race Condition in Frontend State UpdatesIn if (pendingApproval && data?.message_id === pendingApproval.messageId) {
setPendingApproval(null);
setShowApprovalModal(false);
}However, if the backend emits Files: 3. Inconsistent Approval Status HandlingIn if (data?.decision) {
updateMessageApprovalStatus(
pendingApproval.messageId,
data.decision as "approved" | "rejected"
);
}But in Files: 4. No Validation of
|
Security: Replace LLM-visible skip_approval parameter with config._approval_bypass_allowed runtime flag that only resume_with_approval() can set, wrapped in try/finally to guarantee reset. Prevents prompt injection bypass of approval gates. Frontend: Add consistent ASYNC_APPROVAL_RESULT handling across CorpusChat (was missing updateMessageApprovalStatus), add message approvalStatus tracking on ASYNC_APPROVAL_NEEDED, validate _sub_tool_name/_sub_tool_arguments types before unwrapping. Backend: Add defensive checks for malformed ApprovalNeededEvent (missing name, non-dict pending_tool_call, empty strings). Fix ASYNC_RESUME handler comment in ChatTray explaining why no-op. Tests: Add comprehensive test suite (test_nested_approval_gates.py) covering approval propagation, metadata stripping, bypass flag lifecycle, malformed event handling, and schema safety. Docs: Update CHANGELOG.md and docs/architecture/llms/README.md with nested approval gate flow diagrams and security notes. https://claude.ai/code/session_01KKkXwMVumAZTNqw3QFWBSH
Code Review: Nested Approval Gates ImplementationThis PR implements a sophisticated nested approval gate system for sub-agent tool calls. Overall, the implementation is well-designed and thoroughly tested. Below are my findings: ✅ Strengths1. Excellent Security Design
2. Comprehensive Test Coverage
3. Clear Documentation
4. Consistent Frontend Handling
🔍 Code Quality ObservationsMinor Issues (Non-blocking)1. Exception Handler Ordering (pydantic_ai_agents.py:767-784) except ToolConfirmationRequired:
# Sub-agent approval gates must propagate so the
# outer ToolConfirmationRequired handler can pause
# the conversation and surface it to the user.
raise
except Exception as tool_exc:
# Already handled by outer error handler – stop processing this nodeThe comment "Already handled by outer error handler" is misleading since we're IN an error handler. Consider: "Operational tool errors are caught and logged; execution continues to next node." 2. Metadata Key Stripping (pydantic_ai_agents.py:1117) tool_args = {k: v for k, v in tool_args.items() if not k.startswith("_")}However, this convention (_-prefix for internal metadata) should be documented in the agent architecture docs. Currently only mentioned in inline comments and CHANGELOG. 3. Type Safety in Frontend (CorpusChat.tsx:1206-1212) const subName = toolCall.arguments?._sub_tool_name;
if (typeof subName === "string" && subName.length > 0) {
toolCall.name = subName;
const subArgs = toolCall.arguments?._sub_tool_arguments;
toolCall.arguments =
subArgs && typeof subArgs === "object" ? subArgs : {};
}However, the 4. Diagnostic Logging (pydantic_ai_agents.py) 🚀 Performance Considerations
🔒 Security AssessmentExcellent
RecommendationConsider adding rate limiting or a maximum approval request count per conversation to prevent approval fatigue attacks (where an LLM repeatedly triggers approval requests to wear down the user). 📋 Test Execution RecommendationBefore merging, ensure:
📝 Changelog Compliance✅ CHANGELOG.md follows Keep a Changelog format Final VerdictAPPROVED ✅ This is a well-architected feature with excellent security considerations, comprehensive tests, and thorough documentation. The minor issues noted above are non-blocking and can be addressed in follow-up commits if desired. The nested approval gate system solves a real UX problem (surfacing sub-agent tool approvals to users) with a clean implementation that respects the project's architecture patterns. Suggested Follow-ups (Optional)
Great work! 🎉 |
Code Review: Nested Approval Gates for Sub-Agent Tool CallsSummaryThis PR implements a sophisticated nested approval gate system that allows document sub-agents to propagate tool approval requests up to their parent corpus agent. The implementation is well-thought-out with comprehensive testing and documentation. ✅ Strengths
|
Root causes: 1. Tests searched agent.config.tools for ask_document, but corpus- specific tools are only passed to PydanticAIAgent() constructor, not stored on config.tools. Fix: capture tools from mock_agent_cls.call_args after create(). 2. Tests patched pydantic_ai_agents._agents_api (module-level name that doesn't exist). The closure captures _agents_api as a local variable pointing to the AgentAPI singleton. Fix: patch AgentAPI.for_document on the actual class. https://claude.ai/code/session_01KKkXwMVumAZTNqw3QFWBSH
Pull Request Review: Nested Approval Gates for Sub-Agent Tool CallsSummaryThis PR implements a sophisticated nested approval gate system that allows document sub-agents to propagate tool approval requests up to their parent corpus agent for user confirmation. The implementation is well-architected with strong security considerations and comprehensive test coverage. ✅ Strengths1. Excellent Security Design
2. Robust Error Handling
3. Comprehensive Testing
4. Excellent Documentation
5. Consistent Frontend Implementation
🔍 Areas for Improvement1. Minor Code Quality IssuesIssue: In except ToolConfirmationRequired:
# Sub-agent approval gates must propagate so the
# outer ToolConfirmationRequired handler can pause
# the conversation and surface it to the user.
raiseSuggestion: This is actually fine as-is. The comment clearly explains why it's being re-raised. No change needed. 2. Potential Type Safety EnhancementIssue: The Suggestion: Consider adding this as an optional field to class AgentConfig:
_approval_bypass_allowed: bool = False # Internal flag, not exposed to LLMThis would eliminate the need for type ignores and make the design more explicit. 3. Frontend Validation Edge CaseIssue: In CorpusChat.tsx:1036-1042, the validation checks if (toolCall.name === 'ask_document') {
const subName = toolCall.arguments?._sub_tool_name;
if (typeof subName === 'string' && subName.length > 0) {
// ...
}
}Observation: Actually, the code DOES check 4. Test Coverage - Edge CaseObservation: The tests cover malformed events (missing name, empty name, non-dict), but don't test the case where Suggestion: Add a test case: async def test_malformed_sub_tool_arguments(self):
"""Verify that malformed _sub_tool_arguments in approval event
doesn't crash the system."""
events = [
_FakeApprovalEvent(
pending_tool_call={
'name': 'update_document_summary',
'arguments': 'not-a-dict', # Malformed arguments
'tool_call_id': 'tc-bad-args',
}
),
# Should continue processing
_FakeContentEvent(content='Recovered from bad args.'),
_FakeFinalEvent(content='Done.', sources=[], metadata={}),
]
# Test that it completes without crashingThis is a minor suggestion - the current implementation should handle this gracefully already. 🎯 Performance Considerations
🔒 Security AssessmentVerdict: EXCELLENT ✅ This PR demonstrates strong security awareness:
📊 Test Coverage Assessment10 comprehensive tests covering:
Coverage: Excellent. The test suite covers all critical paths including error cases. 📝 Documentation QualityVerdict: OUTSTANDING ✅
✅ RecommendationsCritical (None)No blocking issues identified. High Priority (Optional Enhancements)
Low Priority (Nice to Have)
🎉 Overall AssessmentAPPROVED ✅ This is a high-quality PR that demonstrates:
The nested approval gate system is well-architected and solves a real UX problem (surfacing sub-agent approvals to users). The security-first approach with the bypass flag is particularly commendable. Approval Contingent On:
Excellent work! 🚀 Next Steps
cc: @JSv4 (if applicable for review approval) |
… field name _check_user_permissions used user_has_permission_for_obj which misses creator-based access on corpus-isolated document copies. Switched to visible_to_user() queryset which correctly handles creator, public, and guardian permissions. Also fixed test usage of nonexistent chat_type field to use conversation_type.
Pull Request Review: Nested Approval Gates ImplementationSummaryThis PR implements a robust nested approval gate system for corpus-to-document sub-agent interactions. The implementation is well-architected with comprehensive testing and documentation. Overall, this is high-quality work with strong security considerations. ✅ Strengths1. Excellent Security Design
2. Comprehensive Test CoverageThe test suite (
3. Defensive Programming
4. Consistent Frontend Implementation
5. Excellent Documentation
🔍 Issues & RecommendationsCritical IssuesNone identified - the implementation appears solid. Minor Issues & Suggestions1. Potential Race Condition in Bypass Flag (Low Risk)Location: self.config._approval_bypass_allowed = True # type: ignore[attr-defined]Issue: If multiple tool executions happen concurrently (unlikely but possible in async context), the flag could be set by one execution and read by another. Recommendation: Consider using a thread-local or context variable instead: from contextvars import ContextVar
_approval_bypass = ContextVar('approval_bypass', default=False)This would make the bypass flag execution-context-specific rather than agent-instance-wide. 2. Empty String Check Could Be More PythonicLocation: if (typeof subName === "string" && subName.length > 0) {Recommendation: While correct, you could simplify to: if (typeof subName === "string" && subName) {Empty strings are falsy in JavaScript/TypeScript. 3. Missing Type Safety on
|
Replace channels database_sync_to_async with asgiref sync_to_async in
_check_user_permissions to avoid thread-hopping that breaks under
TestCase transaction isolation (InterfaceError: connection already closed).
Update test assertions to match visible_to_user() error messages ("lacks
READ permission" instead of "not found"), consistent with IDOR prevention.
Restrict TestModel in test_search_exact_text_tool_testmodel to read-only
tools to prevent approval-gated tools from aborting the stream early.
PR Review: Nested Approval Gates for Sub-Agent Tool CallsSummaryThis PR implements a sophisticated nested approval gate system that allows document sub-agents to propagate tool approval requests up to the corpus agent level. The implementation is well-designed with comprehensive test coverage and excellent documentation. Overall Assessment: ✅ APPROVEDThe code quality is high, security considerations are well-handled, and the implementation follows project conventions. I have a few minor suggestions below. Code Quality & Best Practices✅ Strengths
🔶 Minor Concerns
Performance Considerations✅ Good Practices
🔶 Potential Optimizations
Security Concerns✅ Well Handled
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
Cover the previously untested branch where resume_with_approval finds and executes a tool from config.tools (wrapper_fn path) rather than falling back to pydantic_ai_agent._function_tools. Also covers the TypeError fallthrough from config.tools to the registry lookup. Closes codecov patch coverage gap for PR #881.
Code Review - PR #881: Nested Approval Gates for Sub-Agent Tool CallsThis PR implements a sophisticated nested approval gate system that allows sub-agents to propagate tool approval requests to their parent agents. The implementation is well-architected with strong security considerations. Here's my detailed review: ✅ Strengths1. Excellent Security Design
2. Comprehensive Test Coverage
3. Defensive Programming
4. Documentation Excellence
5. Consistent Frontend Implementation
🔍 Areas for Improvement1. Minor: Permission Check Efficiency (Low Priority)Location: The permission check uses # Current (works fine):
has_perm = await sync_to_async(
lambda: Document.objects.visible_to_user(user)
.filter(pk=document_id)
.exists()
)()
# Alternative (slightly cleaner):
has_perm = await sync_to_async(
Document.objects.visible_to_user(user).filter(pk=document_id).exists
)()However, the current approach is safer because it fully constructs the queryset before passing to 2. Documentation: Missing Type Hints for Dynamic Attributes (Low Priority)Location: The code uses dynamic attributes on self.config._approval_bypass_allowed = True # type: ignore[attr-defined]
bypass = getattr(config, "_approval_bypass_allowed", False)Suggestion: Consider adding a comment explaining why 3. Test Isolation: Serial Execution (Minor)Location: All tests are marked with Consideration: The tests mock 🐛 Potential Issues1. Minor: Race Condition in Approval Modal State (Low Severity)Location: The approval result handler clears if (pendingApproval && data?.message_id === pendingApproval.messageId) {
setPendingApproval(null);
setShowApprovalModal(false);
// ... update message status
}Scenario: If the user manually closes the modal before the server sends Recommendation: Add a check or comment clarifying this is intentional idempotent behavior. 2. Code Duplication: Approval Handler Logic (Low Priority)Location: The Recommendation: Extract common approval state management logic into a shared hook (e.g., 🎯 Performance Considerations1. Sub-Agent Stream Overhead (Acceptable)The nested approval system adds one extra round-trip when a sub-agent tool requires approval:
This is the unavoidable cost of nested approval gates and is acceptable given the security benefits. 2. Metadata Stripping on Every Resume (Efficient)Line 1118 strips 🔒 Security Analysis✅ No Security Concerns Identified
📋 Checklist Against CLAUDE.md Requirements
🎓 Educational ValueThis PR demonstrates excellent patterns for:
📊 Final Recommendation✅ APPROVE with minor suggestions This PR is production-ready. The implementation is secure, well-tested, and follows the codebase conventions. The minor suggestions above are optimizations that can be addressed in follow-up PRs. Before Merge:
Post-Merge Enhancements (Optional):
Great work on this feature! The security-first approach with the bypass flag and the comprehensive test coverage are exemplary. 🎉 |
…ol-permissions-VO5ii
Code Review: Nested Approval Gates ImplementationThis is an excellent PR that implements a sophisticated nested approval gate system with strong security considerations and comprehensive test coverage. The implementation is well-architected and follows the project's conventions. 🎯 SummaryThis PR successfully implements approval gate propagation from document sub-agents to corpus agents, allowing users to approve/reject tool calls made by nested agents. The implementation includes:
✅ Strengths1. Security-First DesignThe refactoring from LLM-visible
✨ Excellent security hardening - this is exactly the right approach. 2. Robust Error HandlingLines 2480-2491: Defensive validation for malformed approval events:
Lines 1036-1042 (frontend): Type validation before unwrapping metadata: if (typeof subName === "string" && subName.length > 0) {
const subArgs = toolCall.arguments?._sub_tool_arguments;
toolCall.arguments = subArgs && typeof subArgs === "object" ? subArgs : {};
}3. Metadata Stripping PatternLines 1199: Clean separation of UI metadata from function arguments: tool_args = {k: v for k, v in tool_args.items() if not k.startswith("_")}This is elegant and prevents internal keys from polluting tool signatures. 4. Comprehensive Test Coveragetest_nested_approval_gates.py (890 lines):
The test suite is exemplary - covers edge cases, security boundaries, and error paths. 5. Excellent Documentationdocs/architecture/llms/README.md (lines 925-1010):
CHANGELOG.md: Detailed entries with file locations and line numbers following project conventions. 6. Consistent Frontend ImplementationAll three chat components (CorpusChat, ChatTray, useAgentChat) consistently handle:
🔍 Areas for Consideration1. Minor: ToolConfirmationRequired Re-raise CommentLines 849-853: The comment is clear, but could mention that this allows nested approval to bubble up through multiple agent levels (corpus → document → potentially deeper). Suggestion (optional): except ToolConfirmationRequired:
# Sub-agent approval gates must propagate so the outer handler
# can pause the conversation and surface it to the user. This
# enables approval requests to bubble through nested agent calls
# (corpus → document → potentially deeper hierarchies).
raise2. Permission Check Migration to visible_to_user()pydantic_ai_tools.py lines 649-695: Excellent fix switching from However, the switch from Minor note: The lambda pattern ( # Use sync_to_async to avoid thread-hopping that breaks TestCase isolation.
# Lambda is needed because visible_to_user() must run in sync context
# before the queryset is passed to async context.
has_perm = await sync_to_async(
lambda: Document.objects.visible_to_user(user)
.filter(pk=document_id)
.exists()
)()3. Test Isolation with @pytest.mark.serialThe tests use Question for team: Should these tests be in a separate parallel-excluded test file, or is marking them serial sufficient for CI? (Not blocking - just organizational preference) 4. TypeScript Type SafetyCorpusChat.tsx lines 1036-1042: The unwrapping logic validates types at runtime. Consider adding TypeScript types for the nested metadata: interface PendingToolCall {
name: string;
arguments?: {
_sub_tool_name?: string;
_sub_tool_arguments?: Record<string, unknown>;
[key: string]: unknown;
};
tool_call_id?: string;
}This is minor and the runtime checks are solid. 🔒 Security Analysis✅ Passed Security Review
No security concerns identified. 🧪 Testing RecommendationsBefore Merging:
Future Enhancement (Not Blocking):Consider adding an E2E Playwright test for the full approval flow:
📊 Code Quality Metrics
🎓 Architectural InsightsThis PR demonstrates excellent architectural patterns:
The nested approval system is well-designed to extend to deeper hierarchies (corpus → document → potentially future nested levels). ✨ Final RecommendationAPPROVED with minor suggestions (none blocking) This is production-ready code with:
The security hardening (removal of LLM-visible skip_approval parameter) is particularly noteworthy and sets a good precedent for future agent tool implementations. Suggested merge strategy: Wait for CI green, then merge to main. 🙏 AcknowledgmentKudos to the implementation team - this is a complex feature executed with attention to security, testing, and documentation. The approval bypass protection mechanism is exemplary. Reviewed by: Claude Code Review Agent |
…ithub.com/Open-Source-Legal/OpenContracts into claude/investigate-tool-permissions-VO5ii
Code Review for PR #881: Nested Approval GatesThis is an excellent implementation of a complex feature with strong security considerations and comprehensive testing. Below is my detailed feedback. ✅ Strengths1. Security-First DesignThe PR addresses a critical security vulnerability where an LLM could potentially bypass approval gates through prompt injection. The solution is elegant:
2. Comprehensive Test CoverageThe test suite (
3. Excellent Documentation
4. Defensive Programming
5. Frontend Consistency
🔍 Code Quality Observations1. Metadata Stripping Logic (Line 1199, pydantic_ai_agents.py)tool_args = {k: v for k, v in tool_args.items() if not k.startswith("_")}✅ Good: Simple and effective. The underscore prefix convention is well-documented and tested. 2. ToolConfirmationRequired Propagation (Lines 849-853)except ToolConfirmationRequired:
# Sub-agent approval gates must propagate so the
# outer ToolConfirmationRequired handler can pause
# the conversation and surface it to the user.
raise✅ Good: Explicit re-raise with clear comment explaining why. This is the correct pattern. 3. Permission Check Refactoring (pydantic_ai_tools.py)The switch from
has_perm = await sync_to_async(
lambda: Document.objects.visible_to_user(user)
.filter(pk=document_id)
.exists()
)()This works but could be clearer. Consider adding a comment explaining why the lambda is necessary (to avoid thread-hopping issues with TestCase transaction isolation). 4. Frontend Sub-Tool Unwrapping (CorpusChat.tsx, lines 1250-1261)const toolCall = { ...data.pending_tool_call };
if (toolCall.name === "ask_document") {
const subName = toolCall.arguments?._sub_tool_name;
if (typeof subName === "string" && subName.length > 0) {
toolCall.name = subName;
const subArgs = toolCall.arguments?._sub_tool_arguments;
toolCall.arguments =
subArgs && typeof subArgs === "object" ? subArgs : {};
}
}✅ Good: Defensive validation ensures malformed metadata doesn't crash the UI. 🐛 Potential Issues1. Type Safety on
|
Screenshots workflow: add a CI gate step that polls Backend CI and Frontend CI status before committing. If any required check fails, the screenshot push is skipped — preventing the screenshot commit from becoming the latest commit and hiding CI failures. Build fix: switch opencv-python to opencv-python-headless 4.13.0.92. The previous opencv-python 4.12.0.88 pulled numpy==1.21.2 as a build dependency which is incompatible with Python 3.11+ on the pytorch base image. The headless variant has pre-built wheels and no GUI dependencies (only cv2 usage is server-side thumbnail generation).
…ithub.com/Open-Source-Legal/OpenContracts into claude/investigate-tool-permissions-VO5ii
PR Review: Nested Approval Gates for Sub-Agent Tool Calls (#881)SummaryThis PR implements a sophisticated nested approval gate system that allows document sub-agents to propagate tool approval requests up to their parent corpus agent. The implementation is well-architected with strong security considerations and comprehensive test coverage. Code Quality: ✅ ExcellentStrengths
Potential Issues & Suggestions1.
|
…l gate - Add requires_approval=True to update_corpus_description tool in the corpus agent. The flag was missing, so _maybe_raise never fired and the tool executed without user confirmation. - Add diagnostic test that creates a REAL pydantic-ai Agent (not mocked) and verifies _check_tool_requires_approval finds requires_approval=True through pydantic-ai's Tool.function attribute chain. Database evidence confirmed neither update_corpus_description nor update_document_description ever reached AWAITING_APPROVAL state — the LLM always called update_corpus_description directly (even for document updates), and that tool lacked the requires_approval flag.
PR Review: Nested Approval Gates for Sub-Agent Tool CallsSummaryThis PR implements a sophisticated nested approval gate system that allows sub-agents (document-specific agents) to propagate tool approval requests to their parent agent (corpus agent). The implementation is well-architected with strong security considerations and comprehensive test coverage. Strengths1. Excellent Security Design
2. Robust Error Handling
3. Comprehensive Test CoverageThe new test file (
4. Excellent Documentation
5. Consistent Frontend Implementation
Issues & Concerns1. CI Workflow Complexity (Minor)Location: The new CI gate logic adds 70+ lines of bash polling. While the approach is sound, consider:
2. Missing Permission Context in _EmptyDeps (Moderate)Location: The user_id = None
document_id = None
corpus_id = NoneBut the actual Recommendation: Populate class _EmptyDeps:
skip_approval_gate = True
user_id = self.config.user_id
document_id = getattr(self.config, 'document_id', None)
corpus_id = getattr(self.config, 'corpus_id', None)3. Potential Race Condition in Frontend (Minor)Location: The approval status is updated in both Recommendation: Batch state updates: setChat(prev => prev.map(msg => ...));
setServerMessages(prev => prev.map(msg => ...)); // Move into same React tick with useCallback or batchOr use a single source of truth and derive the other. 4. opencv-python → opencv-python-headless (Unrelated Change)Location: This change is unrelated to nested approval gates. Should be in a separate PR. Recommendation: Move to a separate "Update opencv to headless" PR for cleaner git history. 5. Missing Test for ask_document with Multiple Approval Events (Minor)The test suite covers single approval events well, but doesn't test:
Recommendation: Add a test with Code QualityPositives
Suggestions
Performance Considerations
Security AssessmentOverall: ✅ Strong security posture The Test Coverage AssessmentOverall: ✅ Excellent The 1004-line test file is one of the most thorough test suites I've seen for this codebase. It covers:
Only missing test: multiple sequential approval events (see Issue #5 above). RecommendationsMust Fix (Before Merge)
Should Fix (Before Merge)
Consider for Follow-up
ConclusionThis is high-quality work with exceptional attention to security and edge cases. The nested approval gate architecture is sound, and the implementation follows the codebase's conventions well. The test coverage is outstanding. With the Approval: ✅ Approved with minor fix required Review generated by Claude Code analyzing PR #881 |
After a user approves a tool call, resume_with_approval was re-running the full agent with a text continuation prompt. The LLM never saw the tool result in native ToolCallPart/ToolReturnPart format, so it called the same tool again, triggering another approval — infinite loop. Fix: build proper pydantic-ai message history with ToolCallPart (the LLM's original call) + ToolReturnPart (the execution result) and pass it to _stream_core via message_history kwarg. The LLM now sees a completed tool round-trip and produces a natural follow-up response. Confirmed via database evidence: messages 6-10 in conversation 2 all had pending_tool_call for update_corpus_description with empty content, showing the LLM kept re-invoking the same tool after each approval.
PR Review: Nested Approval Gates for Sub-Agent Tool CallsThis is a well-architected implementation of nested approval gates with strong security considerations. The approach correctly solves the approval propagation problem for sub-agents while preventing LLM prompt injection attacks. ✅ Strengths1. Security-First Design
2. Comprehensive Test Coverage
3. Defensive Programming
4. Documentation
🔍 Issues & RecommendationsCritical Issues1. Commit Message Attribution Violation
Action Required: Remove all Claude attribution from commit history before merging. Code Quality Issues2. Metadata Stripping Location Suggestion: Consider stripping metadata before emitting the event, or document why the event should preserve these keys. 3. Type Safety in Frontend toolCall.arguments = subArgs && typeof subArgs === "object" ? subArgs : {};Arrays pass toolCall.arguments = subArgs && typeof subArgs === "object" && !Array.isArray(subArgs) ? subArgs : {};4. Missing Type Assertion self.config._approval_bypass_allowed = True # type: ignore[attr-defined]While this works, it would be cleaner to define Performance Considerations5. Repeated Documentation Issues6. Missing Edge Case Documentation 7. Frontend Comment Clarity 🎯 Minor Improvements8. Code Duplication // frontend/src/utils/agentUtils.ts
export function unwrapSubToolApproval(toolCall: ToolCall): ToolCall {
if (toolCall.name === "ask_document") {
const subName = toolCall.arguments?._sub_tool_name;
if (typeof subName === "string" && subName.length > 0) {
const subArgs = toolCall.arguments?._sub_tool_arguments;
return {
...toolCall,
name: subName,
arguments: subArgs && typeof subArgs === "object" && !Array.isArray(subArgs) ? subArgs : {}
};
}
}
return toolCall;
}Per 9. Test Isolation 10. Magic Number # In opencontractserver/constants/agent_constants.py or similar
APPROVAL_BYPASS_FLAG = "_approval_bypass_allowed"🔒 Security Considerations11. Approval Context Validation ✅ 12. Race Condition? 📊 Test Coverage AssessmentThe test suite is comprehensive but missing:
🎨 Code Style13. Consistent Naming 14. Frontend Message Types ✅ Verification ChecklistBased on CLAUDE.md requirements:
🎬 Final RecommendationApprove with Changes Required This is solid architectural work with good security design. The nested approval propagation is well-implemented and the test coverage is thorough. However, the Claude attribution in commits MUST be removed before merging per project policy. Required Changes:
Recommended Changes: Estimated Review Effort: This is a complex, multi-layer change touching backend agent framework, frontend state management, and WebSocket protocol. The implementation quality is high, but thorough testing in a staging environment is recommended before production deployment. Great work on the security-first approach with the bypass flag! 🔒 |
After resume_with_approval executes an approved tool, the continuation message was missing tool usage in its timeline because the tool ran outside of _stream_core's TimelineBuilder scope. Fix: - Yield ThoughtEvents for tool_call and tool_result before _stream_core so the frontend receives them via WebSocket in real-time - Add initial_timeline kwarg to _stream_core that pre-populates the TimelineBuilder, ensuring the entries appear in the persisted DB timeline as well Also confirmed the corpus description revision system IS working — the update_corpus_description tool creates CorpusDescriptionRevision records correctly; corpus._read_md_description_content() returns the latest.
PR Review: Nested Approval Gates for Sub-Agent Tool CallsOverall this is a well-structured PR with a clear security motivation. The approval propagation flow is solid and the test coverage is thorough. Below are observations ranging from blocking concerns to minor suggestions. Security & Correctness[Good] Approval bypass uses side-channel flag, not LLM-visible parameter The move from a [Bug]
[Concern] The flag is set as a dynamic attribute on self.config._approval_bypass_allowed = True # type: ignore[attr-defined]This bypasses static type checking entirely. If [Concern] Document-not-found becomes a silent permission pass The refactored try:
doc = await Document.objects.aget(pk=document_id)
...
except Document.DoesNotExist:
raise PermissionError(f"Document {document_id} not found")with: has_perm = await sync_to_async(
lambda: Document.objects.visible_to_user(user).filter(pk=document_id).exists()
)()
if not has_perm:
raise PermissionError(...)
Architecture / Design[Concern] initial_timeline: list[dict] | None = stream_kwargs.pop("initial_timeline", None)Using [Concern] Native tool-call history reconstruction may re-trigger tool approval The PR appends a [Minor] The diff adds Frontend[Inconsistency]
If [Minor] case "ASYNC_RESUME":
// Agent is resuming after approval. Unlike CorpusChat (which has
// an explicit isProcessing state), ChatTray derives its processing
// indicator from message state (isAssistantResponding), so no
// additional state update is needed here.
break;If no action is needed, the [Minor] Sub-tool unwrapping only applies to The frontend logic that unwraps Tests[Good] Test coverage is thorough The 10 async tests in [Minor] Test class uses Several tests define an identical class _Ctx:
tool_call_id = "test-call"
deps = types.SimpleNamespace(...)This should be extracted to a helper method or a module-level fixture to reduce duplication. [Minor] The CI / Workflow[Good] CI gate before screenshot commits is a sensible safety net The 60-minute polling loop in [Minor] Polling interval is 30s with up to 60 minutes total If backend tests take 30+ minutes (as noted in CLAUDE.md), the 120-attempt × 30-second loop is barely sufficient. Consider raising Summary
The main item to resolve before merging is confirming (or fixing) that |
Summary
This PR implements a nested approval gate system that allows sub-agents (document-specific agents) to propagate tool approval requests up to their parent agent (corpus agent) for user confirmation. When a document agent's tool requires approval, the request is surfaced to the corpus agent level, which pauses execution and prompts the user. Upon approval, the corpus agent resumes the document agent with a flag to skip re-approval.
Key Changes
Backend (pydantic_ai_agents.py):
ToolConfirmationRequiredexception handling in_stream_core()to propagate sub-agent approval gates to the outer handlerresume_with_approval()to strip internal metadata keys (prefixed with_) from tool arguments before execution, as these are only needed for UI displayask_document_tool()with:skip_approvalparameter to bypass approval gates when resuming after user approvalapproval_neededevents from document agents and re-raise them asToolConfirmationRequiredat the corpus agent level_sub_tool_name,_sub_tool_arguments) in the approval request for UI displayFrontend (CorpusChat.tsx, useAgentChat.ts, ChatTray.tsx):
ASYNC_APPROVAL_RESULTandASYNC_RESUMEmessage types to handle approval workflowask_documentapproval is shown, display the actual sub-agent tool name and arguments insteadASYNC_APPROVAL_RESULTto clear pending approval state and update UIASYNC_RESUMEto maintain processing indicator during agent resumptionImplementation Details
The approval propagation flow:
approval_neededeventask_document_tool()catches this and raisesToolConfirmationRequiredwith sub-tool detailsApprovalNeededEventfor the frontendask_document) to the userresume_with_approval()withskip_approval=Truehttps://claude.ai/code/session_01KKkXwMVumAZTNqw3QFWBSH