Stagehand agent improvements #1094
🦋 Changeset detected
Latest commit: a7bf3a7
The changes in this PR will be included in the next version bump.
Greptile Overview
Summary
This PR significantly enhances the Stagehand agent with sophisticated model routing, expanded toolset, and robust context management capabilities. The implementation intelligently adapts tools and system prompts based on the LLM provider and configuration.
Key Changes:
- Smart Model Routing: Tools are dynamically filtered based on the model being used. When using Claude models with `storeActions: false`, specialized coordinate-based tools (`click`, `type`, `dragAndDrop`, `clickAndHold`) are enabled for better performance, while other models use the generic `act` tool
- Enhanced Toolset: Added `think` for reasoning, `keys` for keyboard input, `search` for web searches (when EXA_API_KEY is available), and Anthropic-optimized tools for precise interactions
- Advanced Context Management: Implements multi-level compression with image optimization, A11y tree management, and intelligent checkpointing every 25 tool calls. Token-based summarization kicks in at 120,000 tokens to maintain performance
- Type Safety: Strong typing with discriminated unions for `AgentToolCall` and `AgentToolResult`, plus tool-specific parameter validation
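To illustrate the discriminated-union typing mentioned in the last bullet, here is a minimal sketch. The field names (`toolName`, `args`) and the three variants shown are assumptions made for illustration; the PR's actual `AgentToolCall` definition is not reproduced in this review.

```typescript
// Hypothetical shape: the real AgentToolCall fields in the PR may differ.
type AgentToolCall =
  | { toolName: "click"; args: { x: number; y: number } }
  | { toolName: "type"; args: { x: number; y: number; text: string } }
  | { toolName: "think"; args: { reasoning: string } };

// Switching on the discriminant narrows `call.args` to the matching variant.
function describeToolCall(call: AgentToolCall): string {
  switch (call.toolName) {
    case "click":
      return `click at (${call.args.x}, ${call.args.y})`;
    case "type":
      return `type "${call.args.text}" at (${call.args.x}, ${call.args.y})`;
    case "think":
      return `think: ${call.args.reasoning}`;
    default: {
      // Compile-time exhaustiveness check: adding a variant without a case fails here.
      const unreachable: never = call;
      return unreachable;
    }
  }
}
```

The `never` assignment in the default branch is what makes the compiler flag a newly added tool that has no handler.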
The architecture demonstrates thoughtful design with proper separation of concerns, robust error handling, and performance optimizations. The model routing logic ensures optimal tool selection while maintaining backward compatibility.
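The two context-management thresholds named in the summary (a checkpoint every 25 tool calls, summarization at 120,000 tokens) could be expressed roughly as follows. This is a sketch under assumptions: the function name, the return type, the precedence of summarization over checkpointing, and the inclusive comparison are all hypothetical and not taken from `contextManager.ts`.

```typescript
// Values taken from the review summary; everything else below is illustrative.
const CHECKPOINT_INTERVAL = 25;
const SUMMARIZATION_TOKEN_THRESHOLD = 120000;

type ContextAction = "none" | "checkpoint" | "summarize";

// Decide which compression step, if any, to run before the next model call.
function nextContextAction(toolCallCount: number, estimatedTokens: number): ContextAction {
  // Assumption: token-based summarization takes priority over checkpointing.
  if (estimatedTokens >= SUMMARIZATION_TOKEN_THRESHOLD) {
    return "summarize";
  }
  // Checkpoint on every 25th tool call, but not before any call has been made.
  if (toolCallCount > 0 && toolCallCount % CHECKPOINT_INTERVAL === 0) {
    return "checkpoint";
  }
  return "none";
}
```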
Confidence Score: 4/5
- This PR is safe to merge with minimal risk
- Score reflects well-architected changes with comprehensive testing mentioned, though some minor issues exist like the parameter description error in dragAndDrop tool
- Pay close attention to lib/agent/tools/dragAndDrop.ts for the parameter description fix
Important Files Changed
File Analysis
Filename | Score | Overview |
---|---|---|
lib/handlers/stagehandAgentHandler.ts | 4/5 | Core agent handler with model routing and tool creation logic - well structured |
lib/prompt.ts | 4/5 | System prompt generation with model-specific routing and tool filtering - complex but solid |
lib/agent/tools/index.ts | 4/5 | Tool creation and filtering logic with proper model routing - clean implementation |
lib/agent/tools/dragAndDrop.ts | 3/5 | Drag and drop tool with coordinate-based interaction - has parameter description issue |
lib/agent/contextManager/contextManager.ts | 4/5 | Complex context management with compression, checkpointing, and summarization - sophisticated implementation |
types/agent.ts | 5/5 | Type definitions with new AgentOptions.storeActions property - clean type additions |
Sequence Diagram
sequenceDiagram
participant Client as Client
participant Handler as StagehandAgentHandler
participant Prompt as buildStagehandAgentSystemPrompt
participant Tools as createAgentTools
participant Filter as filterToolsByModelName
participant Context as ContextManager
participant Wrapper as modelWrapper
participant LLM as LLMClient
Client->>Handler: execute(options)
Handler->>Handler: Extract storeActions from options
Handler->>Prompt: buildStagehandAgentSystemPrompt(url, modelName, instruction, storeActions)
Prompt->>Prompt: Detect if Anthropic model (modelName.startsWith("claude"))
Prompt->>Prompt: Check useAnthropicCustomizations = isAnthropic && storeActions === false
alt useAnthropicCustomizations = true
Prompt-->>Handler: Return prompt with click, type, dragAndDrop tools
else useAnthropicCustomizations = false
Prompt-->>Handler: Return prompt with act tool (no click/type/dragAndDrop)
end
Handler->>Tools: createAgentTools(stagehand, {mainModel, storeActions})
Tools->>Tools: Create all tool instances
note over Tools: EXA_API_KEY check for search tool
Tools->>Filter: filterToolsByModelName(mainModel, tools, storeActions)
alt isAnthropic && storeActions === false
Filter->>Filter: Keep all tools except fillForm
Filter-->>Tools: Return Anthropic-optimized toolset
else Other models or storeActions = true
Filter->>Filter: Remove dragAndDrop, clickAndHold, click, type, fillFormVision
Filter-->>Tools: Return standard toolset
end
Tools-->>Handler: Return filtered tools
Handler->>Context: new ContextManager(logger)
Handler->>Wrapper: modelWrapper(llmClient, contextManager, sessionId)
Wrapper->>Wrapper: Wrap model with context processing middleware
Wrapper-->>Handler: Return wrapped model
Handler->>LLM: generateText({model: wrappedModel, system: systemPrompt, tools})
LLM-->>Handler: Return result with actions
Handler-->>Client: Return AgentResult
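The filtering branch in the diagram above can be sketched as a small pure function. The tool names, the `startsWith("claude")` check, and the `storeActions === false` condition come from the diagram; the `ToolSet` type and the function name `filterToolsSketch` are assumptions, and this is not the PR's actual `filterToolsByModelName` implementation.

```typescript
// Hypothetical stand-in for the agent's tool map; values are opaque here.
type ToolSet = Record<string, unknown>;

// Tools removed on the Anthropic-optimized path, per the diagram.
const ANTHROPIC_EXCLUDED = ["fillForm"];
// Tools removed for other models, or whenever storeActions is not false.
const STANDARD_EXCLUDED = ["dragAndDrop", "clickAndHold", "click", "type", "fillFormVision"];

function filterToolsSketch(modelName: string, tools: ToolSet, storeActions?: boolean): ToolSet {
  const isAnthropic = modelName.startsWith("claude");
  // Strict comparison: an undefined storeActions does not enable the coordinate tools.
  const useAnthropicCustomizations = isAnthropic && storeActions === false;
  const excluded = useAnthropicCustomizations ? ANTHROPIC_EXCLUDED : STANDARD_EXCLUDED;

  const filtered: ToolSet = {};
  for (const name of Object.keys(tools)) {
    if (excluded.indexOf(name) === -1) {
      filtered[name] = tools[name];
    }
  }
  return filtered;
}
```

Because the check is `storeActions === false`, a Claude model with `storeActions` left unset falls through to the standard toolset in this sketch, which matches the "Other models or storeActions = true" branch only if the option defaults to true; that default is not stated in the diagram.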
36 files reviewed, 1 comment
why
This PR enhances the Stagehand agent with model routing, expanded toolset, and more robust context management to improve performance and reliability across different LLM providers.
what changed
Model Routing
New Tools Added
Anthropic-Specific Tools (enabled when storeActions: false)
Model-Agnostic Tools
Enhanced Context Management
Enhanced Type Safety
test plan