tkattkat
Collaborator

why

This PR enhances the Stagehand agent with model routing, an expanded toolset, and more robust context management to improve performance and reliability across different LLM providers.

what changed

Model Routing

  • Model-specific tool filtering: Tools are now dynamically included/excluded based on the model being used
  • Anthropic-optimized toolset: When Claude models are used with storeActions: false, specialized coordinate-based tools are enabled for better performance
  • Custom system prompts: Different system prompts are applied based on the model to optimize behavior
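
The routing rule above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the function name filterToolsForModel, the ToolSet shape, and the COORDINATE_TOOLS set are assumptions made for the example, and the real filter may add or remove other tools as well. The startsWith("claude") check mirrors the detection described in the review's sequence diagram.

```typescript
// Hypothetical sketch of model-based tool filtering; names and shapes are
// assumptions for illustration, not the actual Stagehand code.
type ToolSet = Record<string, unknown>;

// Coordinate-based tools that the PR lists as Anthropic-specific.
const COORDINATE_TOOLS = new Set(["click", "type", "dragAndDrop", "clickAndHold"]);

function isAnthropicModel(modelName: string): boolean {
  return modelName.startsWith("claude");
}

function filterToolsForModel(
  modelName: string,
  tools: ToolSet,
  storeActions: boolean,
): ToolSet {
  // Coordinate tools are exposed only for Claude models with storeActions: false.
  const useCoordinateTools = isAnthropicModel(modelName) && storeActions === false;
  const kept: ToolSet = {};
  for (const name of Object.keys(tools)) {
    if (useCoordinateTools || !COORDINATE_TOOLS.has(name)) {
      kept[name] = tools[name];
    }
  }
  return kept;
}
```

In this sketch, any non-Claude model, or a Claude model with storeActions: true, falls back to the remaining tools such as act.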

New Tools Added

Anthropic-Specific Tools (enabled when storeActions: false)

  • clickAndHold: Performs click and hold actions with coordinate precision
  • type: Coordinate-based typing
  • click: Precise coordinate-based clicking
  • dragAndDrop: Coordinate-based drag and drop

Model-Agnostic Tools

  • think: Allows the agent to reason through problems before acting
  • keys: Keyboard input handling for complex key combinations
  • search: Web search capability (auto-enabled when EXA_API_KEY is provided)
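
One way the search tool could be gated on the API key is sketched below. The helper name modelAgnosticToolNames and the env parameter are assumptions for illustration; the PR only states that search is auto-enabled when EXA_API_KEY is provided.

```typescript
// Hypothetical sketch: the environment is passed in explicitly so the
// gating logic stays easy to test. Not the PR's actual code.
type Env = Record<string, string | undefined>;

function modelAgnosticToolNames(env: Env): string[] {
  const names = ["think", "keys"];
  // search is included only when an Exa API key is configured.
  if (env["EXA_API_KEY"]) {
    names.push("search");
  }
  return names;
}
```

The same check would need to drive both the tool list and the system prompt so that the prompt never mentions a tool the model cannot call.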

Enhanced Context Management

  • Image optimization: Automatically removes old images, keeping only the 2 most recent
  • A11y tree management: Maintains only the 2 most recent accessibility trees
  • Checkpointing: Creates conversation summaries every 25 tool calls
  • Token-based summarization: When context exceeds 120,000 tokens, automatically summarizes content
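
The thresholds above could be applied along these lines. This is a simplified sketch under stated assumptions: the ContextMessage shape, the function names, and the way images are attached to messages are all hypothetical, and the actual ContextManager is likely more involved.

```typescript
// Hypothetical message shape; the real context manager works on the
// model's prompt format, which is not shown in the PR description.
interface ContextMessage {
  role: "user" | "assistant" | "tool";
  text: string;
  image?: string; // assumed: an encoded screenshot
}

const MAX_RECENT_IMAGES = 2;      // keep only the 2 most recent images
const CHECKPOINT_INTERVAL = 25;   // summarize every 25 tool calls
const TOKEN_LIMIT = 120000;       // summarize when context exceeds this

// Strip images from all but the most recent `keep` image-bearing messages.
function pruneOldImages(
  messages: ContextMessage[],
  keep: number = MAX_RECENT_IMAGES,
): ContextMessage[] {
  const total = messages.filter((m) => m.image !== undefined).length;
  let toDrop = Math.max(0, total - keep);
  return messages.map((m) => {
    if (m.image !== undefined && toDrop > 0) {
      toDrop--;
      return { role: m.role, text: m.text }; // same message without the image
    }
    return m;
  });
}

function shouldCheckpoint(toolCallCount: number): boolean {
  return toolCallCount > 0 && toolCallCount % CHECKPOINT_INTERVAL === 0;
}

function shouldSummarize(tokenCount: number): boolean {
  return tokenCount > TOKEN_LIMIT;
}
```

The same keep-the-latest-N pruning would apply to accessibility trees, with the tree text taking the place of the image field.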

Enhanced Type Safety

  • Discriminated union types: AgentToolCall and AgentToolResult provide complete type safety
  • Tool-specific typing: Each tool has strongly typed parameters and return values
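
As an illustration of the discriminated-union approach, a sketch might look like the following. The type name ExampleToolCall and its field layout are assumptions; the actual AgentToolCall and AgentToolResult definitions are not shown in this description.

```typescript
// Hypothetical union keyed on a `tool` discriminant; the real
// AgentToolCall type may use different field names and members.
type ExampleToolCall =
  | { tool: "click"; args: { x: number; y: number } }
  | { tool: "type"; args: { x: number; y: number; text: string } }
  | { tool: "think"; args: { reasoning: string } };

function describeToolCall(call: ExampleToolCall): string {
  switch (call.tool) {
    case "click":
      // Narrowed: args has only x and y here.
      return `click at (${call.args.x}, ${call.args.y})`;
    case "type":
      return `type "${call.args.text}" at (${call.args.x}, ${call.args.y})`;
    case "think":
      return `think: ${call.args.reasoning}`;
    default: {
      // Exhaustiveness check: adding a union member without a case
      // makes this assignment a compile-time error.
      const unreachable: never = call;
      return unreachable;
    }
  }
}
```

Because each branch narrows on the tool field, accessing call.args.text in the click branch would fail to compile, which is the kind of per-tool safety the bullets describe.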

test plan

  • Tested locally
  • Tested on Browserbase
  • Tested with and without an Exa API key (EXA_API_KEY) to ensure the search tool appears in the prompt and tool list only when the key is present
  • Tested with Claude 4 and non-Claude models to ensure the system prompt and tools are routed correctly for the model in use


changeset-bot bot commented Sep 23, 2025

🦋 Changeset detected

Latest commit: a7bf3a7

The changes in this PR will be included in the next version bump.


@tkattkat tkattkat marked this pull request as ready for review September 24, 2025 00:40
Contributor

@greptile-apps greptile-apps bot left a comment


Greptile Overview

Summary

This PR significantly enhances the Stagehand agent with sophisticated model routing, expanded toolset, and robust context management capabilities. The implementation intelligently adapts tools and system prompts based on the LLM provider and configuration.

Key Changes:

  • Smart Model Routing: Tools are dynamically filtered based on the model being used. When using Claude models with storeActions: false, specialized coordinate-based tools (click, type, dragAndDrop, clickAndHold) are enabled for better performance, while other models use the generic act tool
  • Enhanced Toolset: Added think for reasoning, keys for keyboard input, search for web searches (when EXA_API_KEY is available), and Anthropic-optimized tools for precise interactions
  • Advanced Context Management: Implements multi-level compression with image optimization, A11y tree management, and intelligent checkpointing every 25 tool calls. Token-based summarization kicks in at 120,000 tokens to maintain performance
  • Type Safety: Strong typing with discriminated unions for AgentToolCall and AgentToolResult, plus tool-specific parameter validation

The architecture demonstrates thoughtful design with proper separation of concerns, robust error handling, and performance optimizations. The model routing logic ensures optimal tool selection while maintaining backward compatibility.

Confidence Score: 4/5

  • This PR is safe to merge with minimal risk
  • Score reflects well-architected changes and the testing described in the PR, with one minor issue: a parameter description error in the dragAndDrop tool
  • Pay close attention to lib/agent/tools/dragAndDrop.ts for the parameter description fix

Important Files Changed

File Analysis

Filename | Score | Overview
lib/handlers/stagehandAgentHandler.ts | 4/5 | Core agent handler with model routing and tool creation logic - well structured
lib/prompt.ts | 4/5 | System prompt generation with model-specific routing and tool filtering - complex but solid
lib/agent/tools/index.ts | 4/5 | Tool creation and filtering logic with proper model routing - clean implementation
lib/agent/tools/dragAndDrop.ts | 3/5 | Drag and drop tool with coordinate-based interaction - has parameter description issue
lib/agent/contextManager/contextManager.ts | 4/5 | Complex context management with compression, checkpointing, and summarization - sophisticated implementation
types/agent.ts | 5/5 | Type definitions with new AgentOptions.storeActions property - clean type additions

Sequence Diagram

sequenceDiagram
    participant Client as Client
    participant Handler as StagehandAgentHandler
    participant Prompt as buildStagehandAgentSystemPrompt
    participant Tools as createAgentTools
    participant Filter as filterToolsByModelName
    participant Context as ContextManager
    participant Wrapper as modelWrapper
    participant LLM as LLMClient

    Client->>Handler: execute(options)
    Handler->>Handler: Extract storeActions from options
    
    Handler->>Prompt: buildStagehandAgentSystemPrompt(url, modelName, instruction, storeActions)
    Prompt->>Prompt: Detect if Anthropic model (modelName.startsWith("claude"))
    Prompt->>Prompt: Check useAnthropicCustomizations = isAnthropic && storeActions === false
    alt useAnthropicCustomizations = true
        Prompt-->>Handler: Return prompt with click, type, dragAndDrop tools
    else useAnthropicCustomizations = false
        Prompt-->>Handler: Return prompt with act tool (no click/type/dragAndDrop)
    end
    
    Handler->>Tools: createAgentTools(stagehand, {mainModel, storeActions})
    Tools->>Tools: Create all tool instances
    note over Tools: EXA_API_KEY check for search tool
    Tools->>Filter: filterToolsByModelName(mainModel, tools, storeActions)
    
    alt isAnthropic && storeActions === false
        Filter->>Filter: Keep all tools except fillForm
        Filter-->>Tools: Return Anthropic-optimized toolset
    else Other models or storeActions = true
        Filter->>Filter: Remove dragAndDrop, clickAndHold, click, type, fillFormVision
        Filter-->>Tools: Return standard toolset
    end
    
    Tools-->>Handler: Return filtered tools
    
    Handler->>Context: new ContextManager(logger)
    Handler->>Wrapper: modelWrapper(llmClient, contextManager, sessionId)
    Wrapper->>Wrapper: Wrap model with context processing middleware
    Wrapper-->>Handler: Return wrapped model
    
    Handler->>LLM: generateText({model: wrappedModel, system: systemPrompt, tools})
    LLM-->>Handler: Return result with actions
    Handler-->>Client: Return AgentResult

36 files reviewed, 1 comment


@tkattkat tkattkat marked this pull request as draft September 24, 2025 18:00