Merged
70 commits
bcbe2e4
restore agent integrations
sameelarif Aug 8, 2025
3719911
working build
sameelarif Aug 8, 2025
8efdd36
Merge branch 'main' into sarif/stg-519-mcp-and-tools-support
sameelarif Aug 8, 2025
e75a850
deps
sameelarif Aug 8, 2025
e3c7697
ignore inference summary
sameelarif Aug 11, 2025
088c51f
better tool calling in operator and oai
sameelarif Aug 11, 2025
e16dfd0
example integrations
sameelarif Aug 19, 2025
c32ebbf
handle malformed args from LLM
sameelarif Aug 22, 2025
52348c6
fix "none" tool choice handling
sameelarif Aug 22, 2025
ab7742c
merge
sameelarif Aug 25, 2025
b581063
mcp docs
sameelarif Aug 27, 2025
0382bae
Merge branch 'main' into sarif/stg-519-mcp-and-tools-support
sameelarif Aug 27, 2025
5e3a7f3
basic implementation
tkattkat Aug 23, 2025
74fb597
move tools to own folder + add screenshot filterning
tkattkat Aug 23, 2025
36dbb00
add accessability tool + context handling for it
tkattkat Aug 23, 2025
b3a0138
add fill form tool + agent to eval runner
tkattkat Aug 25, 2025
db4e31e
remove operator handler
tkattkat Aug 25, 2025
664e90b
update naming of the computer use agent handler
tkattkat Aug 25, 2025
3fe546c
add type guard
tkattkat Aug 25, 2025
96c0508
update system
tkattkat Aug 25, 2025
7b7adb9
add scroll tool
tkattkat Aug 25, 2025
803642c
update act tool
tkattkat Aug 25, 2025
14196a6
remove comments
tkattkat Aug 25, 2025
722f951
remove operator type
tkattkat Aug 25, 2025
a04a0ee
update fillform
tkattkat Aug 25, 2025
91d9b69
add llms thinking to stagehand logs
tkattkat Aug 25, 2025
0232b1e
update fillform, messageprocessing, and logs
tkattkat Aug 25, 2025
66aba99
remove refresh tool + timestamp on aria tree tool
tkattkat Aug 25, 2025
ba8c94e
update scroll tool + system prompt
tkattkat Aug 25, 2025
7dee006
update goto tool
tkattkat Aug 25, 2025
42e7805
add pascal case to cua handler definition
tkattkat Aug 25, 2025
ad1719b
update wait tool
tkattkat Aug 25, 2025
06279ac
add "execution model"
tkattkat Aug 25, 2025
2eab512
update aria tree tool
tkattkat Aug 26, 2025
fe75f60
update param names / types
tkattkat Aug 26, 2025
f225d9e
update task completed in actions
tkattkat Aug 26, 2025
e5048b5
update instruction handling
tkattkat Aug 26, 2025
b1c78fe
update agent text to log level 1
tkattkat Aug 26, 2025
a51f812
update result.message to contain all "reasoning" text throughout agen…
tkattkat Aug 27, 2025
a5f615d
Merge branch 'main' into agent-revamp
tkattkat Aug 28, 2025
2ea5c69
update screenshot quality
tkattkat Aug 29, 2025
63e7977
Merge branch 'mcp-tools-support' into agent-revamp
tkattkat Aug 29, 2025
75ccd81
implement mcp tools to new agent
tkattkat Aug 29, 2025
eee3389
add changeset
tkattkat Aug 29, 2025
d74eba7
Merge branch 'main' into agent-revamp
tkattkat Aug 29, 2025
4a0345f
Merge main into agent-revamp
tkattkat Aug 29, 2025
f7cb8c9
change back args
tkattkat Aug 29, 2025
2f7be48
Merge remote-tracking branch 'origin/main' into agent-revamp
tkattkat Sep 2, 2025
f8ca451
add inference time
tkattkat Sep 3, 2025
7bbcb78
add comment on system prompt
tkattkat Sep 3, 2025
6cbfec6
add trycatch and change zod
tkattkat Sep 4, 2025
31bae67
pass stagehand page instead of page
tkattkat Sep 4, 2025
95fcecb
move get languate model to llmclient for proper typing while using el…
tkattkat Sep 4, 2025
d82c6c3
fallback to iframes true on iframes
tkattkat Sep 4, 2025
5ad60ab
make logic cleaner
tkattkat Sep 4, 2025
3158586
use stagehandpage instead of page
tkattkat Sep 4, 2025
7927f39
remove screenshot console logs & use logger for extract
tkattkat Sep 4, 2025
9c7f393
add back warning when not using provider/model format
tkattkat Sep 4, 2025
6e2e3ec
add docs for agent
tkattkat Sep 4, 2025
a08ac8d
Merge branch 'main' into agent-revamp
tkattkat Sep 4, 2025
7f0f11d
update to use act instead of observe
tkattkat Sep 4, 2025
786b139
update copy on variable
tkattkat Sep 4, 2025
220b37f
Merge branch 'agent-revamp' of https://github.com/browserbase/stageha…
tkattkat Sep 4, 2025
f00222b
remove closing page from close tool
tkattkat Sep 5, 2025
001cc4f
Merge branch 'main' into agent-revamp
miguelg719 Sep 5, 2025
77961b1
update init stagehand and sf library card eval
tkattkat Sep 5, 2025
6008fc7
add new model to task config
tkattkat Sep 5, 2025
07211cc
update extract prompt
tkattkat Sep 9, 2025
f86955c
add changeset
tkattkat Sep 9, 2025
ed42209
add url note, and remove optional from examples
tkattkat Sep 9, 2025
5 changes: 5 additions & 0 deletions .changeset/pink-snakes-sneeze.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

Replace operator handler with base of new agent
5 changes: 5 additions & 0 deletions .changeset/tired-cats-repeat.md
@@ -0,0 +1,5 @@
---
"@browserbasehq/stagehand": patch
---

replace operator agent with scaffold for new stagehand agent
18 changes: 17 additions & 1 deletion docs/basics/agent.mdx
@@ -26,7 +26,11 @@ agent.execute("apply for a job at browserbase")

## Using `agent()`

Here is how you can use `agent()` to create an agent.
There are two ways to create agents in Stagehand:

### Computer Use Agents

Use computer use agents with specialized models from OpenAI or Anthropic:

<CodeGroup>
```typescript TypeScript
@@ -54,6 +58,18 @@ await agent.execute("apply for a job at Browserbase")
```
</CodeGroup>

### Use Stagehand Agent with Any LLM

Use the agent without specifying a provider to utilize any model or LLM provider:

<Note>Non-CUA agents are currently supported only in TypeScript.</Note>

```typescript TypeScript
const agent = stagehand.agent();
await agent.execute("apply for a job at Browserbase")
```


## MCP Integrations

Agents can be enhanced with external tools and services through MCP (Model Context Protocol) integrations. This allows your agent to access external APIs and data sources beyond just browser interactions.
2 changes: 1 addition & 1 deletion evals/index.eval.ts
@@ -33,7 +33,7 @@ import { CustomOpenAIClient } from "@/examples/external_clients/customOpenAI";
import OpenAI from "openai";
import { initStagehand } from "./initStagehand";
import { AgentProvider } from "@/lib/agent/AgentProvider";
import { AISdkClient } from "@/examples/external_clients/aisdk";
import { AISdkClient } from "@/lib/llm/aisdk";
import { getAISDKLanguageModel } from "@/lib/llm/LLMProvider";
import { loadApiKeyFromEnv } from "@/lib/utils";
import { LogLine } from "@/types/log";
5 changes: 5 additions & 0 deletions evals/initStagehand.ts
@@ -114,6 +114,11 @@ export const initStagehand = async ({
model: modelName,
provider: modelName.startsWith("claude") ? "anthropic" : "openai",
} as AgentConfig;
} else {
agentConfig = {
model: modelName,
executionModel: "google/gemini-2.5-flash",
} as AgentConfig;
}

const agent = stagehand.agent(agentConfig);
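
The branch above picks between a computer-use configuration (model plus provider) and a generic configuration (model plus `executionModel`). A minimal sketch of that selection follows. It assumes a hypothetical `isCua` flag, because the real condition sits outside the shown hunk, and it uses a local `AgentConfigSketch` type that mirrors only the fields visible in this diff rather than the library's `AgentConfig`.

```typescript
// Local stand-in for the AgentConfig fields visible in this diff.
// It is NOT the library's full type.
interface AgentConfigSketch {
  model: string;
  provider?: "anthropic" | "openai";
  executionModel?: string;
}

// `isCua` is a hypothetical flag; the actual condition is not shown in the hunk.
function buildAgentConfig(
  modelName: string,
  isCua: boolean,
): AgentConfigSketch {
  if (isCua) {
    // Computer-use agents need an explicit provider, inferred from the model name.
    return {
      model: modelName,
      provider: modelName.startsWith("claude") ? "anthropic" : "openai",
    };
  }
  // Generic agents omit the provider and name a model for act/extract calls.
  return { model: modelName, executionModel: "google/gemini-2.5-flash" };
}
```

Under these assumptions, `buildAgentConfig("claude-sonnet-4-20250514", true)` yields a config with `provider: "anthropic"`, while a `provider/model` name with `isCua` set to `false` yields a config that carries only `model` and `executionModel`.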
6 changes: 5 additions & 1 deletion evals/taskConfig.ts
@@ -106,7 +106,11 @@ const DEFAULT_EVAL_MODELS = process.env.EVAL_MODELS

const DEFAULT_AGENT_MODELS = process.env.EVAL_AGENT_MODELS
? process.env.EVAL_AGENT_MODELS.split(",")
: ["computer-use-preview-2025-03-11", "claude-sonnet-4-20250514"];
: [
"computer-use-preview-2025-03-11",
"claude-sonnet-4-20250514",
"anthropic/claude-sonnet-4-20250514",
];
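
The model list above comes from a comma-separated environment variable with a hard-coded fallback. The sketch below shows the same idea as a standalone helper. `parseModelList` is a hypothetical name, and the trimming and empty-entry filtering are additions that the code in the diff does not perform.

```typescript
// Hypothetical helper: parse a comma-separated model list, falling back to defaults.
// Unlike the plain split(",") in the diff, this also trims whitespace and
// drops empty entries (an assumption about what callers would want).
function parseModelList(raw: string | undefined, defaults: string[]): string[] {
  if (!raw) {
    return defaults;
  }
  const models = raw
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
  return models.length > 0 ? models : defaults;
}
```

A caller could pass `process.env.EVAL_AGENT_MODELS` as `raw` and the three default model names as `defaults`.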

/**
* getModelList:
8 changes: 2 additions & 6 deletions evals/tasks/agent/sf_library_card.ts
@@ -10,19 +10,15 @@ export const sf_library_card: EvalFunction = async ({
}) => {
try {
await stagehand.page.goto("https://sflib1.sfpl.org/selfreg");

const agentResult = await agent.execute({
instruction:
"Fill in the 'Residential Address' field with '166 Geary St'",
instruction: "Fill in the 'street Address' field with '166 Geary St'",
maxSteps: Number(process.env.AGENT_EVAL_MAX_STEPS) || 3,
});
logger.log(agentResult);

await stagehand.page.mouse.wheel(0, -1000);
const evaluator = new Evaluator(stagehand);
const result = await evaluator.ask({
question:
"Does the page show the 'Residential Address' field filled with '166 Geary St'?",
"Does the page show the 'street Address' field filled with '166 Geary St'?",
});

if (result.evaluation !== "YES" && result.evaluation !== "NO") {
55 changes: 55 additions & 0 deletions lib/agent/tools/act.ts
@@ -0,0 +1,55 @@
import { tool } from "ai";
import { z } from "zod/v3";
import { StagehandPage } from "../../StagehandPage";

export const createActTool = (
stagehandPage: StagehandPage,
executionModel?: string,
) =>
tool({
description: "Perform an action on the page (click, type)",
parameters: z.object({
action: z.string()
.describe(`Describe what to click or type in a short, specific phrase that mentions the element type.
Examples:
- click the Login button
- click the language dropdown
- type "John" into the first name input
- type "Doe" into the last name input`),
}),
execute: async ({ action }) => {
try {
let result;
if (executionModel) {
result = await stagehandPage.page.act({
action,
modelName: executionModel,
});
} else {
result = await stagehandPage.page.act(action);
}
const isIframeAction = result.action === "an iframe";

if (isIframeAction) {
const fallback = await stagehandPage.page.act(
executionModel
? { action, modelName: executionModel, iframes: true }
: { action, iframes: true },
);
return {
success: fallback.success,
action: fallback.action,
isIframe: true,
};
}

return {
success: result.success,
action: result.action,
isIframe: false,
};
} catch (error) {
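// `error` is `unknown` under strict TypeScript, so narrow it before reading `.message`.
const message = error instanceof Error ? error.message : String(error);
return { success: false, error: message };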
}
},
});
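
The act tool above tries the action once and retries with `iframes: true` when the first result reports the target as `"an iframe"`. The sketch below isolates that retry pattern so it can be exercised without a browser. `ActFn`, `ActOptions`, and `actWithIframeFallback` are hypothetical names, and the injected `act` function stands in for `stagehandPage.page.act`.

```typescript
// Hypothetical shapes; they mirror only the fields this diff reads or writes.
interface ActOptions {
  action: string;
  modelName?: string;
  iframes?: boolean;
}
interface ActResult {
  success: boolean;
  action: string;
}
interface ActToolResult {
  success: boolean;
  action?: string;
  isIframe?: boolean;
  error?: string;
}
type ActFn = (options: ActOptions) => Promise<ActResult>;

async function actWithIframeFallback(
  act: ActFn,
  action: string,
  executionModel?: string,
): Promise<ActToolResult> {
  // Only pass modelName when an execution model was explicitly provided.
  const base: ActOptions = executionModel
    ? { action, modelName: executionModel }
    : { action };
  try {
    const first = await act(base);
    if (first.action !== "an iframe") {
      return { success: first.success, action: first.action, isIframe: false };
    }
    // The target lives inside an iframe: retry once with iframe support enabled.
    const retry = await act({ ...base, iframes: true });
    return { success: retry.success, action: retry.action, isIframe: true };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { success: false, error: message };
  }
}
```

Injecting `act` keeps the retry logic testable with a stub. The sketch always passes an options object, whereas the tool in the diff passes a bare string when no execution model is set.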
35 changes: 35 additions & 0 deletions lib/agent/tools/ariaTree.ts
@@ -0,0 +1,35 @@
import { tool } from "ai";
import { z } from "zod/v3";
import { StagehandPage } from "../../StagehandPage";

export const createAriaTreeTool = (stagehandPage: StagehandPage) =>
tool({
description:
"gets the accessibility (ARIA) tree from the current page. this is useful for understanding the page structure and accessibility features. it should provide full context of what is on the page",
parameters: z.object({}),
execute: async () => {
const { page_text } = await stagehandPage.page.extract();
const pageUrl = stagehandPage.page.url();

let content = page_text;
// The limit is expressed in tokens; ~4 characters per token is a rough estimate.
const MAX_TOKENS = 70000;

const estimatedTokens = Math.ceil(content.length / 4);

if (estimatedTokens > MAX_TOKENS) {
const maxCharacters = MAX_TOKENS * 4;
content =
content.substring(0, maxCharacters) +
"\n\n[CONTENT TRUNCATED: Exceeded 70,000 token limit]";
}

return {
content,
pageUrl,
};
},
experimental_toToolResultContent: (result) => {
const content = typeof result === "string" ? result : result.content;
return [{ type: "text", text: `Accessibility Tree:\n${content}` }];
},
});
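
The truncation in this tool estimates tokens as one per four characters and cuts the text once the estimate exceeds the budget. The sketch below pulls that logic into a standalone function. `truncateToTokenBudget` and `CHARS_PER_TOKEN` are hypothetical names, and the four-characters-per-token ratio is the same rough heuristic the diff uses, not a real tokenizer.

```typescript
// Rough heuristic taken from the diff: about four characters per token.
const CHARS_PER_TOKEN = 4;

// Hypothetical helper: cap `text` at roughly `maxTokens` tokens.
function truncateToTokenBudget(text: string, maxTokens: number): string {
  const estimatedTokens = Math.ceil(text.length / CHARS_PER_TOKEN);
  if (estimatedTokens <= maxTokens) {
    return text;
  }
  // Convert the token budget back to a character count before slicing.
  const maxCharacters = maxTokens * CHARS_PER_TOKEN;
  return (
    text.substring(0, maxCharacters) +
    `\n\n[CONTENT TRUNCATED: Exceeded ${maxTokens} token limit]`
  );
}
```

With a budget of 2 tokens, an 8-character string passes through unchanged, while a 9-character string is cut to its first 8 characters and gets the truncation notice appended.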
16 changes: 16 additions & 0 deletions lib/agent/tools/close.ts
@@ -0,0 +1,16 @@
import { tool } from "ai";
import { z } from "zod/v3";

export const createCloseTool = () =>
tool({
description: "Complete the task and close",
parameters: z.object({
reasoning: z.string().describe("Summary of what was accomplished"),
taskComplete: z
.boolean()
.describe("Whether the task was completed successfully"),
}),
execute: async ({ reasoning, taskComplete }) => {
return { success: true, reasoning, taskComplete };
},
});
104 changes: 104 additions & 0 deletions lib/agent/tools/extract.ts
@@ -0,0 +1,104 @@
import { tool } from "ai";
import { z } from "zod/v3";
import { StagehandPage } from "../../StagehandPage";
import { LogLine } from "@/types/log";

/**
* Evaluates a Zod schema string and returns the actual Zod schema
* Uses the Function constructor with `z` as its only argument. The string is
* still executed as JavaScript, so it must come from a trusted source.
*/
function evaluateZodSchema(
schemaStr: string,
logger?: (message: LogLine) => void,
): z.ZodTypeAny {
try {
// Create a function that returns the evaluated schema
// We pass z as a parameter to make it available in the evaluated context
const schemaFunction = new Function("z", `return ${schemaStr}`);
return schemaFunction(z);
} catch (error) {
logger?.({
category: "extract",
message: `Failed to evaluate schema string, using z.any(): ${error}`,
level: 1,
auxiliary: {
error: {
value: String(error),
type: "string",
},
},
});
return z.any();
}
}

export const createExtractTool = (
stagehandPage: StagehandPage,
executionModel?: string,
logger?: (message: LogLine) => void,
) =>
tool({
description: `Extract structured data from the current page based on a provided schema.

USAGE GUIDELINES:
- Keep schemas MINIMAL - only include fields essential for the task
- IMPORTANT: only use this if explicitly asked for structured output. In most scenarios, you should use the aria tree tool over this.
- If you need to extract a link, make sure the type definition follows the format of z.string().url()
EXAMPLES:
1. Extract a single value:
instruction: "extract the product price"
schema: "z.object({ price: z.number()})"

2. Extract multiple fields:
instruction: "extract product name and price"
schema: "z.object({ name: z.string(), price: z.number() })"

3. Extract arrays:
instruction: "extract all product names and prices"
schema: "z.object({ products: z.array(z.object({ name: z.string(), price: z.number() })) })"`,
parameters: z.object({
instruction: z
.string()
.describe(
"Clear instruction describing what data to extract from the page",
),
schema: z
.string()
.describe(
'Zod schema as a string (e.g., "z.object({ price: z.number() })")',
),
}),
execute: async ({ instruction, schema }) => {
try {
// Evaluate the schema string to get the actual Zod schema
const zodSchema = evaluateZodSchema(schema, logger);

// Ensure we have a ZodObject
const schemaObject =
zodSchema instanceof z.ZodObject
? zodSchema
: z.object({ result: zodSchema });

// Extract with the schema - only pass modelName if executionModel is explicitly provided
const result = await stagehandPage.page.extract({
instruction,
schema: schemaObject,
...(executionModel && { modelName: executionModel }),
});

return {
success: true,
data: result,
timestamp: Date.now(),
};
} catch (error) {
const errorMessage =
error instanceof Error ? error.message : String(error);
return {
success: false,
error: `Failed to extract data: ${errorMessage}`,
timestamp: Date.now(),
};
}
},
});
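
The extract tool turns a schema string into a schema by calling `new Function("z", ...)`, falls back to a permissive schema on failure, and wraps any non-object schema as `{ result: ... }`. The sketch below reproduces those three steps without depending on Zod. `fakeZ` and `SchemaNode` are hypothetical stand-ins that model only a few builders and are not the real Zod API.

```typescript
// Minimal stand-in for the parts of `z` used here; NOT the real Zod API.
type SchemaNode =
  | { kind: "string" }
  | { kind: "number" }
  | { kind: "any" }
  | { kind: "object"; shape: Record<string, SchemaNode> };

const fakeZ = {
  string: (): SchemaNode => ({ kind: "string" }),
  number: (): SchemaNode => ({ kind: "number" }),
  any: (): SchemaNode => ({ kind: "any" }),
  object: (shape: Record<string, SchemaNode>): SchemaNode => ({
    kind: "object",
    shape,
  }),
};

// Evaluate a schema string with `z` bound to the stand-in builder.
// The string runs as arbitrary JavaScript, so it must come from a trusted source.
function evaluateSchemaString(schemaStr: string): SchemaNode {
  try {
    const build = new Function("z", `return ${schemaStr}`);
    const value: unknown = build(fakeZ);
    if (typeof value === "object" && value !== null && "kind" in value) {
      return value as SchemaNode;
    }
    return fakeZ.any();
  } catch {
    // Syntax errors and unknown builders both fall back to the permissive schema.
    return fakeZ.any();
  }
}

// Wrap non-object schemas so the caller always receives an object schema.
function ensureObject(schema: SchemaNode): SchemaNode {
  return schema.kind === "object" ? schema : fakeZ.object({ result: schema });
}
```

Because the string is executed rather than parsed, a model that emits anything other than a schema expression can run arbitrary code in the host process. The permissive fallback keeps the tool from throwing but does nothing to limit that risk.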
71 changes: 71 additions & 0 deletions lib/agent/tools/fillform.ts
@@ -0,0 +1,71 @@
import { tool } from "ai";
import { z } from "zod/v3";
import { StagehandPage } from "../../StagehandPage";

export const createFillFormTool = (
stagehandPage: StagehandPage,
executionModel?: string,
) =>
tool({
description: `📝 FORM FILL - SPECIALIZED MULTI-FIELD INPUT TOOL

CRITICAL: Use this for ANY form with 2+ input fields (text inputs, textareas, etc.)

WHY THIS TOOL EXISTS:
• Forms are the #1 use case for multi-field input
• Optimized specifically for input/textarea elements
• 4-6x faster than individual typing actions

Use fillForm: Pure form filling (inputs, textareas only)


MANDATORY USE CASES (always use fillForm for these):
Registration forms: name, email, password fields
Contact forms: name, email, message fields
Checkout forms: address, payment info fields
Profile updates: multiple user data fields
Search filters: multiple criteria inputs



PARAMETER DETAILS:
• fields: Array of { action, value } objects.
– action: short description of where to type (e.g. "type 'john@example.com' into the email input").
– value: the exact text to enter.
`,
parameters: z.object({
fields: z
.array(
z.object({
action: z
.string()
.describe(
'Description of the typing action, e.g. "type foo into the bar field"',
),
value: z.string().describe("Text to type into the target field"),
}),
)
.min(1, "Provide at least one field to fill"),
}),

execute: async ({ fields }) => {
const instruction = `Return observation results for the following actions: ${fields
.map((field) => field.action)
.join(", ")}`;

const observeResults = executionModel
? await stagehandPage.page.observe({
instruction,
modelName: executionModel,
})
: await stagehandPage.page.observe(instruction);

const completedActions = [];
for (const result of observeResults) {
const action = await stagehandPage.page.act(result);
completedActions.push(action);
}

return { success: true, actions: completedActions };
},
});
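
The fill-form tool joins every field's `action` into a single observe instruction, then performs the observed actions one at a time. The sketch below shows that observe-then-act flow with injected functions. `ObservedAction`, `ObserveFn`, and `fillFields` are hypothetical names, and the `observe` and `act` parameters stand in for `stagehandPage.page.observe` and `stagehandPage.page.act`.

```typescript
interface FormField {
  action: string;
  value: string;
}
// Hypothetical shape for an observed action; the real result type is richer.
interface ObservedAction {
  description: string;
}
type ObserveFn = (instruction: string) => Promise<ObservedAction[]>;

// Build the single instruction string that covers every field.
function buildObserveInstruction(fields: FormField[]): string {
  const actions = fields.map((field) => field.action).join(", ");
  return `Return observation results for the following actions: ${actions}`;
}

async function fillFields<R>(
  fields: FormField[],
  observe: ObserveFn,
  act: (observed: ObservedAction) => Promise<R>,
): Promise<R[]> {
  const observed = await observe(buildObserveInstruction(fields));
  const completed: R[] = [];
  // Run sequentially so each field is typed before the next one starts.
  for (const item of observed) {
    completed.push(await act(item));
  }
  return completed;
}
```

In the diff, `value` is declared in the parameter schema but `execute` never reads it, so the text to type has to appear inside each `action` string, as it does in the example in the tool description.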