Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -657,6 +657,16 @@ $B screenshot /tmp/github-profile.png
$B diff https://staging.app.com https://prod.app.com
```

### Interfaze (optional — OCR, search, structured URL extract)

Requires Interfaze API key (`$B interfaze-setup` or `INTERFAZE_API_KEY`). See `/browse` skill for full patterns.

```bash
$B ocr [--json] # viewport or @ref / image path
$B search "your query" --limit 5
$B ai-scrape https://example.com --schema '{"name":"string"}' [--json]
```

### Multi-step chain (efficient for long flows)

```bash
Expand Down Expand Up @@ -783,9 +793,12 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
### Extraction
| Command | Description |
|---------|-------------|
| `ai-scrape <url> --schema {"field":"string"} [--json]` | AI structured extraction from any URL via Interfaze (bot-protected sites; requires Interfaze API key) |
| `archive [path]` | Save complete page as MHTML via CDP |
| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
| `ocr [path|@ref|selector] [--json]` | OCR via Interfaze: extract text from image file, PDF document, @ref, CSS selector, or current viewport (requires Interfaze API key) |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
| `search <query> [--limit N]` | Web search via Interfaze with citations (requires Interfaze API key) |

### Interaction
| Command | Description |
Expand Down Expand Up @@ -846,6 +859,7 @@ Refs are invalidated on navigation — run `snapshot` again after `goto`.
| `chain` | Run commands from JSON stdin. Format: [["cmd","arg1",...],...] |
| `frame <sel|@ref|--name n|--url pattern|main>` | Switch to iframe context (or main to return) |
| `inbox [--clear]` | List messages from sidebar scout inbox |
| `interfaze-setup` | Save Interfaze API key to ~/.gstack/interfaze.json for ocr, search, ai-scrape |
| `watch [stop]` | Passive observation — periodic snapshots while user browses |

### Tabs
Expand Down
10 changes: 10 additions & 0 deletions SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -219,6 +219,16 @@ $B screenshot /tmp/github-profile.png
$B diff https://staging.app.com https://prod.app.com
```

### Interfaze (optional — OCR, search, structured URL extract)

Requires Interfaze API key (`$B interfaze-setup` or `INTERFAZE_API_KEY`). See `/browse` skill for full patterns.

```bash
$B ocr [--json] # viewport or @ref / image path
$B search "your query" --limit 5
$B ai-scrape https://example.com --schema '{"name":"string"}' [--json]
```

### Multi-step chain (efficient for long flows)

```bash
Expand Down
20 changes: 19 additions & 1 deletion browse/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -542,7 +542,21 @@ $B snapshot -D # verify deletion happened
$B diff https://staging.app.com https://prod.app.com
```

### 11. Show screenshots to the user
### 11. Interfaze (optional — API key)

Structured extraction and OCR via [Interfaze](https://interfaze.ai/) (OpenAI-compatible API). One-time setup: `$B interfaze-setup` or `INTERFAZE_API_KEY` / `~/.gstack/interfaze.json`.

```bash
$B ocr # OCR current viewport (PNG → text)
$B ocr @e3 --json # element screenshot + bounding boxes in precontext
$B ocr /tmp/mockup.png --json # file path (must be under cwd or temp)
$B search "arxiv diffusion 2025" --limit 5
$B ai-scrape https://example.com --schema '{"title":"string","price":"number"}' [--json]
```

Use **`ocr`** when `snapshot` has no text (canvas, charts, image-heavy UI) but you need exact strings. Use **`ai-scrape`** for structured fields from URLs that resist local Playwright (heavy bot protection); complements `$B scrape` (local media download).

### 12. Show screenshots to the user
After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.

## User Handoff
Expand Down Expand Up @@ -675,9 +689,12 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
### Extraction
| Command | Description |
|---------|-------------|
| `ai-scrape <url> --schema {"field":"string"} [--json]` | AI structured extraction from any URL via Interfaze (bot-protected sites; requires Interfaze API key) |
| `archive [path]` | Save complete page as MHTML via CDP |
| `download <url|@ref> [path] [--base64]` | Download URL or media element to disk using browser cookies |
| `ocr [path|@ref|selector] [--json]` | OCR via Interfaze: extract text from image file, PDF document, @ref, CSS selector, or current viewport (requires Interfaze API key) |
| `scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]` | Bulk download all media from page. Writes manifest.json |
| `search <query> [--limit N]` | Web search via Interfaze with citations (requires Interfaze API key) |

### Interaction
| Command | Description |
Expand Down Expand Up @@ -738,6 +755,7 @@ $B prettyscreenshot --cleanup --scroll-to ".pricing" --width 1440 ~/Desktop/hero
| `chain` | Run commands from JSON stdin. Format: [["cmd","arg1",...],...] |
| `frame <sel|@ref|--name n|--url pattern|main>` | Switch to iframe context (or main to return) |
| `inbox [--clear]` | List messages from sidebar scout inbox |
| `interfaze-setup` | Save Interfaze API key to ~/.gstack/interfaze.json for ocr, search, ai-scrape |
| `watch [stop]` | Passive observation — periodic snapshots while user browses |

### Tabs
Expand Down
16 changes: 15 additions & 1 deletion browse/SKILL.md.tmpl
Original file line number Diff line number Diff line change
Expand Up @@ -104,7 +104,21 @@ $B snapshot -D # verify deletion happened
$B diff https://staging.app.com https://prod.app.com
```

### 11. Show screenshots to the user
### 11. Interfaze (optional — API key)

Structured extraction and OCR via [Interfaze](https://interfaze.ai/) (OpenAI-compatible API). One-time setup: `$B interfaze-setup` or `INTERFAZE_API_KEY` / `~/.gstack/interfaze.json`.

```bash
$B ocr # OCR current viewport (PNG → text)
$B ocr @e3 --json # element screenshot + bounding boxes in precontext
$B ocr /tmp/mockup.png --json # file path (must be under cwd or temp)
$B search "arxiv diffusion 2025" --limit 5
$B ai-scrape https://example.com --schema '{"title":"string","price":"number"}' [--json]
```

Use **`ocr`** when `snapshot` has no text (canvas, charts, image-heavy UI) but you need exact strings. Use **`ai-scrape`** for structured fields from URLs that resist local Playwright (heavy bot protection); complements `$B scrape` (local media download).

### 12. Show screenshots to the user
After `$B screenshot`, `$B snapshot -a -o`, or `$B responsive`, always use the Read tool on the output PNG(s) so the user can see them. Without this, screenshots are invisible.

## User Handoff
Expand Down
7 changes: 7 additions & 0 deletions browse/src/commands.ts
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ export const READ_COMMANDS = new Set([
'dialog', 'is',
'inspect',
'media', 'data',
'ocr', 'search', 'ai-scrape',
]);

export const WRITE_COMMANDS = new Set([
Expand All @@ -40,6 +41,7 @@ export const META_COMMANDS = new Set([
'watch',
'state',
'frame',
'interfaze-setup',
]);

export const ALL_COMMANDS = new Set([...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS]);
Expand All @@ -49,6 +51,7 @@ export const PAGE_CONTENT_COMMANDS = new Set([
'text', 'html', 'links', 'forms', 'accessibility', 'attrs',
'console', 'dialog',
'media', 'data',
'search', 'ai-scrape',
]);

/** Wrap output from untrusted-content commands with trust boundary markers */
Expand Down Expand Up @@ -108,6 +111,9 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
// Data extraction
'download': { category: 'Extraction', description: 'Download URL or media element to disk using browser cookies', usage: 'download <url|@ref> [path] [--base64]' },
'scrape': { category: 'Extraction', description: 'Bulk download all media from page. Writes manifest.json', usage: 'scrape <images|videos|media> [--selector sel] [--dir path] [--limit N]' },
'ocr': { category: 'Extraction', description: 'OCR via Interfaze: extract text from image file, PDF document, @ref, CSS selector, or current viewport (requires Interfaze API key)', usage: 'ocr [path|@ref|selector] [--json]' },
'search': { category: 'Extraction', description: 'Web search via Interfaze with citations (requires Interfaze API key)', usage: 'search <query> [--limit N]' },
'ai-scrape': { category: 'Extraction', description: 'AI structured extraction from any URL via Interfaze (bot-protected sites; requires Interfaze API key)', usage: 'ai-scrape <url> --schema {"field":"string"} [--json]' },
'archive': { category: 'Extraction', description: 'Save complete page as MHTML via CDP', usage: 'archive [path]' },
// Visual
'screenshot': { category: 'Visual', description: 'Save screenshot (supports element crop via CSS/@ref, --clip region, --viewport)', usage: 'screenshot [--viewport] [--clip x,y,w,h] [selector|@ref] [path]' },
Expand Down Expand Up @@ -141,6 +147,7 @@ export const COMMAND_DESCRIPTIONS: Record<string, { category: string; descriptio
'state': { category: 'Server', description: 'Save/load browser state (cookies + URLs)', usage: 'state save|load <name>' },
// Frame
'frame': { category: 'Meta', description: 'Switch to iframe context (or main to return)', usage: 'frame <sel|@ref|--name n|--url pattern|main>' },
'interfaze-setup': { category: 'Meta', description: 'Save Interfaze API key to ~/.gstack/interfaze.json for ocr, search, ai-scrape', usage: 'interfaze-setup' },
// CSS Inspector
'inspect': { category: 'Inspection', description: 'Deep CSS inspection via CDP — full rule cascade, box model, computed styles', usage: 'inspect [selector] [--all] [--history]' },
'style': { category: 'Interaction', description: 'Modify CSS property on element (with undo support)', usage: 'style <sel> <prop> <value> | style --undo [N]' },
Expand Down
51 changes: 51 additions & 0 deletions browse/src/interfaze-auth.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
/**
* Interfaze API key resolution (OCR, search, ai-scrape).
*
* Resolution order:
* 1. ~/.gstack/interfaze.json → { "api_key": "..." }
* 2. INTERFAZE_API_KEY environment variable
* 3. null (commands return setup instructions)
*/

import fs from 'fs';
import path from 'path';

const CONFIG_PATH = path.join(process.env.HOME || '~', '.gstack', 'interfaze.json');

export function resolveInterfazeKey(): string | null {
try {
if (fs.existsSync(CONFIG_PATH)) {
const content = fs.readFileSync(CONFIG_PATH, 'utf-8');
const config = JSON.parse(content) as { api_key?: string };
if (config.api_key && typeof config.api_key === 'string' && config.api_key.trim()) {
return config.api_key.trim();
}
}
} catch {
// fall through
}

if (process.env.INTERFAZE_API_KEY?.trim()) {
return process.env.INTERFAZE_API_KEY.trim();
}

return null;
}

export function saveInterfazeKey(key: string): void {
const dir = path.dirname(CONFIG_PATH);
fs.mkdirSync(dir, { recursive: true });
fs.writeFileSync(CONFIG_PATH, JSON.stringify({ api_key: key }, null, 2));
fs.chmodSync(CONFIG_PATH, 0o600);
}

export function interfazeSetupHint(command: string): string {
return (
`${command} requires an Interfaze API key.\n\n` +
`Run: $B interfaze-setup\n` +
` or save to ~/.gstack/interfaze.json: { "api_key": "..." }\n` +
` or set INTERFAZE_API_KEY environment variable\n\n` +
`Get a key at: https://interfaze.ai/dashboard\n` +
`Docs: https://interfaze.ai/docs`
);
}
64 changes: 64 additions & 0 deletions browse/src/interfaze-client.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
/**
* Shared Interfaze client via Vercel AI SDK OpenAI-compatible provider.
* Captures Interfaze `precontext` (OCR bounds, search hits, scrape elements) in providerMetadata.
*/

import { createOpenAICompatible, type MetadataExtractor } from '@ai-sdk/openai-compatible';
import type { SharedV3ProviderMetadata } from '@ai-sdk/provider';
import { resolveInterfazeKey } from './interfaze-auth';

const precontextExtractor: MetadataExtractor = {
async extractMetadata({ parsedBody }) {
const body = parsedBody as Record<string, unknown> | null;
if (!body || !('precontext' in body) || body.precontext == null) {
return undefined;
}
return {
interfaze: { precontext: body.precontext },
} as SharedV3ProviderMetadata;
},
createStreamExtractor: () => {
let precontext: unknown;
return {
processChunk(parsedChunk: unknown) {
const chunk = parsedChunk as Record<string, unknown> | null;
if (chunk && 'precontext' in chunk && chunk.precontext != null) {
precontext = chunk.precontext;
}
},
buildMetadata(): SharedV3ProviderMetadata | undefined {
if (precontext == null) return undefined;
return { interfaze: { precontext } } as SharedV3ProviderMetadata;
},
};
},
};

export const INTERFAZE_MODEL = 'interfaze-beta';

export type InterfazeProvider = ReturnType<typeof createInterfazeCompatible>;

function createInterfazeCompatible() {
const apiKey = resolveInterfazeKey();
if (!apiKey) return null;
return createOpenAICompatible({
name: 'interfaze',
apiKey,
baseURL: 'https://api.interfaze.ai/v1',
supportsStructuredOutputs: true,
metadataExtractor: precontextExtractor,
});
}

/** Returns null if no API key configured. */
export function createInterfazeProvider(): InterfazeProvider | null {
return createInterfazeCompatible();
}

export function getInterfazePrecontext(
providerMetadata: Record<string, unknown> | undefined,
): unknown {
if (!providerMetadata) return undefined;
const interfaze = providerMetadata.interfaze as { precontext?: unknown } | undefined;
return interfaze?.precontext;
}
47 changes: 47 additions & 0 deletions browse/src/interfaze-schema.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
/**
* Build a Zod object schema from CLI --schema JSON hints:
* { "name": "string", "price": "number", "ok": "boolean" }
*/

import { z } from 'zod';

const TYPE_ALIASES: Record<string, 'string' | 'number' | 'boolean'> = {
string: 'string',
str: 'string',
text: 'string',
number: 'number',
num: 'number',
float: 'number',
int: 'number',
integer: 'number',
boolean: 'boolean',
bool: 'boolean',
};

export function buildZodFromHints(hints: Record<string, unknown>): z.ZodObject<Record<string, z.ZodTypeAny>> {
const shape: Record<string, z.ZodTypeAny> = {};
for (const [key, raw] of Object.entries(hints)) {
const t = String(raw).toLowerCase().trim();
const norm = TYPE_ALIASES[t] ?? 'string';
if (norm === 'number') shape[key] = z.number();
else if (norm === 'boolean') shape[key] = z.boolean();
else shape[key] = z.string();
}
if (Object.keys(shape).length === 0) {
throw new Error('Schema must be a non-empty JSON object, e.g. {"title":"string","count":"number"}');
}
return z.object(shape);
}

export function parseSchemaJson(jsonStr: string): z.ZodObject<Record<string, z.ZodTypeAny>> {
let parsed: unknown;
try {
parsed = JSON.parse(jsonStr);
} catch {
throw new Error('Invalid JSON for --schema');
}
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
throw new Error('--schema must be a JSON object, e.g. {"field":"string"}');
}
return buildZodFromHints(parsed as Record<string, unknown>);
}
23 changes: 22 additions & 1 deletion browse/src/meta-commands.ts
Original file line number Diff line number Diff line change
Expand Up @@ -313,7 +313,7 @@ export async function handleMetaCommand(
}
lastWasWrite = true;
} else if (READ_COMMANDS.has(name)) {
result = await handleReadCommand(name, cmdArgs, session);
result = await handleReadCommand(name, cmdArgs, session, bm);
if (PAGE_CONTENT_COMMANDS.has(name)) {
result = wrapUntrustedContent(result, bm.getCurrentUrl());
}
Expand Down Expand Up @@ -647,6 +647,27 @@ export async function handleMetaCommand(
return `Switched to frame: ${frame.url()}`;
}

case 'interfaze-setup': {
const { resolveInterfazeKey, saveInterfazeKey } = await import('./interfaze-auth');
const existing = resolveInterfazeKey();
if (existing) {
console.log('Replacing existing Interfaze API key in ~/.gstack/interfaze.json');
} else {
console.log('No Interfaze API key found.');
console.log('Get one at: https://interfaze.ai/dashboard\n');
}
process.stdout.write('Interfaze API key: ');
const reader = Bun.stdin.stream().getReader();
const { value } = await reader.read();
reader.releaseLock();
const key = new TextDecoder().decode(value ?? new Uint8Array()).trim();
if (!key || key.length < 8) {
throw new Error('Invalid key: enter a non-empty API key (min 8 characters).');
}
saveInterfazeKey(key);
return 'Key saved to ~/.gstack/interfaze.json (0600). Interfaze commands: ocr, search, ai-scrape.';
}

default:
throw new Error(`Unknown meta command: ${command}`);
}
Expand Down
Loading