78 changes: 65 additions & 13 deletions docs/reference/agents.mdx
@@ -14,7 +14,7 @@ Use the `create()` factory method to instantiate agents with typed parameters:
from hud.agents import ClaudeAgent

agent = ClaudeAgent.create(
-    checkpoint_name="claude-sonnet-4-5",
+    model="claude-sonnet-4-5",
    max_tokens=8192,
    verbose=True,
)
@@ -87,7 +87,7 @@ Claude-specific implementation using Anthropic's API.

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
-| `checkpoint_name` | `str` | Claude model to use | `"claude-sonnet-4-5"` |
+| `model` | `str` | Claude model to use | `"claude-sonnet-4-5"` |
| `model_client` | `AsyncAnthropic` | Anthropic client | Auto-created |
| `max_tokens` | `int` | Maximum response tokens | `16384` |
| `use_computer_beta` | `bool` | Enable computer-use beta features | `True` |
@@ -102,7 +102,7 @@ from hud.agents import ClaudeAgent
env = Environment("browser").connect_hub("hud-evals/browser")

agent = ClaudeAgent.create(
-    checkpoint_name="claude-sonnet-4-5",
+    model="claude-sonnet-4-5",
    max_tokens=8192,
)

@@ -123,7 +123,7 @@ OpenAI agent using the Responses API for function calling.

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
-| `checkpoint_name` | `str` | Model to use | `"gpt-5.1"` |
+| `model` | `str` | Model to use | `"gpt-5.1"` |
| `model_client` | `AsyncOpenAI` | OpenAI client | Auto-created |
| `max_output_tokens` | `int` | Maximum response tokens | `None` |
| `temperature` | `float` | Sampling temperature | `None` |
@@ -136,7 +136,7 @@ OpenAI agent using the Responses API for function calling.

```python
agent = OpenAIAgent.create(
-    checkpoint_name="gpt-4o",
+    model="gpt-4o",
    max_output_tokens=2048,
    temperature=0.7,
)
@@ -154,7 +154,7 @@ OpenAI Operator-style agent with computer-use capabilities. Extends `OpenAIAgent`

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
-| `checkpoint_name` | `str` | Model to use | `"computer-use-preview"` |
+| `model` | `str` | Model to use | `"computer-use-preview"` |
| `environment` | `Literal["windows","mac","linux","browser"]` | Computer environment | `"linux"` |

Inherits all `OpenAIAgent` parameters.
@@ -165,31 +165,83 @@ Inherits all `OpenAIAgent` parameters.
from hud.agents import GeminiAgent
```

-Google Gemini agent with native computer-use capabilities.
+Google Gemini agent for standard tool-calling tasks.

**Config Parameters:**

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
-| `checkpoint_name` | `str` | Gemini model to use | `"gemini-2.5-computer-use-preview-10-2025"` |
+| `model` | `str` | Gemini model to use | `"gemini-3-pro-preview"` |
| `model_client` | `genai.Client` | Gemini client | Auto-created |
| `temperature` | `float` | Sampling temperature | `1.0` |
| `top_p` | `float` | Top-p sampling | `0.95` |
| `top_k` | `int` | Top-k sampling | `40` |
| `max_output_tokens` | `int` | Maximum response tokens | `8192` |
| `excluded_predefined_functions` | `list[str]` | Predefined functions to exclude | `[]` |
| `validate_api_key` | `bool` | Validate key on init | `True` |

**Example:**

```python
agent = GeminiAgent.create(
-    checkpoint_name="gemini-2.5-computer-use-preview-10-2025",
+    model="gemini-2.5-pro",
    temperature=0.7,
    max_output_tokens=4096,
)
```

+### GeminiCUAAgent
+
+```python
+from hud.agents import GeminiCUAAgent
+```
+
+Google Gemini Computer Use Agent with native computer-use capabilities. Extends `GeminiAgent` with support for Gemini's predefined computer actions (click, type, scroll, etc.).
+
+<Note>
+Use `GeminiCUAAgent` for computer-use tasks (browser automation, desktop interaction). Use `GeminiAgent` for standard tool-calling tasks.
+</Note>
+
+<Warning>
+Requires the `gemini_computer` tool to be available in the environment. The agent will fail to initialize if this tool is not present.
+</Warning>
+
+**Config Parameters:**
+
+| Parameter | Type | Description | Default |
+|-----------|------|-------------|---------|
+| `model` | `str` | Gemini CUA model | `"gemini-2.5-computer-use-preview-10-2025"` |
+| `excluded_predefined_functions` | `list[str]` | Predefined Gemini actions to disable | `[]` |
+
+Inherits all `GeminiAgent` parameters.
+
+**Predefined Functions:**
+
+GeminiCUAAgent supports these native Gemini computer actions:
+- `click_at`, `hover_at`, `type_text_at`
+- `scroll_document`, `scroll_at`
+- `drag_and_drop`
+- `navigate`, `go_back`, `go_forward`, `search`
+- `key_combination`
+- `wait_5_seconds`
+- `open_web_browser`
+
+**Example:**
+
+```python
+from hud import Environment
+from hud.agents import GeminiCUAAgent
+
+env = Environment("browser").connect_hub("hud-evals/browser")
+
+agent = GeminiCUAAgent.create(
+    model="gemini-2.5-computer-use-preview",
+    temperature=0.7,
+)
+
+task = env("navigate", url="https://example.com")
+result = await agent.run(task, max_steps=20)
+```

### OpenAIChatAgent

```python
@@ -202,7 +254,7 @@ OpenAI-compatible chat.completions agent. Works with any endpoint implementing t

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
-| `checkpoint_name` | `str` | Model name | `"gpt-5-mini"` |
+| `model` | `str` | Model name | `"gpt-5-mini"` |
| `openai_client` | `AsyncOpenAI` | OpenAI-compatible client | `None` |
| `api_key` | `str` | API key (if not using client) | `None` |
| `base_url` | `str` | Base URL (if not using client) | `None` |
Expand All @@ -217,7 +269,7 @@ from hud.agents import OpenAIChatAgent
agent = OpenAIChatAgent.create(
    base_url="http://localhost:11434/v1",  # Ollama
    api_key="not-needed",
-    checkpoint_name="llama3.1",
+    model="llama3.1",
    completion_kwargs={"temperature": 0.2},
)

Expand All @@ -226,7 +278,7 @@ from openai import AsyncOpenAI

agent = OpenAIChatAgent.create(
    openai_client=AsyncOpenAI(base_url="http://localhost:8000/v1"),
-    checkpoint_name="served-model",
+    model="served-model",
)
```

2 changes: 1 addition & 1 deletion docs/reference/cli/eval.mdx
@@ -27,7 +27,7 @@ hud eval [SOURCE] [AGENT] [OPTIONS]
</ParamField>

<ParamField path="agent" type="string">
-Agent to use: `claude`, `openai`, `operator`, `gemini`, `openai_compatible`. If omitted, an interactive preset selector appears.
+Agent to use: `claude`, `openai`, `operator`, `gemini`, `gemini_cua`, `openai_compatible`. If omitted, an interactive preset selector appears.
</ParamField>

## Options
1 change: 1 addition & 0 deletions docs/reference/types.mdx
@@ -144,6 +144,7 @@ agent = agent_cls.create()
| `AgentType.OPENAI` | `OpenAIAgent` |
| `AgentType.OPERATOR` | `OperatorAgent` |
| `AgentType.GEMINI` | `GeminiAgent` |
+| `AgentType.GEMINI_CUA` | `GeminiCUAAgent` |
| `AgentType.OPENAI_COMPATIBLE` | `OpenAIChatAgent` |

## ContentBlock
6 changes: 3 additions & 3 deletions hud/agents/__init__.py
@@ -48,14 +48,14 @@ def create_agent(model: str, **kwargs: Any) -> MCPAgent:
    # Resolve class and gateway info
    agent_cls, gateway_info = resolve_cls(model)

-    # Get model ID from gateway info or use input
+    # Get model name from gateway info or use input
    model_id = model
    if gateway_info:
-        model_id = gateway_info.get("model") or gateway_info.get("id") or model
+        model_id = gateway_info.get("model_name") or model

    # Determine provider: from gateway info, or infer from agent class
    if gateway_info:
-        provider = gateway_info.get("provider") or "openai"
+        provider = gateway_info["provider"]["name"]
    else:
        provider = "openai"
        if agent_cls.__name__ == "ClaudeAgent":
1 change: 0 additions & 1 deletion hud/agents/gemini.py
@@ -75,7 +75,6 @@ def __init__(self, params: GeminiCreateParams | None = None, **kwargs: Any) -> N

        model_client = self.config.model_client
        if model_client is None:
-            # Default to HUD gateway when HUD_API_KEY is available
            if settings.api_key:
                from hud.agents.gateway import build_gateway_client

1 change: 0 additions & 1 deletion hud/agents/openai.py
@@ -98,7 +98,6 @@ def __init__(self, params: OpenAICreateParams | None = None, **kwargs: Any) -> N

        model_client = self.config.model_client
        if model_client is None:
-            # Default to HUD gateway when HUD_API_KEY is available
            if settings.api_key:
                from hud.agents.gateway import build_gateway_client

22 changes: 8 additions & 14 deletions hud/agents/resolver.py
@@ -11,12 +11,9 @@

_models_cache: list[dict[str, Any]] | None = None

-# Provider name → AgentType value (only anthropic differs)
-_PROVIDER_TO_AGENT = {"anthropic": "claude"}


def _fetch_gateway_models() -> list[dict[str, Any]]:
-    """Fetch available models from HUD gateway (cached)."""
+    """Fetch available models from HUD API (cached)."""
    global _models_cache
    if _models_cache is not None:
        return _models_cache
@@ -30,14 +27,15 @@ def _fetch_gateway_models() -> list[dict[str, Any]]:

    try:
        resp = httpx.get(
-            f"{settings.hud_gateway_url}/models",
+            f"{settings.hud_api_url}/models/",
            headers={"Authorization": f"Bearer {settings.api_key}"},
            timeout=10.0,
        )
        resp.raise_for_status()
        data = resp.json()
-        _models_cache = data.get("data", data) if isinstance(data, dict) else data
-        return _models_cache or []
+        models = data.get("models") or []
+        _models_cache = models
+        return models
    except Exception:
        return []

@@ -59,12 +57,8 @@ def resolve_cls(model: str) -> tuple[type[MCPAgent], dict[str, Any] | None]:

    # Gateway lookup
    for m in _fetch_gateway_models():
-        if model in (m.get("id"), m.get("name"), m.get("model")):
-            provider = (m.get("provider") or "openai_compatible").lower()
-            agent_str = _PROVIDER_TO_AGENT.get(provider, provider)
-            try:
-                return AgentType(agent_str).cls, m
-            except ValueError:
-                return AgentType.OPENAI_COMPATIBLE.cls, m
+        if model in (m.get("id"), m.get("name"), m.get("model_name")):
+            agent_str = m.get("sdk_agent_type") or m["provider"]["default_sdk_agent_type"]
+            return AgentType(agent_str).cls, m

    raise ValueError(f"Model '{model}' not found")