301 changes: 7 additions & 294 deletions docs/my-website/blog/anthropic_opus_4_5_and_advanced_features/index.md
@@ -897,14 +897,13 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \

## Effort Parameter: Control Token Usage {#effort-parameter}

Controls aspects like how much effort the model puts into its response, via `output_config={"effort": ..}`.
Control how much effort Claude puts into its response using the `reasoning_effort` parameter. This allows you to trade off between response thoroughness and token efficiency.

:::info

Soon, we will map OpenAI's `reasoning_effort` parameter to this.
LiteLLM automatically maps `reasoning_effort` to Anthropic's `output_config` format and adds the required `effort-2025-11-24` beta header for Claude Opus 4.5.
:::

Potential Values for `effort` parameter: `"high"`, `"medium"`, `"low"`.
Potential values for the `reasoning_effort` parameter: `"high"`, `"medium"`, `"low"`.
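
For illustration, the upstream request LiteLLM constructs from `reasoning_effort` looks roughly like the sketch below. This is an approximation based on the mapping described in the note above, not the exact wire format:

```bash
# Sketch: the Anthropic request produced for reasoning_effort="high"
curl https://api.anthropic.com/v1/messages \
  --header "x-api-key: $ANTHROPIC_API_KEY" \
  --header "anthropic-version: 2023-06-01" \
  --header "anthropic-beta: effort-2025-11-24" \
  --header "content-type: application/json" \
  --data '{
    "model": "claude-opus-4-5-20251101",
    "max_tokens": 4096,
    "output_config": {"effort": "high"},
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```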

### Usage Example

@@ -920,7 +919,7 @@ message = "Analyze the trade-offs between microservices and monolithic architect
response_high = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": message}],
    output_config={"effort": "high"}
    reasoning_effort="high"
)

print("High effort response:")
@@ -931,7 +930,7 @@ print(f"Tokens used: {response_high.usage.completion_tokens}\n")
response_medium = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": message}],
    output_config={"effort": "medium"}
    reasoning_effort="medium"
)

print("Medium effort response:")
@@ -942,7 +941,7 @@ print(f"Tokens used: {response_medium.usage.completion_tokens}\n")
response_low = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": message}],
    output_config={"effort": "low"}
    reasoning_effort="low"
)

print("Low effort response:")
@@ -987,295 +986,9 @@ curl --location 'http://0.0.0.0:4000/chat/completions' \
"role": "user",
"content": "Analyze the trade-offs between microservices and monolithic architectures"
}],
"output_config": {
"effort": "high"
}
}
'
```
</TabItem>
</Tabs>


## Cost Tracking: Monitor Tool Search Usage {#cost-tracking}

### Understanding Tool Search Costs

Tool search operations are tracked separately in the usage object, allowing you to monitor and optimize costs.

The count of search requests is available in the `usage` object, under `server_tool_use.tool_search_requests`.

Anthropic charges $0.0001 per tool search request, so 1,000 searches cost $0.10.

### Tracking Example

<Tabs>
<TabItem value="sdk" label="LiteLLM Python SDK">

```python
import litellm

tools = [
    {
        "type": "tool_search_tool_regex_20251119",
        "name": "tool_search_tool_regex"
    },
    # ... 100 deferred tools
]

response = litellm.completion(
    model="anthropic/claude-sonnet-4-5-20250929",
    messages=[{
        "role": "user",
        "content": "Find and use the weather tool for San Francisco"
    }],
    tools=tools
)

# Standard token usage
print("Token Usage:")
print(f" Input tokens: {response.usage.prompt_tokens}")
print(f" Output tokens: {response.usage.completion_tokens}")
print(f" Total tokens: {response.usage.total_tokens}")

# Tool search specific usage
if hasattr(response.usage, 'server_tool_use') and response.usage.server_tool_use:
    print("\nTool Search Usage:")
    print(f"  Search requests: {response.usage.server_tool_use.tool_search_requests}")

    # Calculate cost (example token pricing)
    input_cost = response.usage.prompt_tokens * 0.000003  # $3 per 1M tokens
    output_cost = response.usage.completion_tokens * 0.000015  # $15 per 1M tokens
    search_cost = response.usage.server_tool_use.tool_search_requests * 0.0001  # $0.0001 per search

    total_cost = input_cost + output_cost + search_cost

    print("\nCost Breakdown:")
    print(f"  Input tokens: ${input_cost:.6f}")
    print(f"  Output tokens: ${output_cost:.6f}")
    print(f"  Tool searches: ${search_cost:.6f}")
    print(f"  Total: ${total_cost:.6f}")
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

1. Set up config.yaml

```yaml
model_list:
  - model_name: claude-4
    litellm_params:
      model: anthropic/claude-opus-4-5-20251101
      api_key: os.environ/ANTHROPIC_API_KEY
```

2. Start the proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
    "model": "claude-4",
    "messages": [{
        "role": "user",
        "content": "Find and use the weather tool for San Francisco"
    }],
    "tools": [
        {
            "type": "tool_search_tool_regex_20251119",
            "name": "tool_search_tool_regex"
        },
        # ... 100 deferred tools
    ]
}
'
```

Expected Response:

```json
{
  ...,
  "usage": {
    ...,
    "server_tool_use": {
      "tool_search_requests": 1
    }
  }
}
```

</TabItem>
</Tabs>

### Cost Optimization Tips

1. **Keep frequently used tools non-deferred** (3-5 tools)
2. **Use tool search for large catalogs** (10+ tools); tips 1 and 2 are sketched in code after this list
3. **Monitor search requests** to identify optimization opportunities
4. **Combine with the effort parameter** for maximum efficiency
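
Putting tips 1 and 2 together, here's a minimal sketch of a tool list that keeps one hot tool always loaded while deferring a long tail behind tool search. The tool names and schemas below (`get_weather`, the generated catalog entries) are hypothetical placeholders, not part of LiteLLM's API:

```python
import litellm

# Hot path: a frequently used tool stays non-deferred (always visible)
hot_tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Long tail: hypothetical catalog tools, loaded only when tool search finds them
deferred_tools = [
    {
        "type": "function",
        "function": {
            "name": f"catalog_tool_{i}",
            "description": f"Hypothetical catalog tool #{i}.",
            "parameters": {"type": "object", "properties": {}},
        },
        "defer_loading": True,
    }
    for i in range(50)
]

tools = [
    # Tool search entry point for the deferred catalog
    {"type": "tool_search_tool_regex_20251119", "name": "tool_search_tool_regex"},
    *hot_tools,
    *deferred_tools,
]

response = litellm.completion(
    model="anthropic/claude-sonnet-4-5-20250929",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=tools,
)

# Hot-path requests should need zero searches; monitor this over time (tip 3)
if hasattr(response.usage, "server_tool_use") and response.usage.server_tool_use:
    print(f"Search requests: {response.usage.server_tool_use.tool_search_requests}")
```

Because `get_weather` is never deferred, routine weather requests avoid the per-search charge entirely.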


---

## Combining Features {#combining-features}

### The Power of Integration

These features work together seamlessly. Here's a real-world example combining all of them:

<Tabs>
<TabItem value="sdk" label="LiteLLM Python SDK">

```python
import litellm
import json

# Large tool catalog with search, programmatic calling, and examples
tools = [
    # Enable tool search
    {
        "type": "tool_search_tool_regex_20251119",
        "name": "tool_search_tool_regex"
    },
    # Enable programmatic calling
    {
        "type": "code_execution_20250825",
        "name": "code_execution"
    },
    # Database tool with all features
    {
        "type": "function",
        "function": {
            "name": "query_database",
            "description": "Execute SQL queries against the analytics database. Returns JSON array of results.",
            "parameters": {
                "type": "object",
                "properties": {
                    "sql": {
                        "type": "string",
                        "description": "SQL SELECT statement"
                    },
                    "limit": {
                        "type": "integer",
                        "description": "Maximum rows to return"
                    }
                },
                "required": ["sql"]
            }
        },
        "defer_loading": True,  # Tool search
        "allowed_callers": ["code_execution_20250825"],  # Programmatic calling
        "input_examples": [  # Input examples
            {
                "sql": "SELECT region, SUM(revenue) as total FROM sales GROUP BY region",
                "limit": 100
            }
        ]
    },
    # ... 50 more tools with defer_loading
]

# Make request with effort control
response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{
        "role": "user",
        "content": "Analyze sales by region for the last quarter and identify top performers"
    }],
    tools=tools,
    output_config={"effort": "medium"}  # Balanced efficiency
)

# Track comprehensive usage
print("Complete Usage Metrics:")
print(f" Input tokens: {response.usage.prompt_tokens}")
print(f" Output tokens: {response.usage.completion_tokens}")
print(f" Total tokens: {response.usage.total_tokens}")

if hasattr(response.usage, 'server_tool_use') and response.usage.server_tool_use:
    print(f"  Tool searches: {response.usage.server_tool_use.tool_search_requests}")

print(f"\nResponse: {response.choices[0].message.content}")
```

</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">

1. Set up config.yaml

```yaml
model_list:
  - model_name: claude-4
    litellm_params:
      model: anthropic/claude-opus-4-5-20251101
      api_key: os.environ/ANTHROPIC_API_KEY
```

2. Start the proxy

```bash
litellm --config /path/to/config.yaml
```

3. Test it!

```bash
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header "Authorization: Bearer $LITELLM_KEY" \
--data ' {
    "model": "claude-4",
    "messages": [{
        "role": "user",
        "content": "Analyze sales by region for the last quarter and identify top performers"
    }],
    "tools": [
        {
            "type": "tool_search_tool_regex_20251119",
            "name": "tool_search_tool_regex"
        },
        # ... 100 deferred tools
    ],
    "output_config": {
        "effort": "medium"
    }
    "reasoning_effort": "high"
}
'
```

Expected Response:

```json
{
  ...,
  "usage": {
    ...,
    "server_tool_use": {
      "tool_search_requests": 1
    }
  }
}
```

</TabItem>
</Tabs>

### Real-World Benefits

This combination enables:

1. **Massive scale** - Handle 1000+ tools efficiently
2. **Low latency** - Programmatic calling reduces round trips
3. **High accuracy** - Input examples ensure correct tool usage
4. **Cost control** - Effort parameter optimizes token spend
5. **Full visibility** - Track all usage metrics

4 changes: 3 additions & 1 deletion docs/my-website/docs/providers/anthropic.md
Original file line number Diff line number Diff line change
@@ -41,14 +41,16 @@ Check this in code, [here](../completion/input.md#translated-openai-params)
"extra_headers",
"parallel_tool_calls",
"response_format",
"user"
"user",
"reasoning_effort",
```

:::info

**Notes:**
- The Anthropic API fails requests when `max_tokens` is not passed, so LiteLLM sends `max_tokens=4096` by default.
- `response_format` is fully supported for Claude Sonnet 4.5 and Opus 4.1 models (see the [Structured Outputs](#structured-outputs) section)
- `reasoning_effort` is automatically mapped to `output_config={"effort": ...}` for Claude Opus 4.5 models (see [Effort Parameter](./anthropic_effort.md))

:::
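
A minimal sketch of the `reasoning_effort` mapping in practice:

```python
import litellm

# LiteLLM translates the OpenAI-style reasoning_effort parameter into
# Anthropic's output_config={"effort": ...} and adds the required
# effort-2025-11-24 beta header for Claude Opus 4.5.
response = litellm.completion(
    model="anthropic/claude-opus-4-5-20251101",
    messages=[{"role": "user", "content": "Summarize the trade-offs of caching."}],
    reasoning_effort="low",  # sent upstream as output_config={"effort": "low"}
)
print(response.choices[0].message.content)
```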
