64 changes: 64 additions & 0 deletions docs/en/user_guides/models.md
@@ -96,10 +96,74 @@ models = [
]
```

### Authentication

The `key` parameter defaults to `'ENV'`, which reads from the `OPENAI_API_KEY` environment variable.
If `OPENAI_API_KEY` is not set, the model falls back to
Azure Managed Identity (`DefaultAzureCredential`); no extra configuration is needed.

You can also pass a key directly:

```python
key='sk-...', # Explicit API key
key='ENV', # Read from OPENAI_API_KEY env var (default); falls back to Azure Managed Identity
```
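The resolution order can be sketched as follows. This is a hypothetical helper written only to illustrate the precedence, not part of the OpenCompass API; the actual fallback logic lives inside the model class:

```python
import os

def resolve_api_key(key='ENV'):
    """Illustrative key resolution: explicit key wins, then the
    OPENAI_API_KEY environment variable, then None (meaning the
    caller falls back to Azure Managed Identity)."""
    if key != 'ENV':
        return key                       # explicit key passed in config
    env_key = os.environ.get('OPENAI_API_KEY')
    if env_key:
        return env_key                   # read from the environment
    return None                          # fall back to DefaultAzureCredential
```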

### Azure OpenAI

To use Azure OpenAI endpoints, set `azure_endpoint` and `azure_api_version` to reference your Azure resource.
Authentication is handled automatically: if `OPENAI_API_KEY` is set, it is used;
otherwise Azure Managed Identity is used as a fallback.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='gpt-4',
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
tokenizer_path='gpt-4',
meta_template=dict(round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
]),
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
),
]
```
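Under the hood, the client combines `azure_endpoint`, the deployment name, and `azure_api_version` into a request URL, following the Azure OpenAI URL scheme shown in the example config further below. A simplified sketch of that composition (hypothetical helper, shown only to clarify how the pieces fit together):

```python
def azure_chat_completions_url(azure_endpoint: str,
                               deployment: str,
                               api_version: str) -> str:
    """Compose an Azure OpenAI chat-completions URL from its parts."""
    base = azure_endpoint.rstrip('/')
    return (f'{base}/openai/deployments/{deployment}'
            f'/chat/completions?api-version={api_version}')
```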

### Reasoning Effort

For OpenAI reasoning models (o1, o3, o4, gpt-5), the `reasoning_effort` parameter controls
how much reasoning the model performs. Valid values are `'low'`, `'medium'`, and `'high'`
(case-insensitive); the default is `None`, which keeps the model's default behavior.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='o3',
reasoning_effort='high',
openai_api_base='https://api.openai.com/v1/',
max_out_len=4096,
max_seq_len=32768,
),
]
```
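Because values are case-insensitive, a normalization step along these lines is implied. This is an illustrative sketch, not the actual OpenCompass validation code:

```python
VALID_EFFORTS = ('low', 'medium', 'high')

def normalize_reasoning_effort(value):
    """Lower-case and validate a reasoning_effort value; None passes through."""
    if value is None:
        return None
    effort = str(value).lower()
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f'reasoning_effort must be one of {VALID_EFFORTS}, got {value!r}')
    return effort
```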

We provide several examples for API-based models. Please refer to:

```bash
configs
├── eval_api_demo.py
├── eval_api_azure_openai_demo.py
├── eval_zhipu.py
├── eval_xunfei.py
└── eval_minimax.py
```

60 changes: 60 additions & 0 deletions docs/zh_cn/user_guides/models.md
@@ -88,10 +88,70 @@ models = [
]
```

### Authentication

The `key` parameter defaults to `'ENV'`, which reads from the `OPENAI_API_KEY` environment variable.
If `OPENAI_API_KEY` is not set, the model falls back to Azure Managed Identity
(`DefaultAzureCredential`); no extra configuration is needed.

You can also pass a key directly:

```python
key='sk-...', # Explicit API key
key='ENV', # Read from OPENAI_API_KEY env var (default); falls back to Azure Managed Identity
```

### Azure OpenAI

To use Azure OpenAI endpoints, set `azure_endpoint` and `azure_api_version` to reference your Azure resource.
Authentication is handled automatically: if `OPENAI_API_KEY` is set, it is used;
otherwise Azure Managed Identity is used as a fallback.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='gpt-4',
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
tokenizer_path='gpt-4',
meta_template=dict(round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
]),
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
),
]
```

### Reasoning Effort

For OpenAI reasoning models (o1, o3, o4, gpt-5), the `reasoning_effort` parameter controls
how much reasoning the model performs. Valid values are `'low'`, `'medium'`, and `'high'`
(case-insensitive); the default is `None`, which keeps the model's default behavior.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='o3',
        reasoning_effort='high',  # Controls reasoning depth
openai_api_base='https://api.openai.com/v1/',
max_out_len=4096,
max_seq_len=32768,
),
]
```

Evaluation examples for API-based models are also provided. Please refer to:

```bash
configs
├── eval_api_azure_openai_demo.py
├── eval_zhipu.py
├── eval_xunfei.py
└── eval_minimax.py
```

57 changes: 57 additions & 0 deletions examples/eval_api_azure_openai_demo.py
@@ -0,0 +1,57 @@
"""
Example configuration of using Azure OpenAI models.

If OPENAI_API_KEY is not set, Azure Managed Identity (DefaultAzureCredential)
is used automatically as a fallback.
"""

from mmengine.config import read_base

from opencompass.models import OpenAI, OpenAISDK

with read_base():
from opencompass.configs.datasets.demo.demo_gsm8k_chat_gen import \
gsm8k_datasets

# API template for chat models
api_meta_template = dict(
round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
],
)

models = [
dict(
abbr='Azure-GPT-5.1',
type=OpenAI,
path='gpt-5.1',
tokenizer_path='gpt-5',
# Azure OpenAI endpoint format:
openai_api_base='https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/chat/completions?api-version=2024-12-01-preview',
meta_template=api_meta_template,
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
retry=2,
),
dict(
abbr='Azure-GPT-5.1-SDK',
type=OpenAISDK,
path='gpt-5.1',
tokenizer_path='gpt-5',
# Azure OpenAI endpoint format:
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
meta_template=api_meta_template,
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
retry=2,
),
]

# Datasets to evaluate
datasets = gsm8k_datasets
5 changes: 5 additions & 0 deletions opencompass/models/base_api.py
@@ -310,6 +310,9 @@ def parse_template(self, prompt_template: PromptType,
for item in prompt[1:]:
if item['role'] == last_role:
new_prompt[-1]['prompt'] += '\n' + item['prompt']
if item.get('image'):
existing = new_prompt[-1].get('image', [])
new_prompt[-1]['image'] = existing + item['image']
else:
last_role = item['role']
new_prompt.append(item)
@@ -452,6 +455,8 @@ def _role2api_role(self,
res['prompt'] = merged_prompt.get('begin', '')
res['prompt'] += merged_prompt.get('prompt', '')
res['prompt'] += merged_prompt.get('end', '')
if merged_prompt.get('image'):
res['image'] = merged_prompt['image']
return res, True
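The first hunk above merges consecutive same-role items, joining their `prompt` strings with newlines and now also concatenating their `image` lists. Standalone, the merging behaves roughly like this simplified sketch of the diff's logic (`merge_same_role` is an illustrative name, not an OpenCompass function):

```python
def merge_same_role(prompt):
    """Merge consecutive items with the same role, joining prompts with
    newlines and concatenating any image lists (mirrors the diff above)."""
    new_prompt = [dict(prompt[0])]
    last_role = prompt[0]['role']
    for item in prompt[1:]:
        if item['role'] == last_role:
            new_prompt[-1]['prompt'] += '\n' + item['prompt']
            if item.get('image'):
                existing = new_prompt[-1].get('image', [])
                new_prompt[-1]['image'] = existing + item['image']
        else:
            last_role = item['role']
            new_prompt.append(dict(item))
    return new_prompt
```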

