64 changes: 64 additions & 0 deletions docs/en/user_guides/models.md
@@ -96,10 +96,74 @@ models = [
]
```

### Authentication

The `key` parameter defaults to `'ENV'`, which reads from the `OPENAI_API_KEY` environment variable.
If `OPENAI_API_KEY` is not set, the model falls back to
Azure Managed Identity (`DefaultAzureCredential`); no extra configuration is needed.

You can also pass a key directly:

```python
key='sk-...', # Explicit API key
key='ENV', # Read from OPENAI_API_KEY env var (default); falls back to Azure Managed Identity
```
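The resolution order can be sketched as follows. This is a hypothetical helper written only to illustrate the precedence, not part of the OpenCompass API; the actual fallback logic lives inside the model class:

```python
import os

def resolve_api_key(key='ENV'):
    """Illustrative key resolution: explicit key wins, then the
    OPENAI_API_KEY environment variable, then None (meaning the
    caller falls back to Azure Managed Identity)."""
    if key != 'ENV':
        return key                       # explicit key passed in config
    env_key = os.environ.get('OPENAI_API_KEY')
    if env_key:
        return env_key                   # read from the environment
    return None                          # fall back to DefaultAzureCredential
```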

### Azure OpenAI

To use Azure OpenAI endpoints, set `azure_endpoint` and `azure_api_version` to reference your Azure resource.
Authentication is handled automatically: if `OPENAI_API_KEY` is set, it is used;
otherwise Azure Managed Identity is used as a fallback.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='gpt-4',
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
tokenizer_path='gpt-4',
meta_template=dict(round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
]),
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
),
]
```
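Under the hood, the client combines `azure_endpoint`, the deployment name, and `azure_api_version` into a request URL, following the Azure OpenAI URL scheme shown in the example config further below. A simplified sketch of that composition (hypothetical helper, shown only to clarify how the pieces fit together):

```python
def azure_chat_completions_url(azure_endpoint: str,
                               deployment: str,
                               api_version: str) -> str:
    """Compose an Azure OpenAI chat-completions URL from its parts."""
    base = azure_endpoint.rstrip('/')
    return (f'{base}/openai/deployments/{deployment}'
            f'/chat/completions?api-version={api_version}')
```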

### Reasoning Effort

For OpenAI reasoning models (o1, o3, o4, gpt-5), the `reasoning_effort` parameter controls
how much reasoning the model performs. Valid values are `'low'`, `'medium'`, and `'high'`
(case-insensitive); the default is `None`, which keeps the model's default behavior.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='o3',
reasoning_effort='high',
openai_api_base='https://api.openai.com/v1/',
max_out_len=4096,
max_seq_len=32768,
),
]
```
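Because values are case-insensitive, a normalization step along these lines is implied. This is an illustrative sketch, not the actual OpenCompass validation code:

```python
VALID_EFFORTS = ('low', 'medium', 'high')

def normalize_reasoning_effort(value):
    """Lower-case and validate a reasoning_effort value; None passes through."""
    if value is None:
        return None
    effort = str(value).lower()
    if effort not in VALID_EFFORTS:
        raise ValueError(
            f'reasoning_effort must be one of {VALID_EFFORTS}, got {value!r}')
    return effort
```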

We provide several examples for API-based models. Please refer to:

```bash
configs
├── eval_api_demo.py
├── eval_api_azure_openai_demo.py
├── eval_zhipu.py
├── eval_xunfei.py
└── eval_minimax.py
```

60 changes: 60 additions & 0 deletions docs/zh_cn/user_guides/models.md
@@ -88,10 +88,70 @@ models = [
]
```

### Authentication

The `key` parameter defaults to `'ENV'`, which reads from the `OPENAI_API_KEY` environment variable.
If `OPENAI_API_KEY` is not set, the model falls back to Azure Managed Identity
(`DefaultAzureCredential`); no extra configuration is needed.

You can also pass a key directly:

```python
key='sk-...', # Explicit API key
key='ENV', # Read from OPENAI_API_KEY env var (default); falls back to Azure Managed Identity
```

### Azure OpenAI

To use Azure OpenAI endpoints, set `azure_endpoint` and `azure_api_version` to reference your Azure resource.
Authentication is handled automatically: if `OPENAI_API_KEY` is set, it is used;
otherwise Azure Managed Identity is used as a fallback.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='gpt-4',
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
tokenizer_path='gpt-4',
meta_template=dict(round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
]),
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
),
]
```

### Reasoning Effort

For OpenAI reasoning models (o1, o3, o4, gpt-5), the `reasoning_effort` parameter controls
how much reasoning the model performs. Valid values are `'low'`, `'medium'`, and `'high'`
(case-insensitive); the default is `None`, which keeps the model's default behavior.

```python
from opencompass.models import OpenAISDK

models = [
dict(
type=OpenAISDK,
path='o3',
        reasoning_effort='high',  # Controls reasoning depth
openai_api_base='https://api.openai.com/v1/',
max_out_len=4096,
max_seq_len=32768,
),
]
```

Evaluation examples for API-based models are also provided. Please refer to:

```bash
configs
├── eval_api_azure_openai_demo.py
├── eval_zhipu.py
├── eval_xunfei.py
└── eval_minimax.py
```

57 changes: 57 additions & 0 deletions examples/eval_api_azure_openai_demo.py
@@ -0,0 +1,57 @@
"""
Example configuration of using Azure OpenAI models.

If OPENAI_API_KEY is not set, Azure Managed Identity (DefaultAzureCredential)
is used automatically as a fallback.
"""

from mmengine.config import read_base

from opencompass.models import OpenAI, OpenAISDK

with read_base():
from opencompass.configs.datasets.demo.demo_gsm8k_chat_gen import \
gsm8k_datasets

# API template for chat models
api_meta_template = dict(
round=[
dict(role='HUMAN', api_role='HUMAN'),
dict(role='BOT', api_role='BOT', generate=True),
],
)

models = [
dict(
abbr='Azure-GPT-5.1',
type=OpenAI,
path='gpt-5.1',
tokenizer_path='gpt-5',
# Azure OpenAI endpoint format:
openai_api_base='https://{resource-name}.openai.azure.com/openai/deployments/{deployment-name}/chat/completions?api-version=2024-12-01-preview',
meta_template=api_meta_template,
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
retry=2,
),
dict(
abbr='Azure-GPT-5.1-SDK',
type=OpenAISDK,
path='gpt-5.1',
tokenizer_path='gpt-5',
# Azure OpenAI endpoint format:
azure_endpoint='https://{resource-name}.openai.azure.com',
azure_api_version='2024-12-01-preview',
meta_template=api_meta_template,
query_per_second=1,
max_out_len=2048,
max_seq_len=4096,
batch_size=8,
retry=2,
),
]

# Datasets to evaluate
datasets = gsm8k_datasets
5 changes: 5 additions & 0 deletions opencompass/models/base_api.py
@@ -310,6 +310,9 @@ def parse_template(self, prompt_template: PromptType,
for item in prompt[1:]:
if item['role'] == last_role:
new_prompt[-1]['prompt'] += '\n' + item['prompt']
if item.get('image'):
existing = new_prompt[-1].get('image', [])
new_prompt[-1]['image'] = existing + item['image']
else:
last_role = item['role']
new_prompt.append(item)
@@ -452,6 +455,8 @@ def _role2api_role(self,
res['prompt'] = merged_prompt.get('begin', '')
res['prompt'] += merged_prompt.get('prompt', '')
res['prompt'] += merged_prompt.get('end', '')
if merged_prompt.get('image'):
res['image'] = merged_prompt['image']
return res, True
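The first hunk above merges consecutive same-role items, joining their `prompt` strings with newlines and now also concatenating their `image` lists. Standalone, the merging behaves roughly like this simplified sketch of the diff's logic (`merge_same_role` is an illustrative name, not an OpenCompass function):

```python
def merge_same_role(prompt):
    """Merge consecutive items with the same role, joining prompts with
    newlines and concatenating any image lists (mirrors the diff above)."""
    new_prompt = [dict(prompt[0])]
    last_role = prompt[0]['role']
    for item in prompt[1:]:
        if item['role'] == last_role:
            new_prompt[-1]['prompt'] += '\n' + item['prompt']
            if item.get('image'):
                existing = new_prompt[-1].get('image', [])
                new_prompt[-1]['image'] = existing + item['image']
        else:
            last_role = item['role']
            new_prompt.append(dict(item))
    return new_prompt
```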

