41 changes: 0 additions & 41 deletions apps/site/docs/en/model-provider.mdx
@@ -43,7 +43,6 @@ Some advanced configs are also supported. Usually you don't need to use them.

 | Name | Description |
 |------|-------------|
-| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See more details in the following section. |
 | `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Custom JSON config for OpenAI SDK initialization |
 | `MIDSCENE_OPENAI_HTTP_PROXY` | Optional. HTTP/HTTPS proxy configuration (e.g. `http://127.0.0.1:8080` or `https://proxy.example.com:8080`). This option has higher priority than `MIDSCENE_OPENAI_SOCKS_PROXY` |
 | `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. SOCKS proxy configuration (e.g. "socks5://127.0.0.1:1080") |
@@ -99,34 +98,6 @@ Import the dotenv module in your script. It will automatically read the environm
 import 'dotenv/config';
 ```

-## Using Azure OpenAI Service
-
-There are some extra configs when using Azure OpenAI Service.
-
-### Use ADT token provider
-
-This mode cannot be used in Chrome extension.
-
-```bash
-# this is always true when using Azure OpenAI Service
-export MIDSCENE_USE_AZURE_OPENAI=1
-
-export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
-export AZURE_OPENAI_ENDPOINT="..."
-export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
-export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
-```
-
-### Use keyless authentication
-
-```bash
-export MIDSCENE_USE_AZURE_OPENAI=1
-export AZURE_OPENAI_ENDPOINT="..."
-export AZURE_OPENAI_KEY="..."
-export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
-export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
-```
-
 ## Set config by JavaScript

 You can also override the config by javascript. Remember to call this before running Midscene codes.
@@ -188,18 +159,6 @@ export MIDSCENE_MODEL_NAME="ui-tars-72b-sft"
 export MIDSCENE_USE_VLM_UI_TARS=1
 ```

-## Example: config `claude-3-opus-20240229` from Anthropic
-
-When configuring `MIDSCENE_USE_ANTHROPIC_SDK=1`, Midscene will use Anthropic SDK (`@anthropic-ai/sdk`) to call the model.
-
-Configure the environment variables:
-
-```bash
-export MIDSCENE_USE_ANTHROPIC_SDK=1
-export ANTHROPIC_API_KEY="....."
-export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
-```
-
 ## Example: config request headers (like for openrouter)

 ```bash
39 changes: 0 additions & 39 deletions apps/site/docs/zh/model-provider.mdx
@@ -46,7 +46,6 @@ Midscene integrates the OpenAI SDK by default to call AI services. Using this SDK restricts

 | Name | Description |
 |------|-------------|
-| `OPENAI_USE_AZURE` | Optional. Set to "true" to use Azure OpenAI Service. See the later section for more details |
 | `MIDSCENE_OPENAI_INIT_CONFIG_JSON` | Optional. Initialization config JSON for the OpenAI SDK |
 | `MIDSCENE_OPENAI_HTTP_PROXY` | Optional. HTTP/HTTPS proxy configuration (e.g. `http://127.0.0.1:8080` or `https://proxy.example.com:8080`). This option has higher priority than `MIDSCENE_OPENAI_SOCKS_PROXY` |
 | `MIDSCENE_OPENAI_SOCKS_PROXY` | Optional. SOCKS proxy configuration (e.g. "socks5://127.0.0.1:1080") |
@@ -102,32 +101,6 @@ OPENAI_API_KEY="sk-abcdefghijklmnopqrstuvwxyz"
 import 'dotenv/config';
 ```

-## Configuration when using Azure OpenAI Service
-
-### Use ADT token provider
-
-This mode cannot run in a browser extension.
-
-```bash
-# set to 1 when using Azure OpenAI Service
-export MIDSCENE_USE_AZURE_OPENAI=1
-
-export MIDSCENE_AZURE_OPENAI_SCOPE="https://cognitiveservices.azure.com/.default"
-export AZURE_OPENAI_ENDPOINT="..."
-export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
-export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
-```
-
-### Use keyless mode
-
-```bash
-export MIDSCENE_USE_AZURE_OPENAI=1
-export AZURE_OPENAI_ENDPOINT="..."
-export AZURE_OPENAI_KEY="..."
-export AZURE_OPENAI_API_VERSION="2024-05-01-preview"
-export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
-```
-
 ## Configure the AI service with JavaScript

 You can also configure the AI service with JavaScript before running Midscene code.
@@ -186,17 +159,5 @@ export MIDSCENE_MODEL_NAME="ui-tars-72b-sft"
 export MIDSCENE_USE_VLM_UI_TARS=1
 ```

-## Example: using the `claude-3-opus-20240229` model from Anthropic
-
-When `MIDSCENE_USE_ANTHROPIC_SDK=1` is set, Midscene uses the Anthropic SDK (`@anthropic-ai/sdk`) to call the model.
-
-Configure the environment variables:
-
-```bash
-export MIDSCENE_USE_ANTHROPIC_SDK=1
-export ANTHROPIC_API_KEY="....."
-export MIDSCENE_MODEL_NAME="claude-3-opus-20240229"
-```
-

 <TroubleshootingLLMConnectivity />
4 changes: 2 additions & 2 deletions package-lock.json

Some generated files are not rendered by default.

8 changes: 3 additions & 5 deletions packages/cli/src/config-factory.ts
@@ -59,19 +59,17 @@ async function expandFilePatterns(
   basePath: string,
 ): Promise<string[]> {
   const allFiles: string[] = [];
-  const seenFiles = new Set<string>();

   for (const pattern of patterns) {
     try {
       const yamlFiles = await matchYamlFiles(pattern, {
         cwd: basePath,
       });

+      // Add all matched files, including duplicates
+      // This allows users to execute the same file multiple times
       for (const file of yamlFiles) {
-        if (!seenFiles.has(file)) {
-          seenFiles.add(file);
-          allFiles.push(file);
-        }
+        allFiles.push(file);
       }
     } catch (error) {
       console.warn(`Warning: Failed to expand pattern "${pattern}":`, error);
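
The behavior change in `expandFilePatterns` can be illustrated with a minimal, self-contained sketch. `expandPatterns`, `Matcher`, `fakeFiles`, and `exactMatcher` are hypothetical stand-ins introduced here for illustration (the real code calls `matchYamlFiles`); this is not the project's implementation, only a sketch of duplicate-preserving expansion under those assumptions.

```typescript
// Hypothetical stand-in for matchYamlFiles: resolves one pattern to a file list.
type Matcher = (pattern: string) => Promise<string[]>;

// Each pattern is expanded independently and every match is appended,
// so a file that is listed (or matched) twice is kept twice.
async function expandPatterns(
  patterns: string[],
  matcher: Matcher,
): Promise<string[]> {
  const allFiles: string[] = [];
  for (const pattern of patterns) {
    try {
      const matched = await matcher(pattern);
      allFiles.push(...matched);
    } catch (error) {
      // As in the diff, a failing pattern is reported and skipped.
      console.warn(`Warning: Failed to expand pattern "${pattern}":`, error);
    }
  }
  return allFiles;
}

// Assumed fixture: an exact-name matcher over an in-memory file list.
const fakeFiles = ['login.yml', 'test.yml'];
const exactMatcher: Matcher = async (pattern) =>
  fakeFiles.filter((f) => f === pattern);

// Logs the three entries in order, with login.yml appearing twice.
expandPatterns(['login.yml', 'test.yml', 'login.yml'], exactMatcher).then(
  (files) => console.log(files),
);
```

With the previous `Set`-based filter, the second `login.yml` would have been dropped, so a flow such as "log in, run a test, log in again" could not be expressed in one `files` list.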
47 changes: 27 additions & 20 deletions packages/cli/src/create-yaml-player.ts
@@ -52,6 +52,11 @@ export async function createYamlPlayer(
 ): Promise<ScriptPlayer<MidsceneYamlScriptEnv>> {
   const yamlScript =
     script || parseYamlScript(readFileSync(file, 'utf-8'), file);
+
+  // Deep clone the script to avoid mutation issues when the same file is executed multiple times
+  // This ensures each ScriptPlayer instance has its own independent copy of the YAML data
+  const clonedYamlScript = structuredClone(yamlScript);
+
   const fileName = basename(file, extname(file));
   const preference = {
     headed: options?.headed,
@@ -60,25 +65,27 @@ export async function createYamlPlayer(
   };

   const player = new ScriptPlayer(
-    yamlScript,
+    clonedYamlScript,
     async () => {
       const freeFn: FreeFn[] = [];
-      const webTarget = yamlScript.web || yamlScript.target;
+      const webTarget = clonedYamlScript.web || clonedYamlScript.target;

       // Validate that only one target type is specified
       const targetCount = [
         typeof webTarget !== 'undefined',
-        typeof yamlScript.android !== 'undefined',
-        typeof yamlScript.ios !== 'undefined',
-        typeof yamlScript.interface !== 'undefined',
+        typeof clonedYamlScript.android !== 'undefined',
+        typeof clonedYamlScript.ios !== 'undefined',
+        typeof clonedYamlScript.interface !== 'undefined',
       ].filter(Boolean).length;

       if (targetCount > 1) {
         const specifiedTargets = [
           typeof webTarget !== 'undefined' ? 'web' : null,
-          typeof yamlScript.android !== 'undefined' ? 'android' : null,
-          typeof yamlScript.ios !== 'undefined' ? 'ios' : null,
-          typeof yamlScript.interface !== 'undefined' ? 'interface' : null,
+          typeof clonedYamlScript.android !== 'undefined' ? 'android' : null,
+          typeof clonedYamlScript.ios !== 'undefined' ? 'ios' : null,
+          typeof clonedYamlScript.interface !== 'undefined'
+            ? 'interface'
+            : null,
         ].filter(Boolean);

         throw new Error(
@@ -88,7 +95,7 @@ export async function createYamlPlayer(

       // handle new web config
       if (typeof webTarget !== 'undefined') {
-        if (typeof yamlScript.target !== 'undefined') {
+        if (typeof clonedYamlScript.target !== 'undefined') {
           console.warn(
             'target is deprecated, please use web instead. See https://midscenejs.com/automate-with-scripts-in-yaml for more information. Sorry for the inconvenience.',
           );
@@ -123,7 +130,7 @@ export async function createYamlPlayer(
           {
             ...preference,
             cache: processCacheConfig(
-              yamlScript.agent?.cache,
+              clonedYamlScript.agent?.cache,
               fileName,
               fileName,
             ),
@@ -156,7 +163,7 @@ export async function createYamlPlayer(
         const agent = new AgentOverChromeBridge({
           closeNewTabsAfterDisconnect: webTarget.closeNewTabsAfterDisconnect,
           cache: processCacheConfig(
-            yamlScript.agent?.cache,
+            clonedYamlScript.agent?.cache,
             fileName,
             fileName,
           ),
@@ -183,11 +190,11 @@ export async function createYamlPlayer(
       }

       // handle android
-      if (typeof yamlScript.android !== 'undefined') {
-        const androidTarget = yamlScript.android;
+      if (typeof clonedYamlScript.android !== 'undefined') {
+        const androidTarget = clonedYamlScript.android;
         const agent = await agentFromAdbDevice(androidTarget?.deviceId, {
           cache: processCacheConfig(
-            yamlScript.agent?.cache,
+            clonedYamlScript.agent?.cache,
             fileName,
             fileName,
           ),
@@ -206,8 +213,8 @@ export async function createYamlPlayer(
       }

       // handle iOS
-      if (typeof yamlScript.ios !== 'undefined') {
-        const iosTarget = yamlScript.ios;
+      if (typeof clonedYamlScript.ios !== 'undefined') {
+        const iosTarget = clonedYamlScript.ios;
         const agent = await agentFromWebDriverAgent({
           wdaPort: iosTarget?.wdaPort,
           wdaHost: iosTarget?.wdaHost,
@@ -226,8 +233,8 @@ export async function createYamlPlayer(
       }

       // handle general interface
-      if (typeof yamlScript.interface !== 'undefined') {
-        const interfaceTarget = yamlScript.interface;
+      if (typeof clonedYamlScript.interface !== 'undefined') {
+        const interfaceTarget = clonedYamlScript.interface;

         const moduleSpecifier = interfaceTarget.module;
         let finalModuleSpecifier: string;
@@ -269,9 +276,9 @@ export async function createYamlPlayer(
         // create agent from device
         debug('creating agent from device', device);
         const agent = createAgent(device, {
-          ...yamlScript.agent,
+          ...clonedYamlScript.agent,
           cache: processCacheConfig(
-            yamlScript.agent?.cache,
+            clonedYamlScript.agent?.cache,
             fileName,
             fileName,
           ),
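
The reason for cloning can be sketched in isolation. The example below assumes, as the code comment in this change suggests, that a player may mutate the script object it receives; `DemoScript` and `runAndMutate` are hypothetical names invented for the illustration and do not come from the Midscene codebase. `structuredClone` is the standard global available in Node.js 17 and later.

```typescript
// Hypothetical script shape; the real YAML script type has more fields.
interface DemoScript {
  tasks: { name: string; status: string }[];
}

// Hypothetical player step that mutates its script while running.
function runAndMutate(script: DemoScript): void {
  for (const task of script.tasks) {
    task.status = 'done';
  }
}

const parsed: DemoScript = { tasks: [{ name: 'login', status: 'init' }] };

// Each run gets its own deep copy, so nested objects are not shared.
const firstRun = structuredClone(parsed);
const secondRun = structuredClone(parsed);

runAndMutate(firstRun);

console.log(firstRun.tasks[0].status); // 'done'
console.log(secondRun.tasks[0].status); // 'init'
console.log(parsed.tasks[0].status); // 'init'
```

A shallow copy such as `{ ...parsed }` would not be enough here, because the `tasks` array and its entries would still be shared between runs.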
35 changes: 33 additions & 2 deletions packages/cli/tests/unit-test/config-factory.test.ts
@@ -134,6 +134,31 @@ summary: "yaml-summary.json"
         'No YAML files found matching the patterns in "files"',
       );
     });
+
+    test('should preserve duplicate file entries', async () => {
+      const mockYamlContent = `
+files:
+  - "login.yml"
+  - "test.yml"
+  - "login.yml"
+`;
+      const mockParsedYaml = {
+        files: ['login.yml', 'test.yml', 'login.yml'],
+      };
+
+      vi.mocked(readFileSync).mockReturnValue(mockYamlContent);
+      vi.mocked(interpolateEnvVars).mockReturnValue(mockYamlContent);
+      vi.mocked(yamlLoad).mockReturnValue(mockParsedYaml);
+      vi.mocked(matchYamlFiles)
+        .mockResolvedValueOnce(['login.yml'])
+        .mockResolvedValueOnce(['test.yml'])
+        .mockResolvedValueOnce(['login.yml']);
+
+      const result = await parseConfigYaml(mockIndexPath);
+
+      expect(result.files).toEqual(['login.yml', 'test.yml', 'login.yml']);
+      expect(result.files.length).toBe(3);
+    });
   });

   describe('createConfig', () => {
@@ -273,12 +298,17 @@ concurrent: 2
     test('should create config with default options and expand patterns', async () => {
       const patterns = ['test1.yml', 'test*.yml'];
       const expandedFiles = ['test1.yml', 'testA.yml', 'testB.yml'];
-      vi.mocked(matchYamlFiles).mockResolvedValue(expandedFiles);
+      // Mock to return different results for each pattern call
+      vi.mocked(matchYamlFiles)
+        .mockResolvedValueOnce(['test1.yml'])
+        .mockResolvedValueOnce(['test1.yml', 'testA.yml', 'testB.yml']);

       const result = await createFilesConfig(patterns);

+      // Note: test1.yml appears twice because it's matched by both patterns
+      // This is expected behavior - patterns are evaluated independently
       expect(result).toEqual({
-        files: expandedFiles,
+        files: ['test1.yml', 'test1.yml', 'testA.yml', 'testB.yml'],
         concurrent: 1,
         continueOnError: false,
         shareBrowserContext: false,
@@ -290,6 +320,7 @@ concurrent: 2
         globalConfig: {
           web: undefined,
           android: undefined,
+          ios: undefined,
         },
       });
       expect(matchYamlFiles).toHaveBeenCalledWith(patterns[0], {
5 changes: 1 addition & 4 deletions packages/core/package.json
@@ -72,17 +72,14 @@
"test:parse-action": "npm run test:ai -- tests/ai/parse-action.test.ts"
},
"dependencies": {
"@anthropic-ai/sdk": "0.33.1",
"@azure/identity": "4.5.0",
"@langchain/core": "0.3.26",
"@midscene/recorder": "workspace:*",
"@midscene/shared": "workspace:*",
"@ui-tars/action-parser": "1.2.3",
"dotenv": "^16.4.5",
"https-proxy-agent": "7.0.2",
"jsonrepair": "3.12.0",
"langsmith": "0.3.7",
"openai": "4.81.0",
"openai": "6.3.0",
"socks-proxy-agent": "8.0.4",
"zod": "3.24.3",
"semver": "7.5.2",