
Config Parameters


Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `projects` | `string[]` | `[]` | If empty, all projects are included. Values look like `"@web-bench/calculator"`. |
| `agentMode` | `"local" \| "http"` | `"local"` | How requests reach the model: directly (`"local"`) or through a custom agent over HTTP (`"http"`). See the Q & A below. |
| `agentEndPoint` | `string` | `""` | When `agentMode` is `"http"`, the HTTP API endpoint that requests are sent to. |
| `models` | `string[]` | `[]` | Models to evaluate; corresponds to the `models` field in `apps/eval/src/model.json`. |
| `maxdop` | `number` | `30` | Maximum degree of parallelism. |
| `logLevel` | `"info" \| "warn" \| "debug" \| "error"` | `"info"` | Log verbosity. |
| `httpLimit` | `number` | `10` | When `agentMode` is `"http"`, the maximum number of concurrent requests. |
| `fileDiffLog` | `boolean` | `false` | Whether to log the diff of files generated by the LLM. Enable only at the `"debug"` log level. Note: this affects performance; do not enable it during an all-project evaluation. |
| `screenshotLog` | `boolean` | `false` | Whether to log screenshots. Enable only at the `"debug"` log level. Note: this affects performance; do not enable it during an all-project evaluation. |
| `startTask` | `string` | the first task in `tasks.jsonl` | The task to start execution from (inclusive). |
| `endTask` | `string` | the last task in `tasks.jsonl` | The task to end execution at (inclusive). |
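
For example, a small local-mode run over a single project might be configured roughly as follows. The values are illustrative, and this sketch assumes that entries in `models` refer to the `title` of a model defined in `apps/eval/src/model.json`:

```json
{
  "projects": ["@web-bench/calculator"],
  "models": ["anthropic/claude-3-opus"],
  "agentMode": "local",
  "maxdop": 10,
  "logLevel": "info"
}
```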

Q & A

Difference between agentMode "local" and "http"

  • 'local': the evaluator interacts with the LLM directly, using the model configuration specified in apps/eval/src/model.json.
  • 'http': the evaluator sends each request to the custom Agent exposed at the configured agentEndPoint.
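
For an HTTP-mode run, the related parameters combine roughly as follows (the endpoint URL is illustrative):

```json
{
  "agentMode": "http",
  "agentEndPoint": "http://localhost:3000/agent",
  "httpLimit": 5
}
```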

Add a new model for evaluation

  1. For models deployed on OpenRouter, use the native OpenRouter provider with the following configuration:

     ```json
     {
       "title": "anthropic/claude-3-opus",
       "provider": "openrouter",
       "model": "anthropic/claude-3-opus",
       "apiBase": "https://openrouter.ai/api/v1",
       "apiKey": "{{OPENROUTER_API_KEY}}"
     }
     ```
  2. If existing providers do not meet your requirements, you can evaluate specific models by creating a new provider. This is achieved by extending `BaseLLM` (a minimal custom provider is sketched after this list):

     ```ts
     export abstract class BaseLLM {
       abstract provider: string
       abstract option: LLMOption
       info: Model
       abstract chat(
         compiledMessages: ChatMessage[],
         originOptions: CompletionOptions
       ): Promise<{
         request: string
         error?: string
         response: string
       }>
     }
     ```

     - `option` – defines the parameters used for LLM requests:

       ```ts
       export interface LLMOption {
         contextLength: number
         maxTokens: number
         temperature?: number
         apiBase: string
       }
       ```

     - `info` – model metadata from `apps/eval/src/model.json`.
     - `chat` – custom request method that returns the generated text from the LLM.
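
As an illustration, a custom provider might look like the minimal sketch below. This is not code from the repository: the import path, the response shape, and the `MyCustomLLM` name are placeholders; only the `BaseLLM` and `LLMOption` contracts shown above come from the docs.

```ts
// Hypothetical import path; adjust to wherever BaseLLM and its types live in the repo.
import { BaseLLM, LLMOption, Model, ChatMessage, CompletionOptions } from './base'

export class MyCustomLLM extends BaseLLM {
  provider = 'my-custom'

  option: LLMOption = {
    contextLength: 128_000,
    maxTokens: 4096,
    temperature: 0,
    apiBase: 'https://api.example.com/v1', // assumed endpoint for your model server
  }

  async chat(
    compiledMessages: ChatMessage[],
    originOptions: CompletionOptions
  ): Promise<{ request: string; error?: string; response: string }> {
    // Serialize the outgoing request so it can be logged alongside the result.
    const request = JSON.stringify({ messages: compiledMessages, options: originOptions })
    try {
      // The payload and response shapes here are assumptions; use whatever your backend expects.
      const res = await fetch(`${this.option.apiBase}/chat`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: request,
      })
      const data = await res.json()
      return { request, response: data.text ?? '' }
    } catch (e) {
      return { request, error: String(e), response: '' }
    }
  }
}
```

After implementing the provider, you would presumably also register it with the evaluator and add a matching entry (with the new `provider` name) to `apps/eval/src/model.json`, so the model can be selected through the `models` parameter.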
