Evaluator

guanxinyi edited this page Jun 5, 2025 · 1 revision

Repository Hierarchy

├─ apps
│ ├─ eval
│ │ ├─ report
│ │ ├─ src
├─ common
├─ libraries
├─ projects
├─ tools
│ ├─ bench-agent
│ ├─ evaluator
│ ├─ http-agent
│ ├─ types
├─ ...others
  • apps/eval: Main entry point of rush eval. Reports are generated in apps/eval/report.
  • common: Rush configuration for the repository.
  • libraries: Scripts for projects.
  • projects: Projects.
  • tools: Tools used by rush eval:
    • bench-agent: Web-Agent.
    • evaluator: Evaluator.
    • http-agent: HTTP agent.
    • types: Types.

Eval Config

For the rush eval configuration, see Config Parameters.

Eval Workflow

For the rush eval workflow, see Evaluator-Workflow.

Eval Env

The following environment variables are injected at evaluation runtime.

  • EVAL_PROJECT_ROOT: Task test workspace.
  • EVAL_PROJECT_PORT: Task test port.
  • EVAL: Flag indicating that the process is running under rush eval.
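
As a rough illustration, a project under test could read these variables along the following lines. This is only a sketch: the variable names come from the list above, but `readEvalEnv`, the `EvalEnv` shape, and the value formats (a numeric port, any non-empty `EVAL` value meaning "on") are assumptions, not part of rush eval itself.

```typescript
// Hypothetical helper. The three variable names are from the docs;
// the value formats checked below are assumptions.
interface EvalEnv {
  projectRoot: string; // EVAL_PROJECT_ROOT: task test workspace
  projectPort: number; // EVAL_PROJECT_PORT: task test port
}

type EnvMap = Record<string, string | undefined>;

function readEvalEnv(env: EnvMap): EvalEnv | undefined {
  // Assumption: EVAL holds some non-empty value only during rush eval.
  if (!env["EVAL"]) return undefined;

  const projectRoot = env["EVAL_PROJECT_ROOT"];
  const projectPort = Number(env["EVAL_PROJECT_PORT"]);

  // Assumption: the port is a positive integer.
  if (!projectRoot || !Number.isInteger(projectPort) || projectPort <= 0) {
    throw new Error(
      "EVAL is set but EVAL_PROJECT_ROOT or EVAL_PROJECT_PORT is missing or invalid"
    );
  }
  return { projectRoot, projectPort };
}
```

In a Node.js process this would typically be called as `readEvalEnv(process.env)`. Passing the environment in as an argument keeps the helper easy to test outside an evaluation run.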

Eval Outputs

Report

  • Reports are written to apps/eval/report, with the following hierarchy:
├─ report           
│ ├─ eval-202411012-194041       
│ │ ├─ eval.report.md
│ │ ├─ proj1
│ │ │ ├─ proj1.report.md
│ │ │ ├─ proj1-model1-202411012-194041
│ │ │ │ ├─ proj1-model1.report.md
│ │ │ │ ├─ dev.log
│ │ │ │ ├─ ...others
│ │ │ ├─ proj1-model2-202411012-194041
│ │ │ │ ├─ proj1-model2.report.md
│ │ │ │ ├─ dev.log
│ │ │ │ ├─ ...others
│ │ ├─ proj2
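
The layout above can be expressed as a small path builder. This is a sketch under stated assumptions: `reportPaths` and `ReportPaths` are hypothetical names, the timestamp is treated as an opaque string copied from the directory name, and each per-model report is assumed to be named after its own project and model.

```typescript
// Hypothetical helper mirroring the report tree. Not part of rush eval.
interface ReportPaths {
  runDir: string;        // report/eval-<stamp>
  summary: string;       // eval.report.md for the whole run
  projectReport: string; // <project>/<project>.report.md
  modelDir: string;      // <project>/<project>-<model>-<stamp>
  modelReport: string;   // assumed: <project>-<model>.report.md
  devLog: string;        // dev.log inside the model directory
}

function reportPaths(
  reportRoot: string, // e.g. the report directory of the eval app
  stamp: string,      // opaque timestamp, as it appears in the directory name
  project: string,
  model: string
): ReportPaths {
  const runDir = `${reportRoot}/eval-${stamp}`;
  const projectDir = `${runDir}/${project}`;
  const modelDir = `${projectDir}/${project}-${model}-${stamp}`;
  return {
    runDir,
    summary: `${runDir}/eval.report.md`,
    projectReport: `${projectDir}/${project}.report.md`,
    modelDir,
    modelReport: `${modelDir}/${project}-${model}.report.md`,
    devLog: `${modelDir}/dev.log`,
  };
}
```

Paths are joined with `/` to keep the sketch free of imports. A real script would more likely use Node's `path.join`.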

Codes

  • Generated code is written to projects/xxxx/eval, with the following hierarchy:
├─ eval
│ ├─ eval-202411012-194041       
│ │ ├─ model-name-1
│ │ │ ├─ init-1  // <taskid>-<times>: task ID followed by the run count
│ │ │ ├─ task-1-1
│ │ │ ├─ task-2-1
│ │ │ ├─ task-2-2
│ │ │ ├─ ...others
│ │ ├─ model-name-2
│ │ │ ├─ init-1
│ │ │ ├─ task-1-1
│ │ │ ├─ ...others
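
Since each output directory is named `taskid-times`, a script that post-processes the generated code might split the name at the last hyphen. The sketch below assumes that the suffix is always a decimal run count. `parseTaskDir` and `latestRuns` are hypothetical helpers, not part of rush eval.

```typescript
// Hypothetical helpers for the <taskid>-<times> directory naming.
interface TaskRun {
  taskId: string; // e.g. "init" or "task-2"
  run: number;    // e.g. 1 or 2
}

function parseTaskDir(name: string): TaskRun | undefined {
  // Greedy prefix, so only the final "-<digits>" is treated as the run count.
  const match = /^(.+)-(\d+)$/.exec(name);
  if (!match) return undefined;
  return { taskId: match[1], run: Number(match[2]) };
}

// Returns the highest run count seen for each task ID.
function latestRuns(dirNames: string[]): Map<string, number> {
  const latest = new Map<string, number>();
  for (const name of dirNames) {
    const parsed = parseTaskDir(name);
    if (!parsed) continue; // skip entries that do not follow the pattern
    const previous = latest.get(parsed.taskId) ?? 0;
    latest.set(parsed.taskId, Math.max(previous, parsed.run));
  }
  return latest;
}
```

For the `model-name-1` directory shown above, `latestRuns(["init-1", "task-1-1", "task-2-1", "task-2-2"])` maps `task-2` to 2 and the other two task IDs to 1.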
