Evaluator

Repository Hierarchy

├─ apps
│ ├─ eval
│ │ ├─ report
│ │ ├─ src
├─ common
├─ libraries
├─ projects
├─ tools
│ ├─ bench-agent
│ ├─ evaluator
│ ├─ http-agent
│ ├─ types
├─ ...others

apps/eval: Main entry point of rush eval. Reports are generated in apps/eval/report.
common: Rush configuration for the repository.
libraries: Scripts for projects.
projects: Projects.
tools: Tools used by rush eval:
- bench-agent: Web-Agent.
- evaluator: Evaluator.
- http-agent: HTTP agent.
- types: Types.

Eval Config

For the configuration of rush eval, see Config Parameters.

Eval Workflow

For the rush eval workflow, see Evaluator-Workflow.

Eval Env

The following environment variables are injected at evaluation runtime:

EVAL_PROJECT_ROOT: Task test workspace.
EVAL_PROJECT_PORT: Task test port.
EVAL: rush eval flag.
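
As a rough illustration, a task's test setup might read these variables as sketched below. The `EvalEnv` type, the `readEvalEnv` helper, the fallback values, and the reading of `EVAL` as "set to any non-empty value" are all assumptions made for this example; none of them is an API of rush eval.

```typescript
// Hypothetical view of the injected variables (not part of rush eval itself).
interface EvalEnv {
  projectRoot: string; // from EVAL_PROJECT_ROOT
  port: number;        // from EVAL_PROJECT_PORT
  isEval: boolean;     // from EVAL
}

type EnvSource = Record<string, string | undefined>;

// Reads the three variables from an env-like object.
// Fallbacks ("." and 3000) are illustrative defaults, not documented behaviour.
function readEvalEnv(env: EnvSource, fallbackPort: number = 3000): EvalEnv {
  const rawPort = env["EVAL_PROJECT_PORT"];
  const parsed = rawPort === undefined ? NaN : parseInt(rawPort, 10);
  return {
    projectRoot: env["EVAL_PROJECT_ROOT"] ?? ".",
    port: !isNaN(parsed) && parsed > 0 ? parsed : fallbackPort,
    // Assumption: EVAL counts as "on" whenever it is set to a non-empty string.
    isEval: (env["EVAL"] ?? "") !== "",
  };
}
```

In a Node.js process this could be called as `readEvalEnv(process.env)`, for example to point a dev server or test runner at the injected workspace and port.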

Eval Outputs

Report

Reports are written to apps/eval/report, with the following hierarchy:

├─ report           
│ ├─ eval-202411012-194041       
│ │ ├─ eval.report.md
│ │ ├─ proj1
│ │ │ ├─ proj1.report.md
│ │ │ ├─ proj1-model1-202411012-194041
│ │ │ │ ├─ proj1-model1.report.md
│ │ │ │ ├─ dev.log
│ │ │ │ ├─ ...others
│ │ │ ├─ proj1-model2-202411012-194041
│ │ │ │ ├─ proj1-model2.report.md
│ │ │ │ ├─ dev.log
│ │ │ │ ├─ ...others
│ │ ├─ proj2
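
The naming pattern in the tree above can be sketched as a small path builder, which may be handy when collecting reports in a script. The `reportPaths` helper and its parameter names are hypothetical. The sketch also assumes that each per-model directory reuses the run's timestamp and that each per-model report is named after its own model, which the example tree suggests but does not state.

```typescript
// Hypothetical helper: derives report file locations from the naming pattern
// shown in the tree. Not part of rush eval.
interface ReportPaths {
  summary: string; // eval.report.md for the whole run
  project: string; // <project>.report.md
  model: string;   // <project>-<model>.report.md
  devLog: string;  // dev.log for one project/model pair
}

function reportPaths(
  reportRoot: string, // e.g. the report directory itself
  stamp: string,      // e.g. "202411012-194041", copied from the run directory name
  project: string,
  model: string
): ReportPaths {
  const runDir = `${reportRoot}/eval-${stamp}`;
  const projectDir = `${runDir}/${project}`;
  // Assumption: the model directory carries the same timestamp as the run.
  const modelDir = `${projectDir}/${project}-${model}-${stamp}`;
  return {
    summary: `${runDir}/eval.report.md`,
    project: `${projectDir}/${project}.report.md`,
    model: `${modelDir}/${project}-${model}.report.md`,
    devLog: `${modelDir}/dev.log`,
  };
}
```

For example, `reportPaths("report", "202411012-194041", "proj1", "model1").devLog` yields `report/eval-202411012-194041/proj1/proj1-model1-202411012-194041/dev.log`.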

Codes

The generated code is written to projects/xxxx/eval, with the following source hierarchy:

├─ eval
│ ├─ eval-202411012-194041       
│ │ ├─ model-name-1
│ │ │ ├─ init-1  // taskid-times
│ │ │ ├─ task-1-1
│ │ │ ├─ task-2-1
│ │ │ ├─ task-2-2
│ │ │ ├─ ...others
│ │ ├─ model-name-2
│ │ │ ├─ init-1
│ │ │ ├─ task-1-1
│ │ │ ├─ ...others
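
The `taskid-times` naming noted in the tree can be unpacked with a few lines of code, for instance when grouping outputs by task. This is a minimal sketch: `parseTaskDir` and `TaskRun` are hypothetical names, and reading `times` as the run count for a task is an interpretation of the example directories, not something the tooling is known to guarantee.

```typescript
// Hypothetical helper: splits a directory name such as "task-2-1" into its
// task id ("task-2") and trailing counter (1). Not part of rush eval.
interface TaskRun {
  taskId: string;
  times: number;
}

function parseTaskDir(name: string): TaskRun | null {
  // Split at the LAST hyphen, because the task id may itself contain hyphens.
  const cut = name.lastIndexOf("-");
  if (cut <= 0) return null;
  const suffix = name.slice(cut + 1);
  // The counter must consist of digits only.
  if (!/^\d+$/.test(suffix)) return null;
  return { taskId: name.slice(0, cut), times: parseInt(suffix, 10) };
}
```

Under these assumptions, `parseTaskDir("task-2-2")` gives `{ taskId: "task-2", times: 2 }`, while a name without a numeric suffix gives `null`.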
