
Evaluator Workflow

guanxinyi edited this page Jun 5, 2025 · 1 revision

  1. Get Initial Codes: optional step; get the initial code from the agent.
  2. Iterate Tasks: run the tasks in order, from task-1 to task-20.
  3. Call Agent: send the task and the current files to the agent (see Agent Data Format).
  4. Rewrite Files: update existing files or create new ones from the agent's response.
  5. Init Env: optional step; initialize the environment the files run in.
  6. Build Files: optional step; check the files for errors, such as reference errors.
  7. Test: run end-to-end (E2E) tests with Playwright.
  8. Retry: call the agent again with the error context from Build or Test.
  9. Report.
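The steps above can be sketched as a loop. This is a minimal sketch under stated assumptions, not the evaluator's actual implementation: the hook names `callAgent`, `initEnv`, `build` and `test`, the `maxRetries` parameter and the `TaskResult` shape are all hypothetical, and each check is assumed to resolve to an error message, or `undefined` on success. Merging the returned files into the existing ones (rather than replacing the whole set) is also an assumption.

```typescript
// Same shapes as the AgentRequest / AgentResponse interfaces on this page.
interface AgentRequest {
  type: "normal" | "init";
  task: string;
  files?: Record<string, string>;
  error?: string;
}

interface AgentResponse {
  files: Record<string, string>;
}

// Hypothetical hooks; each check resolves to an error message, or undefined on success.
interface EvaluatorHooks {
  callAgent(req: AgentRequest): Promise<AgentResponse>;
  initEnv?(files: Record<string, string>): Promise<void>;
  build?(files: Record<string, string>): Promise<string | undefined>;
  test(task: string, files: Record<string, string>): Promise<string | undefined>;
}

interface TaskResult {
  task: string;
  passed: boolean;
  attempts: number;
  error?: string;
}

async function runEvaluation(
  tasks: string[],
  hooks: EvaluatorHooks,
  initTask?: string,
  maxRetries = 2,
): Promise<TaskResult[]> {
  // 1. Get Initial Codes (optional).
  let files: Record<string, string> = {};
  if (initTask !== undefined) {
    files = (await hooks.callAgent({ type: "init", task: initTask })).files;
  }

  const report: TaskResult[] = [];
  // 2. Iterate tasks in order.
  for (const task of tasks) {
    let error: string | undefined;
    let attempts = 0;
    // One first attempt plus up to maxRetries retries (8. Retry).
    do {
      attempts++;
      // 3. Call Agent with the current files and the previous attempt's error, if any.
      const res = await hooks.callAgent({ type: "normal", task, files, error });
      // 4. Rewrite Files: overwrite returned paths, keep the rest (assumption).
      files = { ...files, ...res.files };
      // 5. Init Env (optional).
      await hooks.initEnv?.(files);
      // 6. Build (optional); 7. Test runs only if the build reported no error.
      error = (await hooks.build?.(files)) ?? (await hooks.test(task, files));
    } while (error !== undefined && attempts <= maxRetries);
    report.push({ task, passed: error === undefined, attempts, error });
  }
  // 9. Report.
  return report;
}
```

Injecting `build` and `test` as functions keeps the loop independent of any particular toolchain; a real `test` hook would run the Playwright E2E suite for the task and return its failure output as the error context.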

Agent Data Format

Request

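export interface AgentRequest {
  // "init" for the optional initial-code step, "normal" for each task
  type: "normal" | "init"

  // Task description sent to the agent
  task: string

  // Code files, key is filePath, value is fileContent
  files?: Record<string, string>

  // Error context from Build or Test, sent when retrying
  error?: string
}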
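An agent that receives this request as JSON cannot rely on the TypeScript types at runtime. A minimal guard, assuming the request arrives as already-parsed JSON (the helper names `isAgentRequest` and `isStringRecord` are hypothetical), checks each field against the interface:

```typescript
interface AgentRequest {
  type: "normal" | "init";
  task: string;
  files?: Record<string, string>;
  error?: string;
}

// True when obj is a plain object whose values are all strings (filePath -> fileContent).
function isStringRecord(obj: unknown): obj is Record<string, string> {
  return (
    typeof obj === "object" &&
    obj !== null &&
    !Array.isArray(obj) &&
    Object.values(obj).every((v) => typeof v === "string")
  );
}

// Runtime check that a parsed JSON value matches AgentRequest.
function isAgentRequest(value: unknown): value is AgentRequest {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    (v.type === "normal" || v.type === "init") &&
    typeof v.task === "string" &&
    (v.files === undefined || isStringRecord(v.files)) &&
    (v.error === undefined || typeof v.error === "string")
  );
}
```

An `init` request with only `type` and `task` passes, as does a `normal` request carrying `files` and `error`; a request whose `files` values are not strings is rejected.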

Response

export interface AgentResponse {
  // Code files, key is filePath, value is fileContent
  files: Record<string, string>
  
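  // Files are nested under `files` rather than using [filePath: string]: string
  // at the top level, which would leave no room to add other fields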
}
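Because step 4 (Rewrite Files) writes whatever the agent returns, it can help to reject malformed output first. A minimal sketch, assuming the agent's reply arrives as a JSON string; `parseAgentResponse` is a hypothetical helper whose error messages could double as retry error context:

```typescript
interface AgentResponse {
  files: Record<string, string>;
}

// Parse and validate an agent reply; throws a descriptive Error on bad input.
function parseAgentResponse(text: string): AgentResponse {
  let data: unknown;
  try {
    data = JSON.parse(text);
  } catch (e) {
    throw new Error(`Agent response is not valid JSON: ${(e as Error).message}`);
  }
  // `files` must be a plain object mapping file paths to contents.
  const files = (data as { files?: unknown } | null)?.files;
  if (typeof files !== "object" || files === null || Array.isArray(files)) {
    throw new Error('Agent response must contain a "files" object');
  }
  for (const [filePath, content] of Object.entries(files)) {
    if (typeof content !== "string") {
      throw new Error(`Content of "${filePath}" must be a string`);
    }
  }
  return { files: files as Record<string, string> };
}
```

Note that strict JSON forbids trailing commas, so a reply like `{"files": {"index.html": "x",}}` is rejected at the parsing stage.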

Example - Init Task

  • request
{
  "type": "init",
  "task": " generate a calculator in a single HTML file. the first row should be an input element with id 'display'; the next 4 rows should contain buttons with digits from '0' to '9' and operators including '+-*/=.'; the last row should have a 'Clear' button. display 'Error' when catching exception or getting undefined value during calculating. And add the html file filename after code block. The filename should be on the next line as the language specifier in your code block. the filename is \"index.html\""
}
  • response
{
  "files":{
    "index.html": "...file content..."
  }
}
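The init task prompt asks the model to put the filename where the code block's language specifier would go. Assuming that convention (a fence opened with three backticks immediately followed by `index.html`), an agent could build the `files` map from the model's reply roughly as follows. `extractFiles` is a hypothetical helper; it does not handle nested fences, and it skips blocks whose info string contains no dot, treating them as plain language tags such as `html`.

```typescript
// Build a filePath -> fileContent map from a model reply in which each fenced
// block's info string is a filename. Assumes unnested, three-backtick fences.
function extractFiles(reply: string): Record<string, string> {
  const files: Record<string, string> = {};
  // Opening fence + filename, lazily captured body, closing fence on its own line.
  const fence = /^`{3}[ \t]*(\S+)[ \t]*\r?\n([\s\S]*?)^`{3}[ \t]*$/gm;
  let m: RegExpExecArray | null;
  while ((m = fence.exec(reply)) !== null) {
    const [, name, content] = m;
    // Skip info strings like "html" that look like a language, not a filename.
    if (name.includes(".")) files[name] = content;
  }
  return files;
}
```

The captured content keeps the newline that precedes the closing fence.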

Example - Normal Task with Error

  • request
{
  "type": "normal",
  "task": "add button sqrt with text '√' at the right of button clear; click it to calculate result using display content directly",
  "files": {
    "index.html": "...file content..."
  },
  "error": "...error message..."
}
  • response
{
  "files": {
    "index.html": "...file content...",
    "index.css": "...file content...",
    "index.js": "...file content..."
  }
}
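This response updates `index.html` and creates `index.css` and `index.js`, which is exactly what step 4 (Rewrite Files) has to handle. A minimal Node.js sketch, assuming the evaluator writes files under a working directory of its choosing (`writeFiles` and `rootDir` are hypothetical names): it overwrites existing files, creates missing parent directories, and refuses paths that would escape the root.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write each response file under rootDir: existing files are overwritten,
// new files (and their parent directories) are created.
// Returns the paths that did not exist before this call.
function writeFiles(rootDir: string, files: Record<string, string>): string[] {
  const root = path.resolve(rootDir);
  const created: string[] = [];
  for (const [filePath, content] of Object.entries(files)) {
    const target = path.resolve(root, filePath);
    // Reject paths such as "../x" that would land outside the working directory.
    if (!target.startsWith(root + path.sep)) {
      throw new Error(`Refusing to write outside ${root}: ${filePath}`);
    }
    if (!fs.existsSync(target)) created.push(filePath);
    fs.mkdirSync(path.dirname(target), { recursive: true });
    fs.writeFileSync(target, content, "utf8");
  }
  return created;
}
```

Returning the newly created paths lets the evaluator log which files the agent added as opposed to modified.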