Evaluating AI Tool Performance in Robot Framework Test Generation

This repository compares Robot Framework test suites generated by different AI tools (GitHub Copilot, Claude Code, GitLab Duo, Amazon Q). Place each tool’s outputs here, then use the provided comparison prompt and template to generate a single, evidence-based comparison report.

What You Put Here (Per Tool)

Create one folder per tool, named after the AI tool, for example Tools/GitHub Copilot/ or Tools/AmazonQ/. Inside that folder include:

  • chat/ — the chat transcript(s) with the assistant
  • robot_tests/ — Robot Framework suites and resources produced by the tool
  • robot_results/ — execution artifacts and Robocop reports (latest timestamp preferred)
  • Tool-specific RF standards files for adherence scoring: one of .github/, .claude/, .amazonq/rules, .gitlab/duo

Example:

Tools/
   GitHub Copilot/
      chat/
      robot_tests/
      robot_results/
      .github/
   AmazonQ/
      chat/
      robot_tests/
      robot_results/
      .amazonq/rules
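
If you are adding a tool folder by hand, a minimal shell sketch for scaffolding it (AmazonQ shown as the example; substitute your tool's name and its standards path):

# Create the per-tool folders (AmazonQ shown; adjust names for your tool)
mkdir -p "Tools/AmazonQ/chat" "Tools/AmazonQ/robot_tests" "Tools/AmazonQ/robot_results"
mkdir -p "Tools/AmazonQ/.amazonq/rules"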

Files In This Repo

How To Run The Comparison

  1. Prepare folders: Add one folder per tool under Tools/ with chat/, robot_tests/, robot_results/, and the tool’s standards files.
  2. Create your comparison round file:
cp "AI tools comparison-TEMPLATE.md" "AI tools comparison - Round x - Model y.md"
  3. Initialize RF docs and services (MCP-ready):

Run the fast-start script to build the container, generate library docs, and prepare the MCP configuration for your IDE. The MCP server is stdio-based and is spawned by clients when needed.

./fast-start.sh

After it completes, reload VS Code so it picks up the MCP configuration (Command Palette → Developer: Reload Window), or restart VS Code.

  4. Start the comparison in your AI assistant:
  • Get the comparison prompt from Confluence and provide it to your assistant in this repository's context.
  • The assistant analyzes the per-tool folders, latest Robocop results, and chat transcripts, then fills in your "Round x" file.
  5. Evidence and scoring:
  • Cite evidence via file paths and line ranges from each tool’s robot_tests/ and robot_results/.
  • Use chat transcripts in chat/ for “Prompt Responsiveness & Control”.
  • Use the latest Robocop report per tool from robot_results/ for the static analysis category.
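
A quick shell sketch for locating the newest Robocop report for one tool (the *robocop* glob is an assumption; adjust it to match the report file names your runs actually produce):

# Newest matching Robocop report for a tool, by modification time
ls -t "Tools/GitHub Copilot/robot_results"/*robocop* | head -n 1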

Documentation Validation (MCP Ground Truth)

Notes

  • This repo’s purpose is comparison only. Any steps to run environments, apps, or test generation live elsewhere (in the source tool projects).
