Evaluating AI Tool Performance in Robot Framework Test Generation

This repository compares Robot Framework test suites generated by different AI tools (GitHub Copilot, Claude Code, GitLab Duo, Amazon Q). Place each tool’s outputs here, then use the provided comparison prompt and template to generate a single, evidence-based comparison report.

What You Put Here (Per Tool)

Create one folder per tool, named after the AI tool, for example Tools/GitHub Copilot/ or Tools/AmazonQ/. Inside that folder include:

  • chat/ — the chat transcript(s) with the assistant
  • robot_tests/ — Robot Framework suites and resources produced by the tool
  • robot_results/ — execution artifacts and Robocop reports (latest timestamp preferred)
  • Tool-specific RF standards files for adherence scoring: one of .github/, .claude/, .amazonq/rules, .gitlab/duo

Example:

Tools/
   GitHub Copilot/
      chat/
      robot_tests/
      robot_results/
      .github/
   AmazonQ/
      chat/
      robot_tests/
      robot_results/
      .amazonq/rules
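
If you are adding a tool folder by hand, a minimal shell sketch for scaffolding it (AmazonQ shown as the example; substitute your tool's name and its standards path):

# Create the per-tool folders (AmazonQ shown; adjust names for your tool)
mkdir -p "Tools/AmazonQ/chat" "Tools/AmazonQ/robot_tests" "Tools/AmazonQ/robot_results"
mkdir -p "Tools/AmazonQ/.amazonq/rules"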

Files In This Repo

How To Run The Comparison

  1. Prepare folders: Add one folder per tool under Tools/ with chat/, robot_tests/, robot_results/, and the tool’s standards files.
  2. Create your comparison round file:
cp "AI tools comparison-TEMPLATE.md" "AI tools comparison - Round x - Model y.md"
  3. Initialize RF docs and services (MCP-ready):

Run the fast-start script to build the container, generate library docs, and prepare the MCP configuration for your IDE. The MCP server is stdio-based and is spawned by clients when needed.

./fast-start.sh

After it completes, reload VS Code so it picks up the MCP configuration (Command Palette → Developer: Reload Window), or restart VS Code.

  4. Start the comparison in your AI assistant:
  • Get the comparison prompt from Confluence and provide it to your assistant in this repository's context.
  • The assistant analyzes the per-tool folders, latest Robocop results, and chat transcripts, then fills in your "Round x" file.
  5. Evidence and scoring:
  • Cite evidence via file paths and line ranges from each tool’s robot_tests/ and robot_results/.
  • Use chat transcripts in chat/ for “Prompt Responsiveness & Control”.
  • Use the latest Robocop report per tool from robot_results/ for the static analysis category.
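
A quick shell sketch for locating the newest Robocop report for one tool (the *robocop* glob is an assumption; adjust it to match the report file names your runs actually produce):

# Newest matching Robocop report for a tool, by modification time
ls -t "Tools/GitHub Copilot/robot_results"/*robocop* | head -n 1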

Documentation Validation (MCP Ground Truth)

Notes

  • This repo’s purpose is comparison only. Any steps to run environments, apps, or test generation live elsewhere (in the source tool projects).
