holgerimbery/MaaJforMCS


Multi-Agent Assessment & Judgement

An enterprise-grade, multi-agent-aware .NET 9 application for automated testing of Microsoft Copilot Studio agents. Test multiple agents simultaneously across environments, evaluate responses with Azure AI Foundry models, generate test cases from documents, and gain comprehensive quality metrics.

Full documentation is available in the Wiki.


Multi-Agent Assessment & Judgement is not intended to replace the Microsoft Copilot Studio Kit or Copilot Studio Evaluation. Instead, it serves as a flexible, open starting point for automated, enterprise-grade verification and validation of Copilot Studio agents. Its purpose is to give users full control over every aspect of the testing process — both through a transparent GUI and through fully accessible source code — so they can adapt, extend, and integrate the testing workflow to meet their specific architectural, compliance, and automation requirements.


Key Features

  • Multi-Agent Testing — run the same test suite against dev, staging, and production agents in parallel
  • Environment & Agent Discovery — browse all Power Platform environments and import Copilot Studio agents automatically via Azure CLI or service principal
  • Direct Line Integration — WebSocket or polling transport with full conversation lifecycle management
  • Model-as-a-Judge Evaluation — Azure AI Foundry LLM scores responses on 5 dimensions (task success, intent match, factuality, helpfulness, safety)
  • Configurable Judge & Question-Generation Prompts — edit the built-in system prompts directly in the UI (Settings and Agents pages) with per-agent override support; no code changes needed
  • Document-Driven Test Generation — upload PDFs, text files, or paste a public HTTP/HTTPS URL; AI generates test cases automatically from any imported content
  • Setup Wizard — guided first-run agent and test suite creation
  • CLI for CI/CD — run, list, agents, report, and generate commands; exit codes, JSON/CSV output, dry-run support; generate creates test cases from a document file directly into a suite without opening the UI
  • Microsoft Entra ID Authentication — optional enterprise SSO with Admin / Tester / Viewer RBAC roles
  • Backup & Restore — download a full database snapshot and restore from the Settings page (Admin only)
  • OpenAPI & Interactive API Browser — OpenAPI manifest at /openapi/v1.json and Scalar UI at /scalar/v1; importable as a Power Automate custom connector; REST API key authentication for CI/CD pipelines
  • Regression Detection — each run report compares results against the previous completed run for the same suite; regressed test cases are highlighted with a side-by-side judge rationale comparison (why it passed before vs. why it failed now)
  • Pass Rate by Category — run reports break down results by TestCase.Category with colour-coded pass-rate bars per topic group
  • Home Screen Dashboard — live running-tests indicator, overall pass rate KPI, recent runs feed, agent summary cards with pass-rate trend arrows, top failing test cases ranked by failure rate, Quick Run shortcut to fire a suite directly from home, system status badges (Database / DirectLine / AI Judge), pass rate trend sparkline, and version / changelog badge; guided empty-state onboarding for new installations
  • Run History Pruning — configurable retention policy in Settings → Data Management to keep the SQLite database from growing unbounded; enter a number of days and prune completed runs older than the threshold in one click
  • Lightweight Rubric Refinement — after a completed run, an amber "Refine Rubric" button appears whenever human verdict overrides disagree with the AI judge; one click sends all disagreement cases to the judge LLM and returns a proposed rubric update you can apply in Settings → AI Judge
  • Latency Trend Chart on Dashboard — SVG sparkline of median latency over the last 10 runs with the latest P95 value and an improving / worsening / stable trend indicator
  • Confidence Scoring Trends on Run Report — each expanded result row shows colour-coded score dots (green / amber / red) for the last 6 runs of that test case with a first → latest delta indicator
  • Local-First & Container-Ready — runs entirely on-premise or in a container via Docker Compose; only calls Direct Line and an AI Foundry endpoint
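
The judge's actual response schema and pass rule are internal to the application, but the five-dimension scoring can be illustrated with a small sketch. Everything here (dimension names, the 0-5 scale, the 3.5 pass threshold) is an illustrative assumption, not the app's actual rubric:

```python
# Illustrative sketch only: aggregate per-dimension judge scores into a
# pass/fail verdict. Dimension names, the 0-5 scale, and the threshold
# are assumptions; the real app defines its own schema and rubric.
DIMENSIONS = ("task_success", "intent_match", "factuality", "helpfulness", "safety")

def verdict(scores: dict[str, float], threshold: float = 3.5) -> bool:
    """Pass only if every dimension meets the threshold (assumed rule)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"judge response missing dimensions: {missing}")
    return all(scores[d] >= threshold for d in DIMENSIONS)

example = {"task_success": 5, "intent_match": 4, "factuality": 4,
           "helpfulness": 4, "safety": 5}
print(verdict(example))  # True: every dimension is at or above 3.5
```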

Screenshots

Home page · Setup wizard · Test suites · Create test suite · Upload documents · Agent creation · Agent Discovery · Test run dashboard · Test run result · Help · Settings


Quick Start (no build required)

The fastest way to get running is to download the quickstart package from the latest GitHub Release — no .NET SDK or source code needed, just Docker. The container image is pulled automatically from Docker Hub (holgerimbery/maaj).

  1. Download maaj-quickstart-{version}.zip from the latest release and unzip it
  2. Copy the env template:

     ```bash
     # Windows
     copy .env.template .env
     # Mac / Linux
     cp .env.template .env
     ```

  3. Edit .env — fill in your Azure OpenAI endpoint, API key, and model name
  4. Start:

     ```bash
     docker compose up -d
     ```

  5. Open http://localhost:5062 — the Setup Wizard guides you through the rest

Data is stored in a named Docker volume (maaj-data) and persists across restarts. Use Settings → Data Management to download a backup at any time.

Authentication is disabled by default in the quickstart package — all users get Admin access. Set AUTHENTICATION_ENABLED=true and fill in the AZURE_* values if the app will be internet-accessible. See Entra ID Setup.
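
For an internet-facing install, the quickstart .env only needs the flag flipped and the three Azure values filled in. A sketch with placeholder values (substitute your own tenant and app registration details):

```env
# Enable Entra ID authentication (see Entra ID Setup in the wiki)
AUTHENTICATION_ENABLED=true
# Placeholder values from your Entra app registration
AZURE_TENANT_ID=00000000-0000-0000-0000-000000000000
AZURE_CLIENT_ID=00000000-0000-0000-0000-000000000000
AZURE_CLIENT_SECRET=your-client-secret
```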


Quick Start (from source)

```bash
# Clone and build
git clone <repository-url>
cd MaaJforMCS
dotnet build

# Start the Web UI
cd CopilotStudioTestRunner.WebUI
dotnet run
# Open http://localhost:5062 — the Setup Wizard launches automatically
```

The wizard guides you through creating your first agent, uploading documents, generating test cases, and running your first suite.

For a step-by-step walkthrough see Quick Start.


CLI Usage

The CLI is built alongside the WebUI and shares the same SQLite database.

From a published binary (after dotnet publish or as a .NET global tool):

```bash
testrunner run      --suite "Regression Tests"
testrunner list
testrunner agents
testrunner report   --run <run-id> --format csv --output ./reports
testrunner generate --document ./docs/manual.txt --suite "Regression Tests" --count 20
```

From source (no build/publish step required):

```bash
# Run a test suite against all associated agents (exit code 0 = all passed, 1 = failures)
dotnet run --project CopilotStudioTestRunner.CLI -- run --suite "Regression Tests"

# Dry run — preview test cases without executing
dotnet run --project CopilotStudioTestRunner.CLI -- run --suite "Regression Tests" --dry-run

# List all test suites
dotnet run --project CopilotStudioTestRunner.CLI -- list

# List all configured agents
dotnet run --project CopilotStudioTestRunner.CLI -- agents

# Export a completed run as JSON (default) or CSV
dotnet run --project CopilotStudioTestRunner.CLI -- report --run <run-id>
dotnet run --project CopilotStudioTestRunner.CLI -- report --run <run-id> --format csv --output ./reports

# Generate test cases from a document (preview only)
dotnet run --project CopilotStudioTestRunner.CLI -- generate --document ./docs/manual.txt --count 10

# Generate and save directly into a suite
dotnet run --project CopilotStudioTestRunner.CLI -- generate --document ./docs/manual.txt --suite "Regression Tests" --count 20
```

Full command reference in CLI-Reference.
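
In a pipeline, the exported report can be post-processed after the run's exit code has been checked. A minimal sketch in Python; the JSON field names used here (results, passed) are hypothetical, so check the file actually produced by the report command for the real schema:

```python
import json

# Illustrative only: fail a CI step if the pass rate drops below a target,
# using a *hypothetical* report schema. Inspect the JSON exported by
# `testrunner report --run <run-id>` for the actual field names.
def pass_rate(report_json: str) -> float:
    results = json.loads(report_json)["results"]     # assumed field name
    passed = sum(1 for r in results if r["passed"])  # assumed field name
    return passed / len(results) if results else 0.0

# Stand-in for a real exported report file
sample = json.dumps({"results": [{"passed": True}, {"passed": True},
                                 {"passed": False}, {"passed": True}]})
rate = pass_rate(sample)
print(f"pass rate: {rate:.0%}")  # pass rate: 75%
```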


Docker Deployment

Docker support is fully implemented. ✅

```bash
# Build and start (all configuration via .env)
docker compose up -d
```

Create a .env file from the template below — authentication is controlled by a single flag, no override file required:

```env
JUDGE_ENDPOINT=https://your-resource.openai.azure.com/
JUDGE_API_KEY=your-api-key-here
JUDGE_MODEL=gpt-4o

# Set to true and fill in the Azure AD values to enable Entra ID authentication
AUTHENTICATION_ENABLED=false
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=
```

Full details including Kubernetes deployment in Docker Deployment.


Documentation

| Topic | Link |
| --- | --- |
| Getting Started | Getting-Started |
| Quick Start (5 min) | Quick-Start |
| Setup Wizard | Setup-Wizard |
| Environment & Agent Discovery | Environment-Discovery |
| Multi-Agent Testing | Multi-Agent-Testing |
| Architecture | Architecture |
| Configuration Reference | Configuration-Reference |
| Authentication & RBAC | Authentication · Entra ID Setup · RBAC and Roles |
| Judge Evaluation | Judge-Evaluation |
| Document Processing | Document-Processing |
| Test Suites & Cases | Test-Suites-and-Cases |
| CLI Reference | CLI-Reference |
| API Reference | API-Reference |
| Docker Deployment | Docker-Deployment |
| Backup & Restore | Backup-Restore |
| Troubleshooting | Troubleshooting |

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

MIT © 2026 Holger Imbery
