holgerimbery/MaaJforMCS


Multi-Agent Assessment & Judgement

An enterprise-grade, multi-agent-aware .NET 9 application for automated testing of Microsoft Copilot Studio agents. Test multiple agents simultaneously across environments, evaluate responses with Azure AI Foundry models, generate test cases from documents, and gain comprehensive quality metrics.

Full documentation is available in the Wiki.


Multi-Agent Assessment & Judgement is not intended to replace the Microsoft Copilot Studio Kit or Copilot Studio Evaluation. Instead, it serves as a flexible, open starting point for automated, enterprise-grade verification and validation of Copilot Studio agents. Its purpose is to give users full control over every aspect of the testing process — both through a transparent GUI and through fully accessible source code — so they can adapt, extend, and integrate the testing workflow to meet their specific architectural, compliance, and automation requirements.


Key Features

  • Multi-Agent Testing — run the same test suite against dev, staging, and production agents in parallel
  • Environment & Agent Discovery — browse all Power Platform environments and import Copilot Studio agents automatically via Azure CLI or service principal
  • Direct Line Integration — WebSocket or polling transport with full conversation lifecycle management
  • Model-as-a-Judge Evaluation — Azure AI Foundry LLM scores responses on 5 dimensions (task success, intent match, factuality, helpfulness, safety)
  • Configurable Judge & Question-Generation Prompts — edit the built-in system prompts directly in the UI (Settings and Agents pages) with per-agent override support; no code changes needed
  • Document-Driven Test Generation — upload PDFs, text files, or paste a public HTTP/HTTPS URL; AI generates test cases automatically from any imported content
  • Setup Wizard — guided first-run agent and test suite creation
  • CLI for CI/CD — run, list, agents, report, and generate commands; exit codes, JSON/CSV output, dry-run support; generate creates test cases from a document file directly into a suite without opening the UI
  • Microsoft Entra ID Authentication — optional enterprise SSO with Admin / Tester / Viewer RBAC roles
  • Backup & Restore — download a full database snapshot and restore from the Settings page (Admin only)
  • OpenAPI & Interactive API Browser — OpenAPI manifest at /openapi/v1.json and Scalar UI at /scalar/v1; importable as a Power Automate custom connector; REST API key authentication for CI/CD pipelines
  • Regression Detection — each run report compares results against the previous completed run for the same suite; regressed test cases are highlighted with a side-by-side judge rationale comparison (why it passed before vs. why it failed now)
  • Pass Rate by Category — run reports break down results by TestCase.Category with colour-coded pass-rate bars per topic group
  • Home Screen Dashboard — live running-tests indicator, overall pass rate KPI, recent runs feed, agent summary cards with pass-rate trend arrows, top failing test cases ranked by failure rate, Quick Run shortcut to fire a suite directly from home, system status badges (Database / DirectLine / AI Judge), pass rate trend sparkline, and version / changelog badge; guided empty-state onboarding for new installations
  • Run History Pruning — configurable retention policy in Settings → Data Management to keep the SQLite database from growing unbounded; enter a number of days and prune completed runs older than the threshold in one click
  • Lightweight Rubric Refinement — after a completed run, an amber "Refine Rubric" button appears whenever human verdict overrides disagree with the AI judge; one click sends all disagreement cases to the judge LLM and returns a proposed rubric update you can apply in Settings → AI Judge
  • Latency Trend Chart on Dashboard — SVG sparkline of median latency over the last 10 runs with the latest P95 value and an improving / worsening / stable trend indicator
  • Confidence Scoring Trends on Run Report — each expanded result row shows colour-coded score dots (green / amber / red) for the last 6 runs of that test case with a first → latest delta indicator
  • Local-First & Container-Ready — runs entirely on-premise or in a container via Docker Compose; only calls Direct Line and an AI Foundry endpoint
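
The judge's actual response schema and pass rule are internal to the application, but the five-dimension scoring can be illustrated with a small sketch. Everything here (dimension names, the 0-5 scale, the 3.5 pass threshold) is an illustrative assumption, not the app's actual rubric:

```python
# Illustrative sketch only: aggregate per-dimension judge scores into a
# pass/fail verdict. Dimension names, the 0-5 scale, and the threshold
# are assumptions; the real app defines its own schema and rubric.
DIMENSIONS = ("task_success", "intent_match", "factuality", "helpfulness", "safety")

def verdict(scores: dict[str, float], threshold: float = 3.5) -> bool:
    """Pass only if every dimension meets the threshold (assumed rule)."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"judge response missing dimensions: {missing}")
    return all(scores[d] >= threshold for d in DIMENSIONS)

example = {"task_success": 5, "intent_match": 4, "factuality": 4,
           "helpfulness": 4, "safety": 5}
print(verdict(example))  # True: every dimension is at or above 3.5
```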

Screenshots

Home page · Setup wizard · Test suites · Create test suite · Upload documents · Agent creation · Agent Discovery · Test run dashboard · Test run result · Help · Settings


Quick Start (no build required)

The fastest way to get running is to download the quickstart package from the latest GitHub Release — no .NET SDK or source code needed, just Docker. The container image is pulled automatically from Docker Hub (holgerimbery/maaj).

  1. Download maaj-quickstart-{version}.zip from the latest release and unzip it
  2. Copy the env template:

     ```bash
     # Windows
     copy .env.template .env
     # Mac / Linux
     cp .env.template .env
     ```

  3. Edit .env — fill in your Azure OpenAI endpoint, API key, and model name
  4. Start:

     ```bash
     docker compose up -d
     ```

  5. Open http://localhost:5062 — the Setup Wizard guides you through the rest

Data is stored in a named Docker volume (maaj-data) and persists across restarts. Use Settings → Data Management to download a backup at any time.

Authentication is disabled by default in the quickstart package — all users get Admin access. Set AUTHENTICATION_ENABLED=true and fill in the AZURE_* values if the app will be internet-accessible. See Entra ID Setup.
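
For an internet-facing install, the quickstart .env only needs the flag flipped and the three Azure values filled in. A sketch with placeholder values (substitute your own tenant and app registration details):

```env
# Enable Entra ID authentication (see Entra ID Setup in the wiki)
AUTHENTICATION_ENABLED=true
# Placeholder values from your Entra app registration
AZURE_TENANT_ID=00000000-0000-0000-0000-000000000000
AZURE_CLIENT_ID=00000000-0000-0000-0000-000000000000
AZURE_CLIENT_SECRET=your-client-secret
```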


Quick Start (from source)

```bash
# Clone and build
git clone <repository-url>
cd MaaJforMCS
dotnet build

# Start the Web UI
cd CopilotStudioTestRunner.WebUI
dotnet run
# Open http://localhost:5062 — the Setup Wizard launches automatically
```

The wizard guides you through creating your first agent, uploading documents, generating test cases, and running your first suite.

For a step-by-step walkthrough see Quick Start.


CLI Usage

The CLI is built alongside the WebUI and shares the same SQLite database.

From a published binary (after dotnet publish or as a .NET global tool):

```bash
testrunner run      --suite "Regression Tests"
testrunner list
testrunner agents
testrunner report   --run <run-id> --format csv --output ./reports
testrunner generate --document ./docs/manual.txt --suite "Regression Tests" --count 20
```

From source (no build/publish step required):

```bash
# Run a test suite against all associated agents (exit code 0 = all passed, 1 = failures)
dotnet run --project CopilotStudioTestRunner.CLI -- run --suite "Regression Tests"

# Dry run — preview test cases without executing
dotnet run --project CopilotStudioTestRunner.CLI -- run --suite "Regression Tests" --dry-run

# List all test suites
dotnet run --project CopilotStudioTestRunner.CLI -- list

# List all configured agents
dotnet run --project CopilotStudioTestRunner.CLI -- agents

# Export a completed run as JSON (default) or CSV
dotnet run --project CopilotStudioTestRunner.CLI -- report --run <run-id>
dotnet run --project CopilotStudioTestRunner.CLI -- report --run <run-id> --format csv --output ./reports

# Generate test cases from a document (preview only)
dotnet run --project CopilotStudioTestRunner.CLI -- generate --document ./docs/manual.txt --count 10

# Generate and save directly into a suite
dotnet run --project CopilotStudioTestRunner.CLI -- generate --document ./docs/manual.txt --suite "Regression Tests" --count 20
```

Full command reference in CLI-Reference.
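
In a pipeline, the exported report can be post-processed after the run's exit code has been checked. A minimal sketch in Python; the JSON field names used here (results, passed) are hypothetical, so check the file actually produced by the report command for the real schema:

```python
import json

# Illustrative only: fail a CI step if the pass rate drops below a target,
# using a *hypothetical* report schema. Inspect the JSON exported by
# `testrunner report --run <run-id>` for the actual field names.
def pass_rate(report_json: str) -> float:
    results = json.loads(report_json)["results"]     # assumed field name
    passed = sum(1 for r in results if r["passed"])  # assumed field name
    return passed / len(results) if results else 0.0

# Stand-in for a real exported report file
sample = json.dumps({"results": [{"passed": True}, {"passed": True},
                                 {"passed": False}, {"passed": True}]})
rate = pass_rate(sample)
print(f"pass rate: {rate:.0%}")  # pass rate: 75%
```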


Docker Deployment

Docker support is fully implemented. ✅

```bash
# Build and start (all configuration via .env)
docker compose up -d
```

Create a .env file from the template below — authentication is controlled by a single flag, no override file required:

```env
JUDGE_ENDPOINT=https://your-resource.openai.azure.com/
JUDGE_API_KEY=your-api-key-here
JUDGE_MODEL=gpt-4o

# Set to true and fill in the Azure AD values to enable Entra ID authentication
AUTHENTICATION_ENABLED=false
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=
```

Full details including Kubernetes deployment in Docker Deployment.


Documentation

| Topic | Link |
| --- | --- |
| Getting Started | Getting-Started |
| Quick Start (5 min) | Quick-Start |
| Setup Wizard | Setup-Wizard |
| Environment & Agent Discovery | Environment-Discovery |
| Multi-Agent Testing | Multi-Agent-Testing |
| Architecture | Architecture |
| Configuration Reference | Configuration-Reference |
| Authentication & RBAC | Authentication · Entra ID Setup · RBAC and Roles |
| Judge Evaluation | Judge-Evaluation |
| Document Processing | Document-Processing |
| Test Suites & Cases | Test-Suites-and-Cases |
| CLI Reference | CLI-Reference |
| API Reference | API-Reference |
| Docker Deployment | Docker-Deployment |
| Backup & Restore | Backup-Restore |
| Troubleshooting | Troubleshooting |

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

License

MIT © 2026 Holger Imbery
