Holger Imbery edited this page Feb 23, 2026 · 7 revisions

Multi-Agent Assessment & Judgement — Wiki

Disclaimer: these wiki pages are still a work in progress

An enterprise-grade, multi-agent-aware .NET 9 application for automated testing of Microsoft Copilot Studio agents. Test multiple agents simultaneously across environments, compare performance, evaluate responses with AI Foundry models, generate test cases from documents, and get comprehensive metrics.


Quick Links

  • Getting Started — Prerequisites, installation, first run
  • Quick Start — 5-minute setup guide
  • Setup Wizard — Agent-first guided setup
  • Environment & Agent Discovery — Browse Power Platform environments and import agents
  • Multi-Agent Testing — Testing multiple agents in parallel
  • Architecture — System design and project structure
  • Database Schema — All entities and relationships
  • Configuration Reference — All configuration options
  • Agent Configuration — Per-agent settings
  • Authentication — Entra ID setup and security
  • RBAC and Roles — Roles and permission matrix
  • Judge Evaluation — AI scoring dimensions and weights
  • Judge Prompts — Prompt templates and calibration
  • Document Processing — Upload and chunk documents
  • Question Generation — AI-powered test generation
  • Test Suites and Cases — Creating and managing tests
  • CLI Reference — Command-line interface
  • API Reference — REST API endpoints
  • Deployment — Local, Docker, Azure, Kubernetes
  • Docker Deployment — Containerization guide
  • Troubleshooting — Common issues and fixes

Key Capabilities

  • Multi-Agent Testing — configure agents for dev, staging, and production; run the same suite against all simultaneously
  • Direct Line Integration — WebSocket or polling transport with full conversation lifecycle management
  • Model-as-a-Judge — Azure AI Foundry LLM evaluates responses on 5 dimensions (task success, intent match, factuality, helpfulness, safety)
  • Configurable Judge & Question-Generation Prompts — edit system prompts directly in the UI; per-agent overrides supported; no code changes required
  • Document-Driven Test Generation — upload PDFs, text files, or paste a public HTTP/HTTPS URL; AI generates test cases automatically from any imported content
  • Setup Wizard — guided agent-first onboarding flow
  • CLI for CI/CD — run, list, agents, report, and generate commands; exit codes, JSON/CSV output, dry-run support
  • Microsoft Entra ID Authentication — optional enterprise SSO with Admin / Tester / Viewer roles
  • Backup & Restore — download a full database snapshot from Settings; restore with a single upload
  • OpenAPI & Interactive API Browser — OpenAPI manifest at /openapi/v1.json and Scalar UI at /scalar/v1; importable as a Power Automate custom connector; REST API key auth for CI/CD pipelines
  • Regression Detection — each run report compares results against the previous run for the same suite; regressed test cases are highlighted with a side-by-side judge rationale comparison
  • Pass Rate by Category — run reports break down results by TestCase.Category with colour-coded pass-rate bars per topic group
  • Lightweight Rubric Refinement — when human verdict overrides disagree with the AI judge, a "Refine Rubric" button sends all disagreements to the LLM and returns a proposed rubric update
  • Home Screen Insights — system status badges (Database / DirectLine / AI Judge), pass rate trend sparkline, agent summary cards, top failing test cases, Quick Run shortcut, and run history feed
  • Latency & Confidence Trends — sparkline of median latency over last 10 runs on the Dashboard; per-test-case score history dots (green/amber/red) in expanded run report rows
  • Run History Pruning — configurable retention policy in Settings → Data Management keeps the SQLite database from growing unbounded
  • Local-First — runs entirely on-premises; the only outbound calls are to Direct Line and the configured AI Foundry endpoint
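To make the Model-as-a-Judge capability above concrete, here is a minimal sketch of weighted scoring across the five dimensions the judge evaluates. The dimension names come from this wiki; the weights, the 0–5 score scale, and the pass threshold are illustrative assumptions only — the actual values are configured on the Judge Evaluation page.

```python
# Illustrative sketch: weighted judge score over the five dimensions.
# Weights, scale, and threshold below are ASSUMPTIONS for illustration,
# not the application's real defaults.

# Hypothetical per-dimension weights (sum to 1.0).
WEIGHTS = {
    "task_success": 0.30,
    "intent_match": 0.20,
    "factuality":   0.25,
    "helpfulness":  0.15,
    "safety":       0.10,
}

PASS_THRESHOLD = 3.5  # assumed overall cut-off on a 0-5 scale


def overall_score(dimension_scores: dict[str, float]) -> float:
    """Weighted average of per-dimension judge scores."""
    return sum(WEIGHTS[d] * dimension_scores[d] for d in WEIGHTS)


def verdict(dimension_scores: dict[str, float]) -> str:
    """Map the weighted score to a pass/fail verdict."""
    return "pass" if overall_score(dimension_scores) >= PASS_THRESHOLD else "fail"


# Example: a response that is strong overall but weak on factuality.
scores = {
    "task_success": 4.0,
    "intent_match": 4.5,
    "factuality":   2.0,
    "helpfulness":  4.0,
    "safety":       5.0,
}
print(round(overall_score(scores), 2), verdict(scores))  # → 3.7 pass
```

A weighted average like this is one common way to combine per-dimension LLM scores into a single verdict; the per-agent rubric overrides mentioned above would correspond to swapping out the weight table.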

Project Structure

CopilotStudioTestRunner.Domain   — Entities, configuration models
CopilotStudioTestRunner.Data     — EF Core DbContext, SQLite
CopilotStudioTestRunner.Core     — Services (Judge, Execution, DirectLine, Documents)
CopilotStudioTestRunner.WebUI    — Blazor Server UI + REST API
CopilotStudioTestRunner.CLI      — Command-line interface
CopilotStudioTestRunner.Tests    — Unit, Integration, End-to-End tests

Contributing

Contributions are welcome!

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-improvement)
  3. Make your changes and add tests where applicable
  4. Submit a pull request

Please follow existing code style and ensure all tests pass before submitting.


License

MIT © 2026 Holger Imbery
