Home
Holger Imbery edited this page Feb 23, 2026 · 7 revisions
Disclaimer: these wiki pages are still a work in progress.
An enterprise-grade, multi-agent-aware .NET 9 application for automated testing of Microsoft Copilot Studio agents. Test multiple agents simultaneously across environments, compare their performance, evaluate responses with Azure AI Foundry models, generate test cases from documents, and review comprehensive metrics.
| Topic | Description |
|---|---|
| Getting Started | Prerequisites, installation, first run |
| Quick Start | 5-minute setup guide |
| Setup Wizard | Agent-first guided setup |
| Environment & Agent Discovery | Browse Power Platform environments and import agents |
| Multi-Agent Testing | Testing multiple agents in parallel |
| Architecture | System design and project structure |
| Database Schema | All entities and relationships |
| Configuration Reference | All configuration options |
| Agent Configuration | Per-agent settings |
| Authentication | Entra ID setup and security |
| RBAC and Roles | Roles and permission matrix |
| Judge Evaluation | AI scoring dimensions and weights |
| Judge Prompts | Prompt templates and calibration |
| Document Processing | Upload and chunk documents |
| Question Generation | AI-powered test generation |
| Test Suites and Cases | Creating and managing tests |
| CLI Reference | Command-line interface |
| API Reference | REST API endpoints |
| Deployment | Local, Docker, Azure, Kubernetes |
| Docker Deployment | Containerization guide |
| Troubleshooting | Common issues and fixes |
- Multi-Agent Testing — configure agents for dev, staging, and production; run the same suite against all of them simultaneously
- Direct Line Integration — WebSocket or polling transport with full conversation lifecycle management
- Model-as-a-Judge — Azure AI Foundry LLM evaluates responses on 5 dimensions (task success, intent match, factuality, helpfulness, safety)
- Configurable Judge & Question-Generation Prompts — edit system prompts directly in the UI; per-agent overrides supported; no code changes required
- Document-Driven Test Generation — upload PDFs, text files, or paste a public HTTP/HTTPS URL; AI generates test cases automatically from any imported content
- Setup Wizard — guided agent-first onboarding flow
- CLI for CI/CD — `run`, `list`, `agents`, `report`, and `generate` commands; exit codes, JSON/CSV output, dry-run support
- Microsoft Entra ID Authentication — optional enterprise SSO with Admin / Tester / Viewer roles
- Backup & Restore — download a full database snapshot from Settings; restore with a single upload
- OpenAPI & Interactive API Browser — OpenAPI manifest at `/openapi/v1.json` and Scalar UI at `/scalar/v1`; importable as a Power Automate custom connector; REST API key auth for CI/CD pipelines
- Regression Detection — each run report compares results against the previous run for the same suite; regressed test cases are highlighted with a side-by-side judge rationale comparison
- Pass Rate by Category — run reports break down results by `TestCase.Category` with colour-coded pass-rate bars per topic group
- Lightweight Rubric Refinement — when human verdict overrides disagree with the AI judge, a "Refine Rubric" button sends all disagreements to the LLM and returns a proposed rubric update
- Home Screen Insights — system status badges (Database / DirectLine / AI Judge), pass rate trend sparkline, agent summary cards, top failing test cases, Quick Run shortcut, and run history feed
- Latency & Confidence Trends — sparkline of median latency over last 10 runs on the Dashboard; per-test-case score history dots (green/amber/red) in expanded run report rows
- Run History Pruning — configurable retention policy in Settings → Data Management keeps the SQLite database from growing unbounded
- Local-First — runs entirely on-premises; the only outbound calls are to Direct Line and an Azure AI Foundry endpoint
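The report features above (judge scoring, regression detection, pass rate by category, latency trends) can be illustrated with a short sketch. It is written in Python purely for brevity and is not the project's .NET implementation. The `CaseResult` shape, the dimension keys, and the equal default weights are assumptions for illustration only; the real scoring weights are described on the Judge Evaluation page.

```python
# Hypothetical sketch of run-report analytics; NOT the project's actual .NET code.
# CaseResult, the dimension keys and the default equal weights are assumptions.
from collections import defaultdict
from dataclasses import dataclass
from statistics import median

# Assumed keys for the five judge dimensions listed above.
DIMENSIONS = ("task_success", "intent_match", "factuality", "helpfulness", "safety")


@dataclass
class CaseResult:
    """Assumed shape of one test-case result within a run."""
    case_id: str
    category: str
    passed: bool
    latency_ms: float


def overall_score(scores, weights=None):
    """Weighted mean of per-dimension judge scores (equal weights by default)."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    total = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total


def pass_rate_by_category(results):
    """Map each category to the fraction of its cases that passed in one run."""
    totals, passes = defaultdict(int), defaultdict(int)
    for r in results:
        totals[r.category] += 1
        passes[r.category] += int(r.passed)
    return {cat: passes[cat] / totals[cat] for cat in totals}


def regressions(previous, current):
    """IDs of cases that passed in the previous run but fail in the current one."""
    passed_before = {r.case_id for r in previous if r.passed}
    return sorted(r.case_id for r in current if not r.passed and r.case_id in passed_before)


def median_latency_trend(runs, window=10):
    """Median latency of each of the last `window` runs, oldest first.

    Each run must contain at least one result (median() rejects empty input).
    """
    return [median(r.latency_ms for r in run) for run in runs[-window:]]


if __name__ == "__main__":
    prev = [CaseResult("tc1", "billing", True, 800), CaseResult("tc2", "billing", True, 1200),
            CaseResult("tc3", "hr", False, 900)]
    curr = [CaseResult("tc1", "billing", False, 950), CaseResult("tc2", "billing", True, 1100),
            CaseResult("tc3", "hr", True, 700)]
    print(overall_score({"task_success": 1.0, "intent_match": 0.5, "factuality": 0.75,
                         "helpfulness": 0.5, "safety": 0.25}))  # 0.6
    print(pass_rate_by_category(curr))         # {'billing': 0.5, 'hr': 1.0}
    print(regressions(prev, curr))             # ['tc1']
    print(median_latency_trend([prev, curr]))  # [900, 950]
```

This sketch flags only pass-to-fail transitions against the immediately preceding run, in line with the "previous run for the same suite" rule described for Regression Detection.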
- `CopilotStudioTestRunner.Domain` — Entities, configuration models
- `CopilotStudioTestRunner.Data` — EF Core DbContext, SQLite
- `CopilotStudioTestRunner.Core` — Services (Judge, Execution, DirectLine, Documents)
- `CopilotStudioTestRunner.WebUI` — Blazor Server UI + REST API
- `CopilotStudioTestRunner.CLI` — Command-line interface
- `CopilotStudioTestRunner.Tests` — Unit, Integration, End-to-End tests
Contributions are welcome!
- Fork the repository
- Create a feature branch (`git checkout -b feature/my-improvement`)
- Make your changes and add tests where applicable
- Submit a pull request
Please follow existing code style and ensure all tests pass before submitting.
MIT License, 2026 Holger Imbery